GPT-2 From Scratch

GPT-2 architecture diagram

Model Architecture

Model Config (a code sketch of this configuration follows the list):

  • Embedding Dimensions: 768
  • Vocabulary Size: 50,257
  • Sequence Length: 1,024
  • Attention Heads: 8
  • Decoder Blocks: 12
  • Dropout: 0.1
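
The repository's actual configuration object is not reproduced here; the following is a minimal sketch of how these hyperparameters might be grouped in a Python dataclass. Names such as `GPTConfig` and its fields are illustrative, not necessarily those used in this codebase.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Hyperparameters from the list above; field names are illustrative.
    emb_dim: int = 768          # embedding dimensions
    vocab_size: int = 50257     # BPE vocabulary size
    context_length: int = 1024  # maximum sequence length
    n_heads: int = 8            # attention heads
    n_layers: int = 12          # decoder blocks
    dropout: float = 0.1
```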

Architecture Overview

GPT-2 is a decoder-only transformer language model designed for natural language generation. Key components include (a sketch of a single decoder block follows this list):

  1. Positional Embeddings: Learned position vectors added to the token embeddings so the model can represent word order.
  2. Masked Multi-Head Self-Attention: Lets each position attend to earlier positions across several heads in parallel; the causal mask keeps tokens from seeing future positions.
  3. Feed-Forward Networks: Apply a position-wise non-linear transformation (linear, GELU, linear) to each token representation.
  4. Layer Normalization: Applied before each sub-layer (pre-norm), stabilizing and accelerating training.
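
The repository's module layout is not shown here; the sketch below illustrates how these four components typically combine into one GPT-2-style (pre-norm) decoder block in PyTorch, reusing the `GPTConfig` sketch above. Class and attribute names are illustrative.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One GPT-2-style pre-norm decoder block; names are illustrative."""
    def __init__(self, cfg: GPTConfig):
        super().__init__()
        self.ln1 = nn.LayerNorm(cfg.emb_dim)
        # Multi-head self-attention; the causal mask is built in forward().
        self.attn = nn.MultiheadAttention(
            cfg.emb_dim, cfg.n_heads, dropout=cfg.dropout, batch_first=True
        )
        self.ln2 = nn.LayerNorm(cfg.emb_dim)
        # Position-wise feed-forward network (linear -> GELU -> linear).
        self.ff = nn.Sequential(
            nn.Linear(cfg.emb_dim, 4 * cfg.emb_dim),
            nn.GELU(),
            nn.Linear(4 * cfg.emb_dim, cfg.emb_dim),
            nn.Dropout(cfg.dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True above the diagonal blocks attention to future tokens.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        # Pre-norm residual connections, as in GPT-2.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.ln2(x))
        return x
```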

Implementation Details

This project implements GPT-2 from scratch to give a clear view of its inner workings. The implementation follows the original architecture closely while keeping the configuration above (embedding size, attention heads, decoder blocks, dropout) easy to adjust.
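
As a rough illustration of how the pieces fit together (not the repository's actual API), a minimal model might embed tokens and positions, stack the decoder blocks, and project back to vocabulary logits. The sketch below builds on the `GPTConfig` and `DecoderBlock` sketches above; the class name `GPT2` is hypothetical.

```python
import torch
import torch.nn as nn

class GPT2(nn.Module):
    """Minimal GPT-2-style model sketch; not the repository's actual class."""
    def __init__(self, cfg: GPTConfig):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg.vocab_size, cfg.emb_dim)
        self.pos_emb = nn.Embedding(cfg.context_length, cfg.emb_dim)
        self.drop = nn.Dropout(cfg.dropout)
        self.blocks = nn.ModuleList([DecoderBlock(cfg) for _ in range(cfg.n_layers)])
        self.ln_f = nn.LayerNorm(cfg.emb_dim)
        self.head = nn.Linear(cfg.emb_dim, cfg.vocab_size, bias=False)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        # idx: (batch, seq_len) token ids -> (batch, seq_len, vocab_size) logits
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))

# Example forward pass with random token ids:
model = GPT2(GPTConfig())
logits = model(torch.randint(0, 50257, (1, 16)))  # shape: (1, 16, 50257)
```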
