- Embedding Dimensions: 768
- Vocabulary Size: 50,257
- Sequence Length: 1,024
- Attention Heads: 8
- Decoder Blocks: 12
- Dropout: 0.1
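These settings can be collected into a single configuration object. The sketch below is one possible way to do that; field names such as `n_embd` and `block_size` are illustrative assumptions, not identifiers taken from this repository.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_embd: int = 768        # embedding dimensions
    vocab_size: int = 50257  # vocabulary size (GPT-2 BPE tokenizer)
    block_size: int = 1024   # maximum sequence length
    n_head: int = 8          # attention heads
    n_layer: int = 12        # decoder blocks
    dropout: float = 0.1     # dropout probability
```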
The GPT-2 model is a decoder-only transformer architecture designed for natural language processing tasks. Key components include:
- Positional Encoding: Helps the model understand the order of words in a sequence.
- Multi-Head Attention: Allows the model to focus on different parts of the input simultaneously.
- Feed-Forward Networks: Apply non-linear transformations to each token's representation.
- Layer Normalization: Stabilizes and accelerates the training process.
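The sketch below shows how these components might fit together in a single decoder block. It is a minimal PyTorch illustration under the hyperparameters listed above; class and parameter names are assumptions and do not reflect this project's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head attention with a causal mask so each token attends only to earlier tokens."""
    def __init__(self, n_embd=768, n_head=8, block_size=1024, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # joint projection to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        self.dropout = nn.Dropout(dropout)
        # lower-triangular mask enforcing causality
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape into (B, n_head, T, head_dim) so each head attends independently
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = self.dropout(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.dropout(self.proj(y))

class Block(nn.Module):
    """One GPT-2 decoder block: pre-layer-norm attention followed by a feed-forward MLP."""
    def __init__(self, n_embd=768, n_head=8, block_size=1024, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size, dropout)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(                  # position-wise feed-forward network
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # residual connection around attention
        x = x + self.mlp(self.ln2(x))   # residual connection around the MLP
        return x
```

A block can be exercised in isolation, e.g. `Block()(torch.randn(2, 16, 768))` returns a tensor of the same shape. Positional information is added separately, at the embedding stage, before the stack of decoder blocks.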
This project implements the GPT-2 model from scratch to build a deep understanding of its inner workings. The implementation closely follows the original architecture while offering options for customization.
- Andrej Karpathy Lectures - https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&pp=iAQB
- Sebastian Raschka - https://github.com/rasbt/LLMs-from-scratch
- Original GPT-2 Paper - https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Umar Jamil YouTube - https://www.youtube.com/watch?v=ISNdQcPhsts&t=4760s&pp=ygUKdW1hciBqYW1pbA%3D%3D