nano-BERT


Nano-BERT: A Simplified and Understandable Implementation of BERT

Nano-BERT is a straightforward, lightweight and comprehensible custom implementation of BERT, inspired by the foundational "Attention is All You Need" paper. The primary objective of this project is to distill the essence of transformers by stripping away unnecessary complexity and implementation detail, making it an ideal starting point for those aiming to grasp the fundamental ideas behind transformers.

Key Features and Focus 🚀:

  • Simplicity and Understandability: Nano-BERT prioritizes simplicity and clarity, making it accessible for anyone looking to understand the core concepts of transformers.

  • Multi-Headed Self Attention: The implementation of multi-headed self-attention is intentionally less efficient but more descriptive: each attention head is treated as a separate object, favoring transparency over optimizations such as batched matrix transposition and multiplication (see the sketch after this list).

  • Educational Purposes: This project is designed for educational purposes, offering a learning platform for individuals interested in transformer architectures.

  • Customizability: Nano-BERT allows extensive customization, enabling users to experiment with various parameters such as the number of layers, heads, and embedding sizes. It serves as a playground for exploring the impact of different configurations on model performance.

  • Inspiration: The project draws inspiration from ongoing research on efficient LLM fine-tuning (space-model). It is also influenced by Andrej Karpathy's deep learning series on YouTube, particularly the nanoGPT project.

  • Motivation and Development: Nano-BERT originated from the author's curiosity about embedding custom datasets into a three-dimensional space using BERT. To achieve this, the goal was to build a fully customizable version of BERT that gives complete control over the model's behavior. A further motivation was to understand how BERT could handle datasets where whole words are used as tokens, diverging from the common sub-word approach.
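
As a rough illustration of the per-head design described in the list above, a multi-headed attention block in this style can be written as a plain list of independent head modules. The sketch below is in PyTorch; the class names, argument names, and shapes are illustrative assumptions, not the actual nano-BERT code:

import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionHead(nn.Module):
    """A single self-attention head kept as its own module for readability (illustrative sketch)."""

    def __init__(self, n_embed, head_size, dropout=0.1):
        super().__init__()
        self.key = nn.Linear(n_embed, head_size, bias=False)
        self.query = nn.Linear(n_embed, head_size, bias=False)
        self.value = nn.Linear(n_embed, head_size, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, seq_len, n_embed)
        k, q, v = self.key(x), self.query(x), self.value(x)
        # scaled dot-product attention scores: (batch, seq_len, seq_len)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        weights = self.dropout(F.softmax(scores, dim=-1))
        return weights @ v  # (batch, seq_len, head_size)


class MultiHeadAttention(nn.Module):
    """Multi-headed attention as an explicit list of heads, concatenated and projected back."""

    def __init__(self, n_head, n_embed, dropout=0.1):
        super().__init__()
        head_size = n_embed // n_head
        self.heads = nn.ModuleList(AttentionHead(n_embed, head_size, dropout) for _ in range(n_head))
        self.proj = nn.Linear(n_head * head_size, n_embed)

    def forward(self, x):
        # run every head independently, then concatenate along the embedding dimension
        out = torch.cat([head(x) for head in self.heads], dim=-1)
        return self.proj(out)


# tiny usage check with the benchmark-sized configuration (n_head=1, n_embed=3)
mha = MultiHeadAttention(n_head=1, n_embed=3)
print(mha(torch.randn(2, 16, 3)).shape)  # torch.Size([2, 16, 3])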

Community Engagement 💬: While Nano-BERT is not intended for production use, contributions, suggestions, and feedback from the community are highly encouraged. Users are welcome to propose improvements, simplifications, or enhanced descriptions by creating pull requests or issues.

Exploration and Experimentation 🌎: Nano-BERT's flexibility enables users to experiment freely. Parameters like the number of layers, heads, and embedding sizes can be tailored to specific needs. This customizable nature empowers users to explore diverse configurations and assess their impact on model outcomes.

Note: Nano-BERT was developed with a focus on educational exploration and understanding, and it should be used only in educational and experimental contexts!

Installation 🛠️

Prerequisites

  • Python 3.10.x
  • pip (used to install PyTorch and the optional demo packages below)

pip install torch

Note: to run the demos you might need a few additional packages, but for the base model all you need is PyTorch:

pip install tqdm scikit-learn matplotlib plotly

Package installation

⚠️: currently only available through GitHub, but a pip version is coming soon!

git clone https://github.com/StepanTita/nano-BERT.git

Usage Example ⚙️

from nano_bert.model import NanoBERT
from nano_bert.tokenizer import WordTokenizer

vocab = [...]  # a list of tokens (or words) to use in tokenizer

tokenizer = WordTokenizer(vocab=vocab, max_seq_len=128)

# Usage:
input_ids = tokenizer('This is a sentence')  # or tokenizer(['This', 'is', 'a', 'sentence'])

# Instantiate the NanoBERT model
# (constructor arguments shown here are illustrative: at minimum the model needs the
#  vocabulary size, and it can be configured with n_layer, n_head, n_embed, etc.)
nano_bert = NanoBERT(vocab_size=len(vocab))

# Example: embed the tokenized input
embedded_text = nano_bert.embedding(input_ids)
print(embedded_text)
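
Since the vocabulary consists of whole words rather than sub-words, the tokenizer's job is essentially a word-to-id lookup plus truncation and padding. The following is a simplified, hypothetical sketch of such a word-level tokenizer; it is not the actual nano_bert.tokenizer.WordTokenizer:

class SimpleWordTokenizer:
    """Illustrative word-level tokenizer: maps whole words to ids and pads/truncates to max_seq_len."""

    def __init__(self, vocab, max_seq_len=128):
        # ids 0 and 1 are reserved for padding and unknown words (an assumption for this sketch)
        self.pad_id, self.unk_id = 0, 1
        self.word2id = {word: i + 2 for i, word in enumerate(vocab)}
        self.max_seq_len = max_seq_len

    def __call__(self, text):
        words = text.split() if isinstance(text, str) else list(text)
        ids = [self.word2id.get(word.lower(), self.unk_id) for word in words][:self.max_seq_len]
        return ids + [self.pad_id] * (self.max_seq_len - len(ids))


tok = SimpleWordTokenizer(vocab=['this', 'is', 'a', 'sentence'], max_seq_len=8)
print(tok('This is a sentence'))  # [2, 3, 4, 5, 0, 0, 0, 0]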

Results 📈:

Benchmarks 🏆:

All of the following experiments use this configuration:

n_layer = 1
n_head = 1
dropout = 0.1
n_embed = 3
max_seq_len = 128
epochs = 200
batch_size = 32
Dataset                      Accuracy    F-1 Score
IMDB Sentiment (2-class)     0.734       0.745
HateXplain Data (2-class)    0.693       0.597
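
Accuracy and F-1 are standard classification metrics; with scikit-learn (one of the optional demo dependencies) they can be computed from the model's test-split predictions. A toy sketch with stand-in labels and predictions, not the real benchmark data:

from sklearn.metrics import accuracy_score, f1_score

# stand-in gold labels and model predictions for a binary task
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f'Accuracy: {accuracy_score(y_true, y_pred):.3f}')  # fraction of correct predictions
print(f'F-1: {f1_score(y_true, y_pred):.3f}')  # binary F-1, as reported for the 2-class datasets above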

Result plots IMDB:

[Figures: accuracy and F-1 training curves on IMDB]

Interpretation ⁉️:

Attentions Visualized:

[Figures: attention weight heatmaps for four IMDB examples]
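
One simple way to produce heatmaps like these is to take one head's attention weights for a sentence (the seq_len × seq_len matrix produced by the softmax) and plot it with matplotlib. The sketch below uses random weights as a stand-in for a real attention matrix:

import matplotlib.pyplot as plt
import torch

# stand-in attention weights for a short sentence: (seq_len, seq_len), rows sum to 1
tokens = ['this', 'movie', 'was', 'great']
attn = torch.softmax(torch.randn(len(tokens), len(tokens)), dim=-1)

plt.imshow(attn.numpy(), cmap='viridis')
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar()
plt.title('Attention weights (random stand-in)')
plt.show()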

Embeddings Visualized in 3D:

[Figures: five 3D token embedding visualizations]
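
Because n_embed = 3 in these experiments, every token embedding is already a point in three-dimensional space and can be plotted directly. The sketch below uses plotly (another optional demo dependency) with random vectors standing in for real token embeddings:

import plotly.graph_objects as go
import torch

# stand-in for token embeddings of shape (num_tokens, 3), e.g. rows of the model's embedding table
embeddings = torch.randn(50, 3)
x, y, z = embeddings.unbind(dim=-1)

fig = go.Figure(data=[go.Scatter3d(x=x.tolist(), y=y.tolist(), z=z.tolist(),
                                   mode='markers', marker=dict(size=3))])
fig.show()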

Note: see demo.ipynb and imdb_demo.ipynb for more complete examples.

License 📄

This project is licensed under the MIT License. See the LICENSE.md file for details.
