This repository demonstrates the evolution of conversational AI systems before Transformers, built step by step to show why modern LLMs exist.
Instead of starting with Hugging Face models, this project reconstructs the failures and breakthroughs that led to Transformers — from deterministic rules to neural attention.
Focus: Concepts, architecture, and learning — not production polish.
- How rule-based chatbots work — and why they fail
- How Seq2Seq (LSTM Encoder–Decoder) improved things — and why it still failed
- How Attention solved the core bottleneck
- Why Transformers were inevitable
This is a foundational NLP project, not a demo chatbot.
project-chatbot/
│
├── src/
│   ├── rule_based/
│   │   ├── intents.json
│   │   ├── chatbot.py
│   │   └── serve.py
│   │
│   └── seq2seq/
│       ├── data/
│       │   └── conversations.txt
│       ├── dataset.py
│       ├── model.py
│       ├── train.py
│       └── chat.py
│
├── requirements.txt
└── README.md
Rule-Based Chatbot

- Intent-based chatbot using pattern matching
- Predefined responses
- Fallback handling
- Simple session memory
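A minimal sketch of how an intent-matching loop like the one in src/rule_based/chatbot.py can work; the intents.json schema, function names, and fallback message below are illustrative assumptions, not the repository's exact code:

```python
import json
import random
import re

# Illustrative intents schema (assumed, not the repo's exact intents.json):
# [{"tag": "...", "patterns": [...], "responses": [...]}]
INTENTS = json.loads("""
[
  {"tag": "greeting", "patterns": ["hello", "hi", "hey"],
   "responses": ["Hello!", "Hi there!"]},
  {"tag": "goodbye", "patterns": ["bye", "see you"],
   "responses": ["Goodbye!"]}
]
""")

FALLBACK = "Sorry, I didn't understand that."

def match_intent(text: str):
    """Return the first intent whose pattern appears in the user text."""
    lowered = text.lower()
    for intent in INTENTS:
        if any(re.search(rf"\b{re.escape(p)}\b", lowered) for p in intent["patterns"]):
            return intent
    return None

def reply(text: str, session: list) -> str:
    session.append(text)           # simple session memory: raw utterance history
    intent = match_intent(text)
    if intent is None:             # fallback handling
        return FALLBACK
    return random.choice(intent["responses"])

if __name__ == "__main__":
    history = []
    print(reply("hello there", history))   # -> a greeting response
    print(reply("what is 2+2?", history))  # -> fallback
```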
- How early chatbots worked in production
- Why rule systems are:
  - brittle
  - hard to scale
  - expensive to maintain
Run:

python -m src.rule_based.serve

Seq2Seq Chatbot

- LSTM Encoder–Decoder architecture
- Teacher forcing during training
- Token handling (<sos>, <eos>, <unk>)
- Step-by-step decoding
- The context bottleneck problem
- Why compressing a sentence into one vector fails
- Why early neural chatbots produced vague or repetitive responses
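The sketch below condenses these ideas into PyTorch: an LSTM encoder squeezes the whole input into one final state (the context bottleneck), and during training the decoder receives the gold previous token at every step (teacher forcing). Layer sizes, class names, and the special-token ids are assumptions for illustration, not the repository's model.py.

```python
import torch
import torch.nn as nn

PAD, SOS, EOS, UNK = 0, 1, 2, 3       # assumed special-token ids

class Encoder(nn.Module):
    def __init__(self, vocab, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb, padding_idx=PAD)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)

    def forward(self, src):
        _, (h, c) = self.lstm(self.embed(src))
        return h, c                    # entire sentence squeezed into one state

class Decoder(nn.Module):
    def __init__(self, vocab, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb, padding_idx=PAD)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tgt, state):
        out, state = self.lstm(self.embed(tgt), state)
        return self.out(out), state

def train_step(enc, dec, src, tgt, criterion):
    """Teacher forcing: the decoder sees the gold previous token at each step."""
    state = enc(src)
    logits, _ = dec(tgt[:, :-1], state)          # input:  <sos> w1 w2 ...
    return criterion(logits.reshape(-1, logits.size(-1)),
                     tgt[:, 1:].reshape(-1))     # target: w1 w2 ... <eos>

# toy usage with random token ids
vocab = 100
enc, dec = Encoder(vocab), Decoder(vocab)
src = torch.randint(4, vocab, (2, 7))
tgt = torch.cat([torch.full((2, 1), SOS),
                 torch.randint(4, vocab, (2, 5)),
                 torch.full((2, 1), EOS)], dim=1)
loss = train_step(enc, dec, src, tgt, nn.CrossEntropyLoss(ignore_index=PAD))
loss.backward()
```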
Train and chat:

python -m src.seq2seq.train
python -m src.seq2seq.chat

Seq2Seq with Attention

- Encoder returns all hidden states
- Decoder uses Bahdanau (additive) attention
- Dynamic context vectors per decoding step
- Why attention was a breakthrough
- How word-level alignment improves generation
- Why attention is the core idea behind Transformers
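A minimal sketch of Bahdanau-style additive attention, assuming the encoder now exposes all of its hidden states and the decoder recomputes a fresh context vector at every step; layer sizes and tensor shapes are illustrative, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """score(s, h_i) = v^T tanh(W s + U h_i) -> softmax over i -> context."""
    def __init__(self, hid=128):
        super().__init__()
        self.W = nn.Linear(hid, hid, bias=False)   # projects decoder state
        self.U = nn.Linear(hid, hid, bias=False)   # projects encoder states
        self.v = nn.Linear(hid, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (B, hid), enc_states: (B, T, hid)
        energy = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        weights = torch.softmax(energy.squeeze(-1), dim=1)       # (B, T) alignment
        context = torch.bmm(weights.unsqueeze(1), enc_states)    # (B, 1, hid)
        return context.squeeze(1), weights

# toy usage: one decoding step gets its own context vector
attn = AdditiveAttention(hid=128)
enc_states = torch.randn(2, 7, 128)   # all encoder hidden states, not just the last
dec_state = torch.randn(2, 128)       # current decoder hidden state
context, weights = attn(dec_state, enc_states)
print(context.shape, weights.shape)   # torch.Size([2, 128]) torch.Size([2, 7])
```

Because the context is rebuilt at every step from the alignment weights, the decoder is no longer limited to a single fixed-size summary of the source sentence.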
Output quality is intentionally limited due to small data and RNN constraints — this is a learning project, not a production system.
- Repetitive responses
- Weak generalization
- Small dataset
- No beam search or decoding tricks
These are not bugs — they are the historical reasons Transformers replaced RNN-based models.
Phase 1 — Deterministic NLP
- Intent classification
- Rule-based dialogue flow
- Failure modes of handcrafted systems
Phase 2 — Neural Dialogue
- Encoder–Decoder intuition
- Teacher forcing
- Exposure bias
- Context bottleneck
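To make exposure bias concrete: during training the decoder only ever sees gold previous tokens, but at inference it must consume its own predictions, so one early mistake can derail the rest of the reply. A toy sketch of the two loops (the ToyDecoder below is a stand-in invented for illustration, not project code):

```python
import torch
import torch.nn as nn

# Dummy one-step decoder: previous token id + hidden state -> logits over a toy vocab.
class ToyDecoder(nn.Module):
    def __init__(self, vocab=20, hid=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, hid)
        self.cell = nn.GRUCell(hid, hid)
        self.out = nn.Linear(hid, vocab)

    def step(self, prev_token, h):
        h = self.cell(self.embed(prev_token), h)
        return self.out(h), h

SOS = 1
dec = ToyDecoder()
gold = torch.tensor([[5, 7, 9, 2]])      # gold target tokens for one example

# Training loop (teacher forcing): input is always the *gold* previous token.
prev, h = torch.tensor([SOS]), torch.zeros(1, 32)
for t in range(gold.size(1)):
    logits, h = dec.step(prev, h)
    prev = gold[:, t]                    # model never sees its own mistakes

# Inference loop (free running): input is the model's *own* previous prediction.
prev, h = torch.tensor([SOS]), torch.zeros(1, 32)
for t in range(gold.size(1)):
    logits, h = dec.step(prev, h)
    prev = logits.argmax(dim=-1)         # an early error feeds later steps: exposure bias
```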
Phase 3 — Attention
- Relevance scoring (“energy”)
- Dynamic context vectors
- Decoder alignment with encoder outputs
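In equation form, at decoder step $t$ the energy score for encoder state $h_i$, the alignment weights, and the dynamic context vector are (standard Bahdanau-style notation; the weight matrices are generic names, not identifiers from this repository):

$$
e_{t,i} = v_a^\top \tanh(W_a s_{t-1} + U_a h_i), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_j \exp(e_{t,j})}, \qquad
c_t = \sum_i \alpha_{t,i} h_i
$$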
Outcome
- Clear understanding of:
  - why Seq2Seq failed
  - why attention fixed it
  - why Transformers exist
torch>=2.0.0
numpy>=1.23.0
Install:
pip install -r requirements.txt

Most people use Transformers. Very few understand why they were needed.
This project closes that gap.
Tanish Sarkar · Pre-Transformer NLP Projects