A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. It is trained on the SPoC dataset from Stanford.
Try it live on Hugging Face Spaces:
👉 https://huggingface.co/spaces/asadsandhu/Pseudo2Code
- Developed using the Transformer architecture from scratch in PyTorch
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both pseudocode and C++ output
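Greedy decoding picks the single most likely next token at every step and stops at an end-of-sequence token or a length limit. Below is a minimal, framework-agnostic sketch; `greedy_decode`, the `next_token_scores` callback, and the token ids are hypothetical names for illustration, not the project's actual code. In the real model, the callback would run the decoder on the encoded pseudocode plus the prefix generated so far and return the logits for the last position.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    sos_id: int,
    eos_id: int,
    max_len: int = 128,
) -> List[int]:
    """Generate token ids by always taking the highest-scoring next token.

    next_token_scores(prefix) returns one score per vocabulary entry for the
    token that follows `prefix` (hypothetical interface).
    """
    prefix = [sos_id]
    for _ in range(max_len):
        scores = next_token_scores(prefix)
        # argmax over the vocabulary: the "greedy" choice
        best = max(range(len(scores)), key=lambda tok: scores[tok])
        if best == eos_id:
            break
        prefix.append(best)
    return prefix[1:]  # drop the start token


# Toy scorer over a 6-token vocabulary: emits 3, 4, 5, then EOS (id 2).
def toy_scores(prefix: List[int]) -> List[float]:
    script = [3, 4, 5, 2]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if tok == target else 0.0 for tok in range(6)]


print(greedy_decode(toy_scores, sos_id=1, eos_id=2))  # [3, 4, 5]
```

Greedy decoding is cheap and deterministic, but it never revisits an early choice; beam search is the usual alternative when that matters.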
- Input: pseudocode, one line at a time
- Model: Transformer (encoder-decoder)
- Output: one line of C++ for each line of pseudocode
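Because the model maps each pseudocode line to one C++ line, translating a whole program amounts to running the model once per line and joining the results in order. The sketch below assumes a hypothetical `translate_line` callable standing in for the trained model; skipping blank lines is an assumption, not documented behavior of the app.

```python
from typing import Callable


def translate_program(pseudocode: str, translate_line: Callable[[str], str]) -> str:
    """Translate a multi-line pseudocode program one line at a time."""
    cpp_lines = []
    for line in pseudocode.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped, not translated
        cpp_lines.append(translate_line(line))
    return "\n".join(cpp_lines)


# Stand-in for the trained model: a fixed lookup table (hypothetical).
def fake_model(line: str) -> str:
    return {"read n": "cin >> n ;", "print n": "cout << n << endl ;"}[line]


print(translate_program("read n\n\nprint n", fake_model))
```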
We used the SPoC dataset from Stanford:
- ✅ Clean pseudocode–C++ line pairs
- ✅ Token-level annotations for syntax handling
- ✅ Multiple test splits (generalization to problems/workers)
- ✅ Custom preprocessing and vocabulary building implemented
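A custom vocabulary is typically built by counting tokens in the training data and giving each one an integer id, with reserved ids for padding, sequence start/end, and unknown tokens. The sketch below shows one way to do this; the special-token names, whitespace tokenization, and `min_freq` cutoff are assumptions, not necessarily what `train.py` does. One vocabulary would be built for the pseudocode side and a separate one for the C++ side.

```python
from collections import Counter
from typing import Dict, Iterable, List

SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]  # assumed special tokens


def build_vocab(lines: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each whitespace-separated token to an integer id."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    # Sort by descending frequency, then alphabetically, so ids are stable.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(line: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a line to ids, wrapped in <sos>/<eos>; unseen tokens become <unk>."""
    unk = vocab["<unk>"]
    body = [vocab.get(tok, unk) for tok in line.split()]
    return [vocab["<sos>"]] + body + [vocab["<eos>"]]


vocab = build_vocab(["cin >> n ;", "cout << n ;"])
print(encode("cin >> x ;", vocab))  # [1, 8, 7, 3, 4, 2]
```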
📎 The SPoC dataset is licensed under CC BY 4.0.
```
.
├── app.py                  # Gradio web app for inference
├── train.py                # Transformer training code
├── model.pth               # Trained model weights
├── spoc/                   # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png            # App screenshot
└── README.md               # You're here
```
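The SPoC TSV files store one program line per row. The sketch below reads pseudocode/C++ pairs with Python's `csv` module. It assumes the `text` (pseudocode) and `code` (C++) column names used in the SPoC release, and it skips rows whose pseudocode cell is empty (an assumption about how lines without pseudocode are represented); check the header of your copy of `spoc-train.tsv` before relying on this. `load_pairs` is a hypothetical helper, not a function from `train.py`.

```python
import csv
import io
from typing import IO, List, Tuple


def load_pairs(tsv_file: IO[str]) -> List[Tuple[str, str]]:
    """Read (pseudocode, C++) pairs from a SPoC-style TSV file object."""
    # QUOTE_NONE: C++ lines may contain '"' characters that are not CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        code = (row.get("code") or "").strip()
        if text and code:  # skip lines with no pseudocode (assumption)
            pairs.append((text, code))
    return pairs


sample = "text\tcode\nread n\tcin >> n ;\n\t}\n"
print(load_pairs(io.StringIO(sample)))  # [('read n', 'cin >> n ;')]
```

On the real data, open the file with `open("spoc/train/spoc-train.tsv", newline="", encoding="utf-8")` and pass the handle to `load_pairs`.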
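```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```

Or install the dependencies manually:

```bash
pip install torch gradio tqdm
```

Make sure `model.pth` is present (or train one with `train.py`), then launch the app:

```bash
python app.py
```

Gradio serves the app locally; if your browser does not open automatically, open the URL it prints.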
You can retrain the model using the `train.py` script:

```bash
python train.py
```

By default, it downloads the data from the public repo and trains for 10 epochs, then writes a `model.pth` file containing the learned weights and vocabularies.
| Parameter | Value |
|---|---|
| Model Type | Transformer |
| Max Sequence Length (tokens) | 128 |
| Embedding Dim | 256 |
| FFN Dim | 512 |
| Heads | 4 |
| Encoder Layers | 2 |
| Decoder Layers | 2 |
| Batch Size | 64 |
| Epochs | 10 |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
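The hyperparameters in the table can be gathered into one config object. The sketch below is a hypothetical `Pseudo2CodeConfig` dataclass (field names are assumptions, not taken from `train.py`). It checks that the embedding dimension splits evenly across the attention heads (256 / 4 = 64 dimensions per head) and shows how a token-id sequence could be truncated or padded to the 128-token maximum, assuming padding id 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pseudo2CodeConfig:
    # Values copied from the hyperparameter table; field names are hypothetical.
    max_len: int = 128
    d_model: int = 256
    ffn_dim: int = 512
    n_heads: int = 4
    enc_layers: int = 2
    dec_layers: int = 2
    batch_size: int = 64
    epochs: int = 10
    lr: float = 1e-4

    def __post_init__(self) -> None:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads


def pad_or_truncate(ids: List[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Cut a sequence to max_len tokens, or right-pad it with pad_id."""
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


cfg = Pseudo2CodeConfig()
print(cfg.head_dim)                                  # 64
print(len(pad_or_truncate([1, 5, 2], cfg.max_len)))  # 128
```

Plain truncation can cut off the end-of-sequence token on very long lines; an implementation may prefer to reserve the last position for it.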
Example pseudocode input:

```
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```
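As a reference for comparing the C++ output that follows, here is a hand-written Python reading of the same pseudocode (an interpretation, not model output): for every base `i` from 2 to `n - 1`, add up the digits of `n` written in base `i`, then print the total over `n - 2` as a reduced fraction. `digit_sum_fraction` is a hypothetical name, and the code assumes `n >= 3` so that `n - 2` is positive.

```python
from math import gcd


def digit_sum_fraction(n: int) -> str:
    """Sum of the digits of n over bases 2..n-1, divided by n-2, reduced (n >= 3)."""
    ans = 0
    for i in range(2, n):      # for i = 2 to n-1
        nn = n                 # set nn to n
        while nn != 0:         # while nn is not equal to 0
            ans += nn % i      # add the lowest base-i digit
            nn //= i           # drop that digit
    o = gcd(ans, n - 2)        # set o to gcd(ans, n-2)
    return f"{ans // o}/{(n - 2) // o}"


print(digit_sum_fraction(5))  # 7/3
```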
Generated C++ output:

```cpp
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
```

This app is deployed live on:
- Hugging Face Spaces: Pseudo2Code
- GitHub: github.com/asadsandhu/Pseudo2Code
- 📘 SPoC Dataset by Stanford University: Kulal, S., Pasupat, P., Chandra, K., Lee, M., Padon, O., Aiken, A., & Liang, P. (2019). SPoC: Search-based Pseudocode to Code. NeurIPS 2019.
- 🧠 Transformer Paper: Vaswani, A., et al. (2017). "Attention Is All You Need".
Asad Ali
- GitHub: asadsandhu
- Hugging Face: asadsandhu
- LinkedIn: asadxali
This project is licensed under the MIT License. Feel free to use, modify, and share with credit.
