🔄 Pseudo2Code – Transformer-based Pseudocode to C++ Converter


A fully custom Transformer-based Sequence-to-Sequence model built from scratch in PyTorch to convert human-written pseudocode into executable C++ code. Trained on the SPoC dataset from Stanford.


🖼️ Demo

Try it live on Hugging Face Spaces:
👉 https://huggingface.co/spaces/asadsandhu/Pseudo2Code

App Demo


🧠 Model Architecture

  • Encoder-decoder Transformer implemented from scratch in PyTorch
  • No pre-trained models or external weights (pure from-scratch implementation)
  • Token-level sequence generation with greedy decoding
  • Custom vocabulary construction for both the pseudocode input and the C++ output

Input:   Pseudocode lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  C++ code line for each pseudocode line
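For orientation, here is a minimal PyTorch sketch of such an encoder-decoder, using the hyperparameters listed further down. It leans on nn.Transformer for brevity, whereas the from-scratch implementation in train.py may define the attention and feed-forward blocks manually; the class name Seq2SeqTransformer is illustrative, not taken from the repo.

import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer mapping pseudocode tokens to C++ tokens."""
    def __init__(self, src_vocab_size, tgt_vocab_size, d_model=256, nhead=4,
                 num_layers=2, ffn_dim=512, max_len=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab_size, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)        # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=ffn_dim, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src, tgt):
        # src: (batch, src_len), tgt: (batch, tgt_len) token-id tensors
        src_x = self.src_emb(src) + self.pos_emb(torch.arange(src.size(1), device=src.device))
        tgt_x = self.tgt_emb(tgt) + self.pos_emb(torch.arange(tgt.size(1), device=tgt.device))
        # Causal mask keeps the decoder from peeking at future tokens
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(src.device)
        hidden = self.transformer(src_x, tgt_x, tgt_mask=tgt_mask)
        return self.out(hidden)                              # (batch, tgt_len, tgt_vocab)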


📊 Dataset

We used the SPoC dataset from Stanford:

  • ✅ Clean pseudocode–C++ line pairs
  • ✅ Token-level annotations for syntax handling
  • ✅ Multiple test splits (generalization to unseen problems and unseen workers)
  • ✅ Custom preprocessing and vocabulary building implemented

📎 Licensed under CC BY 4.0
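
The preprocessing and vocabulary construction are custom to this repo. The sketch below shows one plausible way to read the TSV line pairs and build the two vocabularies; the column names text and code, the special tokens, and the min_freq cutoff are assumptions and may not match train.py exactly.

import csv
from collections import Counter

def load_pairs(path="spoc/train/spoc-train.tsv"):
    # Read tokenized (pseudocode, C++) line pairs from a SPoC TSV file.
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row.get("text") and row.get("code"):
                pairs.append((row["text"].split(), row["code"].split()))
    return pairs

def build_vocab(token_lists, min_freq=1, specials=("<pad>", "<sos>", "<eos>", "<unk>")):
    # Assign an integer id to every special token and every frequent-enough token.
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, c in counts.items():
        if c >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

pairs = load_pairs()
src_vocab = build_vocab(src for src, _ in pairs)
tgt_vocab = build_vocab(tgt for _, tgt in pairs)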


📁 Directory Structure


.
├── app.py                # Gradio web app for inference
├── train.py              # Transformer training code
├── model.pth             # Trained model weights
├── spoc/                 # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png          # App screenshot
└── README.md             # You're here


🛠️ How to Run Locally

⚙️ 1. Clone Repo & Install Requirements

git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt

Or manually install:

pip install torch gradio tqdm

🚀 2. Launch the App

Make sure model.pth is present (or train using train.py):

python app.py

The app will open in your browser.
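
As a rough sketch of how app.py might wire the model into a Gradio interface, consider the snippet below. Here translate_line is a placeholder standing in for the repo's actual model loading and decoding code, and the interface layout is an assumption rather than a copy of the app.

import gradio as gr

def translate_line(line):
    # Placeholder: in app.py this would tokenize the line, run greedy decoding
    # with the trained Transformer, and detokenize the predicted C++ tokens.
    return f"// model output for: {line}"

def pseudocode_to_cpp(pseudocode):
    # The model works line-by-line, so translate each line and join the results.
    lines = [l for l in pseudocode.splitlines() if l.strip()]
    return "\n".join(translate_line(l) for l in lines)

demo = gr.Interface(
    fn=pseudocode_to_cpp,
    inputs=gr.Textbox(lines=10, label="Pseudocode"),
    outputs=gr.Textbox(lines=10, label="Generated C++"),
    title="Pseudo2Code",
)
demo.launch()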


🧪 Training the Model

You can retrain the model using the train.py script:

python train.py

By default, it downloads the dataset from the public repo, trains for 10 epochs, and writes a model.pth file containing the learned weights and vocabularies.
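
A condensed view of what a training loop with these settings looks like (Adam, learning rate 1e-4, cross-entropy with teacher forcing, 10 epochs) is sketched below. It assumes the model, vocabularies, and a padded-batch train_loader from the earlier sketches, and is illustrative rather than a copy of train.py.

import torch
import torch.nn as nn

# Assumes Seq2SeqTransformer, src_vocab, tgt_vocab and a DataLoader `train_loader`
# yielding padded (src, tgt) id tensors, as in the sketches above; <pad> id 0 assumed.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Seq2SeqTransformer(len(src_vocab), len(tgt_vocab)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss(ignore_index=0)           # skip loss on padding

for epoch in range(10):
    model.train()
    total = 0.0
    for src, tgt in train_loader:
        src, tgt = src.to(device), tgt.to(device)
        logits = model(src, tgt[:, :-1])                   # teacher forcing: predict tgt[1:]
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    print(f"epoch {epoch + 1}: avg loss {total / len(train_loader):.4f}")

torch.save({"model": model.state_dict(),
            "src_vocab": src_vocab, "tgt_vocab": tgt_vocab}, "model.pth")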


🔧 Key Hyperparameters

Parameter        Value
---------------  -----------
Model Type       Transformer
Max Length       128
Embedding Dim    256
FFN Dim          512
Heads            4
Encoder Layers   2
Decoder Layers   2
Batch Size       64
Epochs           10
Optimizer        Adam
Learning Rate    1e-4
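
Greedy decoding (noted under Model Architecture) caps generation at the 128-token max length above. A sketch under the same assumptions as the earlier model and vocabulary snippets, not the exact routine used in app.py:

import torch

def greedy_decode(model, src_ids, tgt_vocab, max_len=128):
    # Generate one C++ line token-by-token, always picking the most likely token.
    model.eval()
    sos, eos = tgt_vocab["<sos>"], tgt_vocab["<eos>"]
    src = torch.tensor([src_ids])
    out = torch.tensor([[sos]])
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits = model(src, out)                       # (1, cur_len, vocab)
            next_id = logits[0, -1].argmax().item()        # greedy choice
            out = torch.cat([out, torch.tensor([[next_id]])], dim=1)
            if next_id == eos:
                break
    id_to_tok = {i: t for t, i in tgt_vocab.items()}
    return " ".join(id_to_tok[i] for i in out[0].tolist()[1:] if i != eos)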

🧩 Example Input

n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o

⏩ Output C++

int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}

📦 Deployment

This app is deployed live on Hugging Face Spaces:
👉 https://huggingface.co/spaces/asadsandhu/Pseudo2Code


🙌 Acknowledgements

  • SPoC dataset and token-level annotations from Stanford (CC BY 4.0)

🧑‍💻 Author

Asad Ali
  • GitHub: asadsandhu
  • Hugging Face: asadsandhu
  • LinkedIn: asadxali


📄 License

This project is licensed under the MIT License. Feel free to use, modify, and share with credit.
