🔄 Pseudo2Code – Transformer-based Pseudocode to C++ Converter

A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. It is trained on the SPoC dataset from Stanford.

🖼️ Demo

Try it live on Hugging Face Spaces:
👉 https://huggingface.co/spaces/asadsandhu/Pseudo2Code

🧠 Model Architecture

Transformer encoder-decoder implemented from scratch in PyTorch
No pre-trained models or weights are used
Token-level sequence generation using greedy decoding
Custom vocabulary construction for both pseudocode and C++ output
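
As an illustration of the custom vocabulary step, here is a minimal sketch in plain Python. The special token names, the whitespace tokenization, and the `build_vocab` / `encode` helpers are assumptions made for this example and are not taken from the repository.

```python
from collections import Counter

# Special tokens are an assumption; the repository may use different names.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def build_vocab(lines, min_freq=1):
    """Map each whitespace-separated token to an integer id."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    # Sort tokens so the id assignment is deterministic across runs.
    for tok, freq in sorted(counts.items()):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(line, vocab, max_len=128):
    """Convert a line to ids, wrapped in <sos>/<eos> and truncated to max_len."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in line.split()][: max_len - 2]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]

src_vocab = build_vocab(["read n", "set nn to n"])
encoded = encode("read m", src_vocab)  # "m" is unseen, so it maps to <unk>
```

In a setup like this, one vocabulary would be built from the pseudocode lines and a second one from the C++ lines.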


Input:   Pseudocode lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  C++ code line for each pseudocode line
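
Greedy decoding, as named above, picks the single highest-scoring token at every step. The sketch below shows the loop in plain Python; `next_token_scores` is a hypothetical stand-in for the trained decoder, and the token ids are made up for the example.

```python
def greedy_decode(next_token_scores, sos_id, eos_id, max_len=128):
    """Append the highest-scoring next token until <eos> or max_len is reached.

    next_token_scores is a hypothetical callable: given the ids generated so
    far, it returns one score per vocabulary id. In a real model this would be
    the decoder's output for the last position.
    """
    output = [sos_id]
    while len(output) < max_len:
        scores = next_token_scores(output)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == eos_id:
            break
        output.append(best)
    return output[1:]  # drop <sos>

# Toy stand-in for the decoder: emits ids 3, 4, then <eos> (id 2).
def toy_scores(prefix):
    script = [3, 4, 2]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]

decoded = greedy_decode(toy_scores, sos_id=1, eos_id=2)
```

Greedy decoding is simple and fast, but it cannot recover from an early wrong token, which is one reason generated lines can contain mistakes.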

📊 Dataset

We used the SPoC dataset from Stanford:

✅ Clean pseudocode–C++ line pairs
✅ Token-level annotations for syntax handling
✅ Multiple test splits (generalization to problems/workers)
✅ Custom preprocessing and vocabulary building implemented
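
The line pairs can be read from the TSV files with the standard library. This is a minimal sketch, assuming the files have a header row with `text` (pseudocode) and `code` (C++) columns and that lines without pseudocode have an empty `text` field; the `read_pairs` helper and the sample rows are illustrative only.

```python
import csv
import io

def read_pairs(tsv_file):
    """Yield (pseudocode, code) pairs, skipping rows with no pseudocode."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text, code = row["text"], row["code"]
        if text:
            yield text, code

# Small in-memory sample standing in for spoc/train/spoc-train.tsv.
sample = io.StringIO(
    "text\tcode\tworkerid\tprobid\tsubid\tline\tindent\n"
    "\tint main() {\t1\t3A\t100\t0\t0\n"
    "read n\tcin >> n;\t1\t3A\t100\t1\t1\n"
)
pairs = list(read_pairs(sample))
```

For the real data, pass an open file handle instead of the `io.StringIO` sample.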

📎 Licensed under CC BY 4.0

📁 Directory Structure


.
├── app.py                # Gradio web app for inference
├── train.py              # Transformer training code
├── model.pth             # Trained model weights
├── spoc/                 # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png          # App screenshot
└── README.md             # You're here

🛠️ How to Run Locally

⚙️ 1. Clone Repo & Install Requirements

git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt

Or manually install:

pip install torch gradio tqdm

🚀 2. Launch the App

Make sure model.pth is present (or train using train.py):

python app.py

The app will open in your browser.

🧪 Training the Model

You can retrain the model using the train.py script:

python train.py

By default, the script downloads the data from the public repo and trains for 10 epochs. It then writes a model.pth file containing the learned weights and the vocabularies.

🔧 Key Hyperparameters

Parameter	Value
Model Type	Transformer
Max Length	128
Embedding Dim	256
FFN Dim	512
Heads	4
Encoder Layers	2
Decoder Layers	2
Batch Size	64
Epochs	10
Optimizer	Adam
Learning Rate	1e-4
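
One way to keep these values together is a small configuration object. The sketch below mirrors the table; the `TransformerConfig` class and its field names are hypothetical and not taken from train.py. The check reflects a general requirement of multi-head attention: the embedding dimension must divide evenly across the heads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformerConfig:
    # Defaults copied from the hyperparameter table above.
    max_len: int = 128
    embed_dim: int = 256
    ffn_dim: int = 512
    num_heads: int = 4
    num_encoder_layers: int = 2
    num_decoder_layers: int = 2
    batch_size: int = 64
    epochs: int = 10
    learning_rate: float = 1e-4

    def __post_init__(self):
        # Each attention head receives an equal slice of the embedding.
        if self.embed_dim % self.num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        return self.embed_dim // self.num_heads

config = TransformerConfig()
print(config.head_dim)  # 256 / 4 = 64 dimensions per head
```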

🧩 Example Input

n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o

⏩ Output C++

int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
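
The generated lines are token-level, so multi-character operators appear split (for example `> >` and `< =`). A post-processing pass could join them again. The following is a sketch only: the `join_operators` helper and its operator list are assumptions for illustration and are not part of the repository, and the pass fixes spacing only, not logic errors in the generated code.

```python
# Operators that may appear split into single characters in generated lines.
# Caveat: "> >" can also be legitimate C++ (nested templates), so a pass like
# this is a heuristic, not a parser.
SPLIT_OPERATORS = [
    "< <", "> >", "< =", "> =", "= =", "! =",
    "+ +", "- -", "+ =", "- =", "* =", "/ =",
]

def join_operators(line):
    """Rejoin operators whose characters were emitted as separate tokens."""
    for op in SPLIT_OPERATORS:
        line = line.replace(op, op.replace(" ", ""))
    return line

fixed = join_operators("for ( int i = 2 ; i < = n - 1 ; i + + ) {")
print(fixed)
```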

📦 Deployment

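The app is deployed live on Hugging Face Spaces, and the source code is hosted on GitHub:

Hugging Face Spaces: https://huggingface.co/spaces/asadsandhu/Pseudo2Code
GitHub: github.com/asadsandhu/Pseudo2Code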

🙌 Acknowledgements

📘 SPoC Dataset by Stanford University: Kulal, S., Pasupat, P., & Liang, P. (2020). SPoC: Search-based Pseudocode to Code.
🧠 Transformer Paper: "Attention is All You Need"

🧑‍💻 Author

Asad Ali
GitHub: asadsandhu
Hugging Face: asadsandhu
LinkedIn: asadxali

📄 License

This project is licensed under the MIT License. Feel free to use, modify, and share with credit.
