lucidrains/ppo

1k steps

PPO

An implementation of PPO with recent random improvements

The phasic part has been removed, and the repository is to be renamed. I do not think the phasic part does anything.
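
For orientation, here is a minimal sketch of the clipped surrogate objective from the PPO paper listed under Citations, written for a single sample in plain Python. It is not taken from this repository's code: the function name `ppo_clip_loss`, its argument names, and the default `eps = 0.2` are assumptions made for illustration only.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantage, eps=0.2):
    """Clipped surrogate loss for one (state, action) sample.

    Hypothetical helper for illustration; names and the eps default are assumptions.
    """
    # probability ratio between the new and old policy, computed from log-probabilities
    ratio = math.exp(new_logp - old_logp)
    # restrict the ratio to the interval [1 - eps, 1 + eps]
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    # take the more pessimistic of the two objectives, negated so it can be minimized
    return -min(ratio * advantage, clipped * advantage)
```

With a positive advantage, the loss stops improving once the ratio exceeds `1 + eps`, which is what keeps each policy update close to the policy that collected the data.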

Install

$ pip install -r requirements.txt

You may need to install swig

$ apt install swig

Use

$ python train.py

Citations

@article{Schulman2017ProximalPO,
    title   = {Proximal Policy Optimization Algorithms},
    author  = {John Schulman and Filip Wolski and Prafulla Dhariwal and Alec Radford and Oleg Klimov},
    journal = {ArXiv},
    year    = {2017},
    volume  = {abs/1707.06347},
    url     = {https://api.semanticscholar.org/CorpusID:28695052}
}
@article{Zhang2024ReLU2WD,
    title   = {ReLU2 Wins: Discovering Efficient Activation Functions for Sparse LLMs},
    author  = {Zhengyan Zhang and Yixin Song and Guanghui Yu and Xu Han and Yankai Lin and Chaojun Xiao and Chenyang Song and Zhiyuan Liu and Zeyu Mi and Maosong Sun},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2402.03804},
    url     = {https://api.semanticscholar.org/CorpusID:267499856}
}
@inproceedings{Lee2024SimBaSB,
    title  = {SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning},
    author = {Hojoon Lee and Dongyoon Hwang and Donghu Kim and Hyunseung Kim and Jun Jet Tai and Kaushik Subramanian and Peter R. Wurman and Jaegul Choo and Peter Stone and Takuma Seno},
    year   = {2024},
    url    = {https://api.semanticscholar.org/CorpusID:273346233}
}
@inproceedings{anonymous2024the,
    title   = {The Complexity Dynamics of Grokking},
    author  = {Anonymous},
    booktitle = {Submitted to The Thirteenth International Conference on Learning Representations},
    year    = {2024},
    url     = {https://openreview.net/forum?id=07N9jCfIE4},
    note    = {under review}
}
@article{Yang2020LearningLD,
    title   = {Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification},
    author  = {Huanrui Yang and Minxue Tang and Wei Wen and Feng Yan and Daniel Hu and Ang Li and Hai Helen Li and Yiran Chen},
    journal = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
    year    = {2020},
    pages   = {2899-2908},
    url     = {https://api.semanticscholar.org/CorpusID:213940794}
}
@article{Farebrother2024StopRT,
    title   = {Stop Regressing: Training Value Functions via Classification for Scalable Deep RL},
    author  = {Jesse Farebrother and Jordi Orbay and Quan Ho Vuong and Adrien Ali Taiga and Yevgen Chebotar and Ted Xiao and Alex Irpan and Sergey Levine and Pablo Samuel Castro and Aleksandra Faust and Aviral Kumar and Rishabh Agarwal},
    journal = {ArXiv},
    year   = {2024},
    volume = {abs/2403.03950},
    url    = {https://api.semanticscholar.org/CorpusID:268253088}
}
@article{Lee2024AnalysisClippedCritic,
    title   = {On Analysis of Clipped Critic Loss in Proximal Policy Gradient},
    author  = {Yongjin Lee and Moonyoung Chung},
    journal = {Authorea},
    year    = {2024}
}
@inproceedings{Felizardo2025ARL,
    title   = {A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks},
    author  = {Leonardo Kanashiro Felizardo and Edoardo Fadda and Paolo Brandimarte and Emilio Del-Moral-Hernandez and Mar{\'i}a Cristina Vasconcelos Nascimento},
    year    = {2025},
    url     = {https://api.semanticscholar.org/CorpusID:277621941}
}

About

An implementation of PPO in Pytorch
