This repository contains scripts for training agents with the Proximal Policy Optimization (PPO) algorithm on MuJoCo and Atari environments. The implementation follows the original paper, Proximal Policy Optimization Algorithms by Schulman et al. (2017), with one improvement: the Generalised Advantage Estimator (GAE) is recomputed at every epoch.
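For reference, the quantity recomputed at every epoch is the GAE advantage from Schulman et al. (2015). The snippet below is a minimal, framework-agnostic sketch of that computation; the function name, tensor layout, and the `gamma`/`lmbda` defaults are illustrative assumptions and not the code used in these examples.

```python
import torch

def compute_gae(rewards, values, next_value, dones, gamma=0.99, lmbda=0.95):
    """Compute GAE advantages for a single rollout (hypothetical helper, for illustration)."""
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros(())
    for t in reversed(range(rewards.shape[0])):
        # Value of the next state: bootstrap with `next_value` at the end of the rollout.
        next_v = next_value if t == rewards.shape[0] - 1 else values[t + 1]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * next_v * (1.0 - dones[t]) - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        gae = delta + gamma * lmbda * (1.0 - dones[t]) * gae
        advantages[t] = gae
    returns = advantages + values  # value targets for the critic
    return advantages, returns

# Example usage on a toy 3-step rollout:
rewards = torch.tensor([1.0, 0.0, 1.0])
values = torch.tensor([0.5, 0.4, 0.6])
dones = torch.tensor([0.0, 0.0, 1.0])
adv, ret = compute_gae(rewards, values, next_value=torch.tensor(0.0), dones=dones)
```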
Please note that, for the sake of simplicity, each example is independent of the others. Each example contains the following files:
- Main Script: Defines the algorithm components and the training loop (e.g. ppo_atari.py); a minimal sketch of the core PPO loss appears after this list.
- Utils File: Contains various helper functions, mainly for creating the environment and the models (e.g. utils_atari.py).
- Configuration File: Specifies the default hyperparameters from the original paper; users can modify them to customize their experiments (e.g. config_atari.yaml).
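The central algorithm component defined in each main script is the clipped surrogate objective from Schulman et al. (2017). The sketch below illustrates that objective only; the function name, argument layout, and default `clip_epsilon` are assumptions for illustration, and the actual scripts assemble the loss from their own components.

```python
import torch

def clipped_surrogate_loss(log_prob, old_log_prob, advantage, clip_epsilon=0.2):
    """PPO clipped surrogate objective, returned as a loss to minimise (hypothetical helper)."""
    # Probability ratio r_t(theta) = pi_theta(a_t | s_t) / pi_theta_old(a_t | s_t)
    ratio = torch.exp(log_prob - old_log_prob)
    unclipped = ratio * advantage
    # Clip the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = torch.clamp(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantage
    # Negate the pessimistic (elementwise minimum) objective so it can be minimised.
    return -torch.min(unclipped, clipped).mean()
```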
You can execute the PPO algorithm on Atari environments by running the following command:

```bash
python ppo_atari.py
```
You can execute the PPO algorithm on MuJoCo environments by running the following command:

```bash
python ppo_mujoco.py
```