Reinforcement Learning examples created using PyChrono, the Python extension of the open-source Project Chrono physics engine, together with TensorFlow.
I implemented 2 environments for robotic control (both observations and actions are continuous).
ChronoPendulum: the classical inverted pendulum, which has to balance a pole on a cart. 1 action (force along the z axis) and 4 observations.
ChronoAnt: a 4-legged walker that has to walk straight. 8 actions (joint torques) and 30 observations.
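As a concrete illustration of these dimensions, a rollout loop could look like the sketch below; the module path and the reset()/step() interface are assumed for illustration and may differ from the actual classes in this repo.

```python
# Hypothetical usage sketch: module path and reset()/step() signature are assumed,
# only the observation/action sizes come from the descriptions above.
import numpy as np
from chrono_env import ChronoPendulum   # hypothetical import path

env = ChronoPendulum()
obs = env.reset()                                    # expected shape: (4,)
for _ in range(200):
    action = np.random.uniform(-1.0, 1.0, size=1)    # 1 continuous action (force along z)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```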
To train the Neural Networks that solve these tasks I used a reinforcement learning algorithm known as Proximal Policy Optimization (PPO). PPO is an on-policy actor-critic algorithm, so you will find 2 NNs: the first, given the state, prescribes an action (Policy); the second, given the state, estimates the value function (VF).
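As a rough sketch of what these 2 NNs look like (not the exact architecture or TensorFlow version used in this repo), the Policy maps the state to an action, while the VF maps the state to a scalar value estimate:

```python
# Minimal actor-critic sketch in tf.keras; layer sizes and activations are illustrative.
import tensorflow as tf

def build_policy_and_vf(obs_dim, act_dim, hidden=64):
    # Policy: given the state, output the mean of a Gaussian over actions
    obs_in = tf.keras.Input(shape=(obs_dim,))
    h = tf.keras.layers.Dense(hidden, activation="tanh")(obs_in)
    h = tf.keras.layers.Dense(hidden, activation="tanh")(h)
    action_mean = tf.keras.layers.Dense(act_dim)(h)
    policy = tf.keras.Model(obs_in, action_mean)

    # Value function: given the state, output a scalar estimate of V(s)
    v_in = tf.keras.Input(shape=(obs_dim,))
    h = tf.keras.layers.Dense(hidden, activation="tanh")(v_in)
    h = tf.keras.layers.Dense(hidden, activation="tanh")(h)
    value = tf.keras.layers.Dense(1)(h)
    vf = tf.keras.Model(v_in, value)
    return policy, vf

# e.g. ChronoPendulum: 4 observations, 1 action
policy, vf = build_policy_and_vf(obs_dim=4, act_dim=1)
```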
- PyChrono
- TensorFlow
- NumPy
- SciPy
Of course, TensorFlow is not needed if the environments are used to test learning models created with other libraries.
If you have Anaconda, you can simply install PyChrono in any of your Python 3 environments from my personal Anaconda repository by typing:
For Linux:
conda install -c simonebenatti pychrono
For Windows:
conda install -c simonebenatti/label/develop pychrono
You will need:
- Any C++ compiler
- Irrlicht libraries
- Git (or a Git client)
- CMake
- SWIG (for the Python wrapper)
Install Chrono following these instructions. Don't forget to:
- Allow recursive cloning of submodules
- Enable the Python Module from CMake options
- Follow the additional instructions to build the Python module
IMPORTANT: before switching environments, delete the old saved model files (not the folders), otherwise training will fail while trying to restore a model with different sizes.
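For instance, a quick cleanup could be done with a small script like this (a sketch only; it assumes the saved files live in the Policy and VF directories and in scaler.dat, as described below):

```python
# Cleanup sketch: remove saved model files but keep the Policy and VF folders.
import glob
import os

for d in ("Policy", "VF"):
    for f in glob.glob(os.path.join(d, "*")):
        if os.path.isfile(f):
            os.remove(f)
if os.path.isfile("scaler.dat"):
    os.remove("scaler.dat")
```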
Make sure that you are using the right Python interpreter, then simply run the script with the required command-line arguments.

Examples:

For the inverted pendulum:
python ./train_serial.py ChronoPendulum -n 1000
For the 4-legged ant:
python ./train_serial.py ChronoAnt -n 20000
- Environment name: env_name
- Number of episodes: -n, --num_episodes, default=1000
- Rendering: --renderON / --renderOFF
- Discount factor: -g, --gamma, default=0.995
- Lambda for GAE: -l, --lam, default=0.98
- Kullback-Leibler divergence target value: -k, --kl_targ, default=0.003
- Batch size: -b, --batch_size, default=20
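For example, a training run combining several of these options might look like the following (the values are illustrative only):

python ./train_serial.py ChronoAnt -n 20000 -b 20 -g 0.995 -l 0.98 -k 0.003 --renderOFF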
NN parameters and the other TF variables are stored inside the Policy and VF directories, while the scaler means and variances are stored in scaler.dat, a saved NumPy array. These files and folders can be kept and used to restore a previous checkpoint.
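As an illustration only (the exact layout of scaler.dat is defined by this repo's scaler code), means and variances can be saved and restored with NumPy like this:

```python
# Illustrative sketch: saving/restoring observation means and variances with NumPy.
# The real scaler.dat layout may differ; the file name here is hypothetical.
import numpy as np

obs_dim = 30                                   # e.g. ChronoAnt: 30 observations
means, variances = np.zeros(obs_dim), np.ones(obs_dim)

np.save("scaler_example.npy", np.stack([means, variances]))   # save both arrays
means, variances = np.load("scaler_example.npy")              # restore and unpack
```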
Parallel training collects data from 6 simulations simultaneously, speeding up the process. So far this is available only for the Ant environment and has been tested only with TensorFlow-GPU.
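The idea behind parallel data collection can be sketched as follows (this is not the repo's actual implementation, just an illustration of running several rollouts in separate processes):

```python
# Parallel rollout sketch: each worker runs one episode and returns a trajectory.
# run_episode is a hypothetical helper standing in for a real PyChrono rollout.
import numpy as np
from multiprocessing import Pool

def run_episode(seed):
    rng = np.random.default_rng(seed)
    observations = rng.normal(size=(100, 30))     # 100 steps x 30 observations (Ant)
    actions = rng.uniform(-1, 1, size=(100, 8))   # 8 joint torques per step
    rewards = rng.normal(size=100)
    return observations, actions, rewards

if __name__ == "__main__":
    with Pool(processes=6) as pool:               # 6 simultaneous simulations
        trajectories = pool.map(run_episode, range(6))
```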
- Add clipped objective function
- Add new environments