This repo contains modifications to the OpenAI Spinning Up implementation of the Soft Actor-Critic (SAC) algorithm to support both image observations and discrete action spaces.
Trained Atari agents (courtesy of https://github.com/yining043).
Requirements:

- tensorflow 1.15.0
- gym[atari] 0.15.7
- cv2 (opencv-python)
- mpi4py
- numpy
- matplotlib
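Assuming pip, the dependencies above can be installed with something like the following (note that `cv2` is provided by the `opencv-python` package):

```
pip install tensorflow==1.15.0 "gym[atari]==0.15.7" opencv-python mpi4py numpy matplotlib
```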
References:

- [1] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018, https://arxiv.org/abs/1801.01290
- [2] Soft Actor-Critic Algorithms and Applications, Haarnoja et al., 2019, https://arxiv.org/abs/1812.05905
- [3] Soft Actor-Critic for Discrete Action Settings, Petros Christodoulou, 2019, https://arxiv.org/abs/1910.07207 (author's implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch)
- Spinning Up SAC documentation: https://spinningup.openai.com/en/latest/algorithms/sac.html
Two different methods are provided for using SAC with discrete action spaces (minimal sketches of both follow the list below):
- `sac_discrete_gb` uses the Gumbel-Softmax distribution to reparameterize the discrete action space. This keeps the algorithm close to the original SAC implementation for continuous action spaces.
- `sac_discrete` avoids reparameterization and computes the entropy and KL divergence directly from the discrete action distribution output by the policy network. This follows the method described in [3], stays closest to the original SAC papers, and gives the best results in my experiments.
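The Gumbel-Softmax trick adds Gumbel noise to the policy logits and passes the result through a temperature-controlled softmax, producing a relaxed, approximately one-hot action sample that gradients can flow through. A minimal sketch of the sampling step, assuming TF 1.15 as pinned above (the function name and shapes are illustrative, not this repo's actual code):

```python
import tensorflow as tf

def gumbel_softmax_sample(logits, temperature=1.0, eps=1e-20):
    # Standard Gumbel noise: g = -log(-log(u)), with u ~ Uniform(0, 1)
    u = tf.random.uniform(tf.shape(logits), minval=0.0, maxval=1.0)
    gumbel = -tf.math.log(-tf.math.log(u + eps) + eps)
    # Softmax of (logits + noise) / temperature gives a differentiable,
    # approximately one-hot action vector
    return tf.nn.softmax((logits + gumbel) / temperature)
```

The temperature trades off how discrete the sample looks against gradient variance: lower temperatures sharpen the sample toward a true one-hot action but make gradients noisier.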
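Because the action space is discrete, the entropy and the soft value target can be computed exactly by summing over all actions, rather than estimating them from a sampled log-probability as in continuous SAC; this is the key idea in [3]. A sketch of that expectation, assuming `(batch, n_actions)`-shaped tensors; the names `pi_probs`, `q1`, `q2`, and `alpha` are illustrative:

```python
import tensorflow as tf

def discrete_soft_value(pi_probs, q1, q2, alpha):
    log_pi = tf.math.log(pi_probs + 1e-8)  # small epsilon avoids log(0)
    min_q = tf.minimum(q1, q2)             # clipped double-Q, as in SAC
    # Exact expectation over actions:
    # V(s) = sum_a pi(a|s) * (min_Q(s,a) - alpha * log pi(a|s))
    v = tf.reduce_sum(pi_probs * (min_q - alpha * log_pi), axis=-1)
    # Categorical entropy in closed form, no sampling required
    entropy = -tf.reduce_sum(pi_probs * log_pi, axis=-1)
    return v, entropy
```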
Versions of the algorithms that work with image observations, such as the Atari gym environments, are in the image observation directory.
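For image observations, Atari frames are usually reduced before being fed to a convolutional encoder. A hypothetical preprocessing step using `cv2` from the requirements list; the exact pipeline used in this repo may differ:

```python
import cv2
import numpy as np

def preprocess_frame(frame, size=84):
    # Convert the (210, 160, 3) RGB Atari frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    # Downsample to a square size x size image
    resized = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
    # Scale pixel values to [0, 1] for the network
    return resized.astype(np.float32) / 255.0
```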