Commit

Update README.md
zzmtsvv authored Sep 14, 2023
1 parent 5eebcc6 commit ab70e21
Showing 1 changed file with 10 additions and 10 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -2,16 +2,16 @@
`April 2023`: This repository contains experiments of different reinforcement learning algorithms applied to 3 MuJoCo environments - `Walker2d, Hopper and Halfcheetah`. Essentially, there are 2 models in comparison: Adaptive Behavior Cloning Regularization [1] (in short, `redq_bc`) and Supported Policy Optimization for Offline Reinforcement Learning [2] (in short, `spot`).<br /><br />
`July-September 2023 update`: There are also additional implementations of:

-- Cal-QL [9] in `cal_ql`: [Logs](https://wandb.ai/zzmtsvv/cal_ql?workspace=user-zzmtsvv)
-- ReBRAC[11] in `rebrac`: [Logs](https://wandb.ai/zzmtsvv/ReBRAC?workspace=user-zzmtsvv)
-- EDAC[12] in `edac`: Logs: [EDAC itself](https://wandb.ai/zzmtsvv/EDAC?workspace=user-zzmtsvv), [SAC-N[12]](https://wandb.ai/zzmtsvv/SAC-N?workspace=user-zzmtsvv) (with `eta = 0`), [LB-SAC[16]](https://wandb.ai/zzmtsvv/LB-SAC?workspace=user-zzmtsvv) (with `eta = 0` and `batch_size = 10_000`)
-- AWAC[13] in `awac`: [Logs](https://wandb.ai/zzmtsvv/AWAC?workspace=user-zzmtsvv)
-- Decision Transformer[14] in `decision_transformer`: [Logs](https://wandb.ai/zzmtsvv/DecisionTransformer?workspace=user-zzmtsvv)
-- IQL[15] in `iql`: [Logs](https://wandb.ai/zzmtsvv/IQL?workspace=user-zzmtsvv)
-- MSG[17] in `msg`: [Logs](https://wandb.ai/zzmtsvv/MSG?workspace=user-zzmtsvv) (This method is realised upon offline SAC-N algorithm. However, my realization lacks appropriate hyperparameters for best results.)
-- PRDC[19] in `prdc`: [Logs](https://wandb.ai/zzmtsvv/PRDC?workspace=user-zzmtsvv)
-- DOGE[20] in `doge`: [Logs](https://wandb.ai/zzmtsvv/DOGE?workspace=user-zzmtsvv)
-- BEAR[21] in `bear`: [Logs](https://wandb.ai/zzmtsvv/BEAR?workspace=user-zzmtsvv)
+- Cal-QL [9]: [Logs](https://wandb.ai/zzmtsvv/cal_ql?workspace=user-zzmtsvv)
+- ReBRAC[11]: [Logs](https://wandb.ai/zzmtsvv/ReBRAC?workspace=user-zzmtsvv)
+- EDAC[12]: Logs: [EDAC itself](https://wandb.ai/zzmtsvv/EDAC?workspace=user-zzmtsvv), [SAC-N[12]](https://wandb.ai/zzmtsvv/SAC-N?workspace=user-zzmtsvv) (with `eta = 0`), [LB-SAC[16]](https://wandb.ai/zzmtsvv/LB-SAC?workspace=user-zzmtsvv) (with `eta = 0` and `batch_size = 10_000`)
+- AWAC[13]: [Logs](https://wandb.ai/zzmtsvv/AWAC?workspace=user-zzmtsvv)
+- Decision Transformer[14]: [Logs](https://wandb.ai/zzmtsvv/DecisionTransformer?workspace=user-zzmtsvv)
+- IQL[15]: [Logs](https://wandb.ai/zzmtsvv/IQL?workspace=user-zzmtsvv)
+- MSG[17]: [Logs](https://wandb.ai/zzmtsvv/MSG?workspace=user-zzmtsvv) (This method is realised upon offline SAC-N algorithm. However, my realization lacks appropriate hyperparameters for best results.)
+- PRDC[19]: [Logs](https://wandb.ai/zzmtsvv/PRDC?workspace=user-zzmtsvv)
+- DOGE[20]: [Logs](https://wandb.ai/zzmtsvv/DOGE?workspace=user-zzmtsvv)
+- BEAR[21]: [Logs](https://wandb.ai/zzmtsvv/BEAR?workspace=user-zzmtsvv)
- SAC-RND[10]: [Logs](https://wandb.ai/zzmtsvv/sac_rnd?workspace=user-zzmtsvv) & [Implementation](https://github.com/zzmtsvv/sac_rnd)
- RORL: [Logs](https://wandb.ai/zzmtsvv/RORL?workspace=user-zzmtsvv) & [Implementation](https://github.com/zzmtsvv/rorl) (lacks appropriate hyperparameters)
- CNF[18]: [Logs](https://wandb.ai/zzmtsvv/CNF/workspace?workspace=user-zzmtsvv) & [Implementation](https://github.com/zzmtsvv/cnf)