Update README.md

zzmtsvv · Sep 15, 2023 · 816f760
1 parent 15ac23b
commit 816f760
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -17,6 +17,7 @@
 - CNF[18]: [Logs](https://wandb.ai/zzmtsvv/CNF/workspace?workspace=user-zzmtsvv) & [Implementation](https://github.com/zzmtsvv/cnf)
 - offline O3F[22]: [Logs](https://wandb.ai/zzmtsvv/offline_O3F?workspace=user-zzmtsvv) (realised for offline learning, not as stated in the paper)
 - XQL[23]: [Logs](https://wandb.ai/zzmtsvv/XQL?workspace=user-zzmtsvv)
+- TD7[24]: [Logs](https://wandb.ai/zzmtsvv/TD7?workspace=user-zzmtsvv)
 
 At the moment, offline training is implemented for these models. Only training logs are available below; there are no evaluation logs because installing MuJoCo was not permitted on the machine, so I trained the models on preloaded pickle and JSON datasets.
 
@@ -69,4 +70,5 @@ As can be seen from plots and concrete examples on videos, `spot` performs much
 [20] Li, Zhan et al. (2023). [When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning](https://arxiv.org/abs/2205.11027).<br/>
 [21] Kumar, Fu et al. (2019). [Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction](https://proceedings.neurips.cc/paper_files/paper/2019/file/c2073ffa77b5357a498057413bb09d3a-Paper.pdf).<br/>
 [22] Mark, Ghadirzadeh et al. (2023). [Fine-Tuning Offline Policies With Optimistic Action Selection](https://openreview.net/forum?id=2x8EKbGU51k). <br/>
-[23] Garg, Hejna et al. (2023). [Extreme Q-Learning: MaxEnt RL without Entropy](https://arxiv.org/abs/2301.02328)
+[23] Garg, Hejna et al. (2023). [Extreme Q-Learning: MaxEnt RL without Entropy](https://arxiv.org/abs/2301.02328) <br/>
+[24] Fujimoto, Chang et al. (2023). [For SALE: State-Action Representation Learning for Deep Reinforcement Learning](https://arxiv.org/abs/2306.02451)