This is the codebase for the paper "Guiding Pretraining in Reinforcement Learning with Large Language Models" (ICML 2023).
For Crafter, run
conda env create -f conda_env.yml
conda activate text-crafter
For Housekeep, follow this set of adapted installation instructions:
conda create -n ellm-housekeep python=3.8
conda activate ellm-housekeep
conda install habitat-sim==0.2.1 withbullet headless -c conda-forge -c aihabitat
cd text_housekeep/habitat_lab
pip install -r requirements.txt
python develop --all
Then run
conda env update -f conda_env_housekeep.yml
to install additional ELLM dependencies. If you get an error about llvm-lite, use
pip install llvmlite==0.38.0 --ignore-installed
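To confirm which llvmlite version ended up in the active environment after the workaround, a small check along these lines may help. It is only a sketch: the helper name `installed_version` is made up for illustration, and the only value taken from this README is the pinned version 0.38.0.

```python
from importlib import metadata

EXPECTED_LLVMLITE = "0.38.0"  # version pinned by the pip command in this README


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("llvmlite")
if found is None:
    print("llvmlite is not installed in this environment")
elif found != EXPECTED_LLVMLITE:
    print(f"llvmlite {found} found; this README pins {EXPECTED_LLVMLITE}")
else:
    print(f"llvmlite {found} matches the pinned version")
```

Run it with the ellm-housekeep environment activated so that it inspects the right site-packages.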
Then download the data following the instructions here. Everything should be downloaded under text_housekeep/data.
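A quick way to see what actually landed under text_housekeep/data is to list its top-level entries. This is a minimal sketch; the function name `summarize_data_dir` is hypothetical, and it makes no assumption about which files or folders the download is supposed to contain.

```python
from pathlib import Path


def summarize_data_dir(root="text_housekeep/data"):
    """Return sorted names of the top-level entries under `root` (empty list if missing)."""
    path = Path(root)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


entries = summarize_data_dir()
print(entries if entries else "text_housekeep/data is missing or empty")
```

Run it from the repository root, since the default path is relative.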
If you get the following error:
File "/YOUR_MINICONDA_DIR/envs/ellm-housekeep/lib/python3.8/site-packages/habitat_sim/agent/controls/", line 47, in _rotate_local
raise RuntimeError(
RuntimeError: Constrained look only works for a singular look action type
Go to /YOUR_MINICONDA_DIR/envs/ellm-housekeep/lib/python3.8/site-packages/habitat_sim/agent/controls/
and change line 47 if mn.math.angle(ref_vector, mn.Vector3(np.abs(rotation.axis().normalized()))) > mn.Rad(1e-3):
to if mn.math.angle(ref_vector, mn.Vector3(np.abs(rotation.axis().normalized()))) > mn.Rad(1e-3):
If you haven't already, sign up for an OpenAI account.
Add this to your .bashrc:
To train the agent:
All default configurations are found in config.yaml. For environment-specific hyperparameter changes, see Appendix G in the paper. Some important ones are:
To use text observations (in addition to pixel observations, which is the default):
- set to True to use SBERT encodings for text observations.
- set sbert to train with text observations as well as pixel observations (pixels only is the default).
To use goal conditioning:
- set to True to condition the agent policy on a language string of the goal.
- set sbert to encode goals using an SBERT model.
To run the oracle:
To run with GPT:
--env_spec.lm_spec.lm_class=GPTLanguageModel --env_spec.threshold=.99 env_spec.lm_spec.temperature=0.1 env_spec.lm_spec.openai_key=YOUR_API_KEY
To run with intrinsic objective baselines:
for APT
for RND
Note: to pass additional arguments to the baseline agents, add them as +agent.use_language=True or +agent.noveld=True
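The options in this README are written as dotted key=value arguments layered over the defaults in config.yaml. The sketch below illustrates how such arguments could be merged into a nested configuration. It is a plain-Python illustration, not the codebase's own loader: the function names and the default values are invented, and the meaning given to a leading + (add a key that is absent from the defaults) is an assumption modeled on Hydra-style overrides.

```python
import ast
from copy import deepcopy


def parse_value(raw):
    """Turn text such as 'True', '.99' or '2' into a Python value; keep other text as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return raw


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` with dotted key=value overrides applied."""
    config = deepcopy(defaults)
    for item in overrides:
        key, _, raw = item.partition("=")
        add_new = key.startswith("+")  # assumed meaning: the key may be absent from the defaults
        parts = key.lstrip("+").split(".")
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        if parts[-1] not in node and not add_new:
            raise KeyError(f"unknown option: {key}")
        node[parts[-1]] = parse_value(raw)
    return config


# Placeholder defaults for illustration only; the real values live in config.yaml.
defaults = {
    "use_goal": False,
    "use_language_state": "none",
    "env_spec": {"use_sbert": False, "threshold": 0.5},
    "agent": {},
}
config = apply_overrides(
    defaults,
    ["env_spec.use_sbert=True", "use_goal=True", "use_language_state=sbert",
     "env_spec.threshold=.99", "+agent.noveld=True"],
)
print(config)
```

Rejecting unknown keys unless they carry a + prefix is a design choice in this sketch that catches misspelled option names early; the actual behavior of the training script may differ.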
Monitor results:
You can set --use_tb=True to log to TensorBoard. Then run TensorBoard:
tensorboard --logdir exp_local
Alternatively, set --use_wandb=True to log to wandb.
Housekeep Episodes:
You can set --env_spec.housekeep_ep_num to choose between episodes with different object and receptacle configurations. The paper reports results for episodes 2, 8, 12, and 14.
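To cover all four reported episodes, one option is to generate the argument list for each run programmatically. The sketch below only builds the arguments and prints them. The helper name `episode_overrides` and the base arguments are illustrative, and the training entry point is deliberately left out, so prepend whatever command you normally use.

```python
import shlex

REPORTED_EPISODES = (2, 8, 12, 14)  # episode numbers this README says the paper reports


def episode_overrides(base_overrides, episodes=REPORTED_EPISODES):
    """Yield (episode, overrides) pairs with env_spec.housekeep_ep_num set for each episode."""
    prefix = "env_spec.housekeep_ep_num="
    for ep in episodes:
        kept = [o for o in base_overrides if not o.startswith(prefix)]
        yield ep, kept + [f"{prefix}{ep}"]


# Illustrative base arguments; replace them with the full set for your experiment.
base = ["env_spec.use_sbert=True", "use_language_state=sbert", "exp_name=TEST_RUN"]
runs = dict(episode_overrides(base))
for ep, overrides in runs.items():
    print(ep, shlex.join(overrides))
```

Any episode setting already present in the base arguments is dropped before the new one is appended, so each generated list carries exactly one value for the episode number.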
(These commands match the configurations used in our paper).
python env_spec.use_sbert=True use_goal=True use_language_state=sbert env_spec.lm_spec.lm_class=SimpleOracle exp_name=TEST_RUN
python env_spec.use_sbert=True use_goal=True use_language_state=sbert env_spec.lm_spec.lm_class=GPTLanguageModel env_spec.lm_spec.openai_key=YOUR_KEY exp_name=TEST_RUN
python single_goal_eval=True use_language_state=sbert env_spec.use_sbert=True use_goal=True env_spec.lm_spec.lm_class=SimpleOracle train_after_reward=True decay_after_reward=True env_spec.single_task="finetuning task name" expl_agent_path=/path/to/pretrained/agent num_train_frames=2000000 exp_name=TEST_RUN
(Example finetuning task: "attack cow")
(Variant: for finetuning tasks where the agent should see subgoals but still only be rewarded for full completion, set env_spec.single_goal_hierarchical=reward_last use_goal=True)
python single_goal_eval=True use_language_state=sbert env_spec.use_sbert=True use_goal=True env_spec.lm_spec.lm_class=SimpleOracle train_after_reward=True decay_after_reward=True finetune_snapshot=True agent.train_eps_max=.5 lr=0.00002 env_spec.single_task="finetuning task name" snapshot_path=/path/to/pretrained/agent num_train_frames=1000000 exp_name=TEST_RUN
(Example finetuning task: "attack cow")
python env_spec.use_sbert=True use_language_state=sbert env_spec.use_language_state=True env_spec.lm_spec.lm_class=GPTLanguageModel env_spec.lm_spec.lm=text-davinci-002 env_spec.lm_spec.budget=10 env_spec.lm_spec.temperature=0 env_spec.lm_spec.prob_threshold=.5 use_wandb=true num_train_frames=4000000 replay_buffer_num_workers=5 batch_size=256 agent.train_eps_min=0.1 agent.train_eps_decay_steps=5000000 exp_name=TEST_RUN env_spec.housekeep_ep_num=2
python env_spec.use_sbert=True use_language_state=sbert env_spec.use_language_state=True env_spec.lm_spec.lm_class=SimpleOracle finetune_snapshot=True snapshot_path=ADD_PATH_TO_PRETRAINED_MODEL_HERE use_wandb=true num_train_frames=5000000 replay_buffer_num_workers=5 batch_size=256 agent.train_eps_min=0.1 agent.train_eps_decay_steps=5000000 exp_name=FINETUNE_RUN env_spec.housekeep_ep_num=2
python env_spec.use_sbert=True use_language_state=sbert env_spec.use_language_state=True env_spec.lm_spec.lm_class=SimpleOracle finetune_snapshot=True use_wandb=true num_train_frames=5000000 replay_buffer_num_workers=5 batch_size=256 agent.train_eps_min=0.2 agent.train_eps_decay_steps=1000000 exp_name=EXPL_RUN env_spec.housekeep_ep_num=2 expl_agent_path=ADD_PATH_TO_PRETRAINED_MODEL_HERE
Download these to the project directory if you want to run with learned captioners.
(Not recommended for most experiments, since they slow things down significantly and degrade performance).