This is the codebase for the paper Guiding Pretraining in Reinforcement Learning with Large Language Models (ICML 2023).
For Crafter, run
conda env create -f conda_env.yml
conda activate text-crafter
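If the install succeeded, a minimal smoke test like the following should run (a sketch, assuming the crafter package is pinned by conda_env.yml):

```python
# Smoke test for the Crafter install (illustrative; assumes the `crafter`
# package from the Crafter benchmark is installed in this env).
import crafter

env = crafter.Env()
obs = env.reset()
print(obs.shape)  # pixel observation, e.g. (64, 64, 3)
```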
For Housekeep, follow this set of adapted installation instructions:
conda create -n ellm-housekeep python=3.8
conda activate ellm-housekeep
conda install habitat-sim==0.2.1 withbullet headless -c conda-forge -c aihabitat
cd text_housekeep/habitat_lab
pip install -r requirements.txt
python setup.py develop --all
Then run conda env update -f conda_env_housekeep.yml to install additional ELLM dependencies. If you get an error about llvmlite, use pip install llvmlite==0.38.0 --ignore-installed.
Then download the data following the instructions here. Everything should be downloaded under text_housekeep/data.
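A quick way to confirm the data landed in the right place (a sketch; the exact subfolders depend on the download instructions above):

```python
# Verify the Housekeep data directory exists and list what was downloaded.
from pathlib import Path

data_dir = Path("text_housekeep/data")
assert data_dir.is_dir(), "download the Housekeep data into text_housekeep/data first"
print(sorted(p.name for p in data_dir.iterdir()))
```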
If you get an error like:
File "/YOUR_MINICONDA_DIR/envs/ellm-housekeep/lib/python3.8/site-packages/habitat_sim/agent/controls/default_controls.py", line 47, in _rotate_local
    raise RuntimeError(
RuntimeError: Constrained look only works for a singular look action type
go to /YOUR_MINICONDA_DIR/envs/ellm-housekeep/lib/python3.8/site-packages/habitat_sim/agent/controls/default_controls.py and change line 47 from
if mn.math.angle(ref_vector, rotation.axis().normalized()) > mn.Rad(1e-3):
to
if mn.math.angle(ref_vector, mn.Vector3(np.abs(rotation.axis().normalized()))) > mn.Rad(1e-3):
(Wrapping the rotation axis in np.abs makes the constraint check insensitive to the sign of the axis, so look actions that rotate about the negative axis no longer raise this error.)
If you haven't already, sign up for an OpenAI account.
Add this to your .bashrc:
export OPENAI_API_KEY=GET_YOUR_API_KEY_FROM_THEIR_WEBSITE
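For reference, this is roughly how the key is consumed at runtime (a sketch using the pre-1.0 openai Python API; the actual lookup happens inside this codebase's language-model wrapper):

```python
# Read the key from the environment and hand it to the openai client
# (pre-1.0 openai API style).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
```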
To train the agent:
python train.py
All default configurations are found in config.yaml. For environment-specific hyperparameter changes, see Appendix G in the paper. Some important ones are:
To use text observations (in addition to pixel observations, which is the default; see the SBERT sketch below):
--env_spec.use_sbert - set to True to use SBERT encodings for text observations.
--use_language_state=sbert - set to sbert to train with text observations as well as pixel observations.
To use goal conditioning (see the sketch after this list):
--use_goal - set to True to condition the agent policy on a language string of the goal.
--agent.goal_encoder_type=sbert - set to sbert to encode goals using an SBERT model.
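As a rough illustration of what the sbert options above do, both text observations and goal strings are embedded with a SentenceTransformer-style encoder (a sketch; the model name here is illustrative, not necessarily the one config.yaml pins):

```python
# Sketch of SBERT encoding for goals / text observations (model name is
# illustrative; the codebase configures its own in config.yaml).
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-MiniLM-L3-v2")
goal_vec = encoder.encode("attack cow")                   # fixed-size goal embedding
state_vec = encoder.encode("You see a cow and a tree.")   # text observation embedding
print(goal_vec.shape, state_vec.shape)
```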
To run the oracle:
--env_spec.lm_spec.lm_class=SimpleOracle
To run with GPT:
--env_spec.lm_spec.lm_class=GPTLanguageModel --env_spec.threshold=.99 --env_spec.lm_spec.temperature=0.1 --env_spec.lm_spec.openai_key=YOUR_API_KEY
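To give a sense of what GPTLanguageModel does with these settings, here is an illustrative query (pre-1.0 openai API; the prompt wording is invented, see the codebase for the actual prompt format):

```python
# Illustrative goal-suggestion query. The real prompt format lives in the
# GPTLanguageModel class; this one is made up for demonstration.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
response = openai.Completion.create(
    model="text-davinci-002",  # the LM named in the paper's Housekeep command below
    prompt="You see a tree, a cow, and grass. Suggest a goal:",
    temperature=0.1,
    max_tokens=32,
)
print(response["choices"][0]["text"].strip())
```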
To run with intrinsic objective baselines (an RND sketch follows below):
--agent._target_=baselines.icm_apt.ICMAPTAgent - for APT
--agent._target_=baselines.rnd.RNDAgent - for RND
Note: to add additional arguments for the baseline agents, append arguments such as +agent.use_language=True or +agent.noveld=True.
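For intuition, the RND intrinsic reward boils down to the error in predicting a frozen random network's features (a minimal sketch, not the actual baselines/rnd.py implementation; the architecture is illustrative):

```python
# Minimal RND bonus sketch: intrinsic reward = error predicting a frozen
# random network's features. Novel states are poorly predicted -> larger bonus.
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad = False  # target network stays fixed

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-state intrinsic reward; the predictor is trained to minimize this.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

bonus = RNDBonus(obs_dim=32)
print(bonus(torch.randn(4, 32)))  # intrinsic rewards for a batch of 4 states
```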
Monitor results:
You can set --use_tb=True to log to tensorboard. Then run TB:
tensorboard --logdir exp_local
Alternatively, set --use_wandb=True to log to wandb.
Housekeep Episodes:
You can set --env_spec.housekeep_ep_num to choose between episodes with different object and receptacle configurations. The paper reports results for episodes 2, 8, 12, and 14.
The following example commands match the configurations used in our paper.
Crafter pretraining with the oracle:
python train.py env_spec.use_sbert=True use_goal=True use_language_state=sbert env_spec.lm_spec.lm_class=SimpleOracle exp_name=TEST_RUN
Crafter pretraining with GPT:
python train.py env_spec.use_sbert=True use_goal=True use_language_state=sbert env_spec.lm_spec.lm_class=GPTLanguageModel env_spec.lm_spec.openai_key=YOUR_KEY exp_name=TEST_RUN
Crafter finetuning on a single task with a pretrained exploration agent:
python train.py single_goal_eval=True use_language_state=sbert env_spec.use_sbert=True use_goal=True env_spec.lm_spec.lm_class=SimpleOracle train_after_reward=True decay_after_reward=True env_spec.single_task="finetuning task name" expl_agent_path=/path/to/pretrained/agent num_train_frames=2000000 exp_name=TEST_RUN
(Example finetuning task: "attack cow")
(Variant: for finetuning tasks where the agent should see subgoals but still only be rewarded for full completion, set env_spec.single_goal_hierarchical=reward_last use_goal=True.)
Crafter finetuning on a single task, directly finetuning a pretrained snapshot:
python train.py single_goal_eval=True use_language_state=sbert env_spec.use_sbert=True use_goal=True env_spec.lm_spec.lm_class=SimpleOracle train_after_reward=True decay_after_reward=True finetune_snapshot=True agent.train_eps_max=.5 lr=0.00002 env_spec.single_task="finetuning task name" snapshot_path=/path/to/pretrained/agent num_train_frames=1000000 exp_name=TEST_RUN
(Example finetuning task: "attack cow")
Housekeep pretraining with GPT:
python train.py env_spec.use_sbert=True use_language_state=sbert env_spec.use_language_state=True env_spec.lm_spec.lm_class=GPTLanguageModel env_spec.lm_spec.lm=text-davinci-002 env_spec.lm_spec.budget=10 env_spec.lm_spec.temperature=0 env_spec.lm_spec.prob_threshold=.5 env_spec.name=Housekeep use_wandb=true num_train_frames=4000000 replay_buffer_num_workers=5 batch_size=256 agent.train_eps_min=0.1 agent.train_eps_decay_steps=5000000 exp_name=TEST_RUN env_spec.housekeep_ep_num=2
Housekeep finetuning from a pretrained snapshot:
python train.py env_spec.use_sbert=True use_language_state=sbert env_spec.use_language_state=True env_spec.lm_spec.lm_class=SimpleOracle finetune_snapshot=True snapshot_path=ADD_PATH_TO_PRETRAINED_MODEL_HERE env_spec.name=Housekeep use_wandb=true num_train_frames=5000000 replay_buffer_num_workers=5 batch_size=256 agent.train_eps_min=0.1 agent.train_eps_decay_steps=5000000 exp_name=FINETUNE_RUN env_spec.housekeep_ep_num=2
Housekeep finetuning with a pretrained exploration agent:
python train.py env_spec.use_sbert=True use_language_state=sbert env_spec.use_language_state=True env_spec.lm_spec.lm_class=SimpleOracle finetune_snapshot=True env_spec.name=Housekeep use_wandb=true num_train_frames=5000000 replay_buffer_num_workers=5 batch_size=256 agent.train_eps_min=0.2 agent.train_eps_decay_steps=1000000 exp_name=EXPL_RUN env_spec.housekeep_ep_num=2 expl_agent_path=ADD_PATH_TO_PRETRAINED_MODEL_HERE
Learned captioners:
Download these to the project directory if you want to run with learned captioners. (Not recommended for most experiments, since they slow things down significantly and degrade performance.)