-
Install Caffe with opencv2 (opencv is also neeeded for this version of gym-torcs) https://github.com/BVLC/caffe
-
install Torcs dependencies 1.0 remove old gym-torc if necessary:
- make distclean
1.1 xautomation (http://linux.die.net/man/7/xautomation)
- apt-get install xautomation
1.2 OpenAI-Gym (https://github.com/openai/gym)
- apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
- pip install gym
-
Install gym-torcs 2.1 install vtorcs-RL-color dependencies
- sudo apt-get install libglib2.0-dev libgl1-mesa-dev libglu1-mesa-dev freeglut3-dev libplib-dev libopenal-dev libalut-dev libxi-dev libxmu-dev libxrender-dev libxrandr-dev libpng12-dev
2.2 install gym-torcs
- cd /gym-torcs/ command:
- SRC_DIR=proto DST_DIR=. CPP_DST_DIR=vtorcs-RL-color/src/drivers/scr_server/; protoc -I=$SRC_DIR --cpp_out=$CPP_DST_DIR --python_out=$DST_DIR $SRC_DIR/car_state.proto
2.2.1 install vtorcs-RL-color
- cd /gym-torcs/vtorcs-RL-color
- ./configure
- make -j8
- if "no ossspec.h" or "no tgfclien.h" errors just restart make
- make install
- make datainstall
2.3 Initialize race
-
sudo torcs Configure driver: Race --> Quick Race --> Configure Race-->(Select track)Accept -->On the left column Deselect "Damned" driver
Quit all menues
To temporary increase resolution if font is too small before configuring driver Option-->Display-->choose resolution and affter configuring driver Option-->Display-->choose Screen resolution 256x192
-
export Caffe and CUDA paths
- export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/lib
- export PYTHONPATH=/home/sergey/repo/caffe/python:$PYTHONPATH
- Download models from this link and put them into working directory specified in current_dir in dqn.py
- r18nb.caffemodel is original resnet-18 model trained on the imagenet. Batch normalization layers merged into convolutions
- q_solver.caffemodel is dqn current training model
- qq_target.caffemodel is dqn taret model, running avergae of training model
Protofiles for models are in the directory resnet_torcs
- critic_batch_dqn.prototxt for qq_target.caffemodel
- dqn_critic_solver.prototxt for q_solver.caffemodel
- critic_deploy_dqn.prototxt is protofile for driving agent
- resnet18_nbn.prototxt is not used in project, it's prtofile for resnet-18 with merged BN
For general description check this blog post How to start:
-
in dqn.py you should modify:
- you current directory, this is the directory where your initial models are in the line 36: current_dir = '/home/sergey/repo/gym-torcs/'
- chose replay or training in the line 35, False for training play = False
- chose to start form imagenet renet-18 model (r18nb.caffemodel), line 509 first_run = True
- or from saved dqn network first_run = False
- delete and reinitialzie replay buffer, line 510: reset_buffer = True
- save model and replay buffer in nnn iterations, line 533 save_in_iters = nnn
- max number of steps per track, line 532 (each step take at least 0.4 second) max_steps = 5000
-
in ../resnet_torcs/dqn_critic_solver.prototxt replace net: parameter with address of critic_dqn.prototxt, for example net: /home/user/public_oss/Gym-Torcs-DQN/resnet_torcs/critic_dqn.prototxt
-
run python dqn.py
-
Important parameters
- size of replay buffer, line 21 BUFFER_SIZE = 70000
- gamma and tau, line 514,515 (explained in blog post) GAMMA =0.99 TAU = .0001
- steering action amplitude, line 26 (require retraining): DELTA_A = .85
- exploration noise, line 518-520 initial value of noise (for starting training recommended 0.5, for continuation of training recommended 0.05) act_noise_init = .5 minimal value of noise act_noise_final = .01 inverse scale factor act_noise_interval = 100000
- action smoother and limiter, line 528-529 (explained in blog post) action_smoother = .33 action_limiter = .33
- avergae speed and speed variance, line 560-563 c_speed = 35 delta_speed = 5
-
Useful parameters
- switch speed randomly after swith_count_min(1 + uniform(0,1)) steps swith_count_min = 50
- n-step dqn (see blog post) with n = max_n_batches*BATCH_SIZE. max_n_batches = 16
- n_step ratio - use n-step with n_step_ratio probaility and normal dqn for the rest. For normal dqn use n_step_ratio = 0 n_step_ratio = 0.75
- priority sampling with k-try (see blog post) k_priority_try = 2
- stop episod after max_errors max_errors = 50
-
Risky parameters
- Q function spatial TV-L2 regularization, may make action smoother. Should be very small. lambda_spatial_q = 0
- parameter for TD-Lambda line 437 (see blog post), lesser alpha is only for very small exploration noise alpha = 1
Acknowledgement: Caffe is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors Naoto Yoshida is the author of the gym torcs. Implementation of replay buffer is based on Ben Lau work.