RL Gobang

An AlphaZero implementation on Gomoku.

Kickstart

bazel build -c opt //mcts:capi
pip install -r requirements.txt
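
The bazel target builds a native shared library that the Python side loads. Below is a minimal sketch of binding such a C API with ctypes; the library path and the mcts_search symbol and its signature are assumptions for illustration, not the repository's actual interface.

import ctypes
import numpy as np

# Path produced by `bazel build -c opt //mcts:capi`; the exact file name
# depends on the platform and BUILD rule (assumption for illustration).
lib = ctypes.CDLL("bazel-bin/mcts/libcapi.so")

# Hypothetical signature: run N MCTS simulations on a flattened 15x15 board
# and write per-move visit probabilities into `policy`.
lib.mcts_search.argtypes = [
    ctypes.POINTER(ctypes.c_int),    # board, 225 ints (0 empty, 1 black, 2 white)
    ctypes.c_int,                    # number of simulations
    ctypes.POINTER(ctypes.c_float),  # out: 225 visit-count probabilities
]

def search(board: np.ndarray, num_simulations: int = 800) -> np.ndarray:
    board = np.ascontiguousarray(board, dtype=np.int32)
    policy = np.zeros(225, dtype=np.float32)
    lib.mcts_search(
        board.ctypes.data_as(ctypes.POINTER(ctypes.c_int)),
        num_simulations,
        policy.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
    )
    return policy.reshape(15, 15)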

Training

Training uses multiple self-play processes and a single training process. Each self-play process runs MCTS guided by a neural-network oracle to generate self-play games. These games are then sent to the training process over IPC to improve the oracle. The training process also features an evaluator that checks whether the new network is better than the previous checkpoint. The criterion is that A is considered better than B only if A beats B while playing the white stones (the defensive side). A new checkpoint is saved only once a clearly stronger network has emerged.
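
A minimal sketch of this producer/consumer layout with multiprocessing; the game payloads and the training step are stand-ins for illustration, and the real logic lives in src/.

import multiprocessing as mp
import random

def play_one_game(seed: int) -> list:
    # Stand-in for MCTS self-play guided by the network oracle:
    # a real game would be a list of (state, policy, outcome) samples.
    rng = random.Random(seed)
    return [("state", "policy", rng.choice([-1, 1]))]

def self_play_worker(game_queue):
    # Producer: generate games forever and ship them to the trainer (the IPC channel).
    seed = 0
    while True:
        game_queue.put(play_one_game(seed))
        seed += 1

def trainer(game_queue, games_per_step=8):
    # Consumer: wait for a batch of games, then run one network update.
    # After updating, an evaluator would pit the new network against the
    # previous checkpoint as white and save a checkpoint only if it wins.
    while True:
        batch = [game_queue.get() for _ in range(games_per_step)]
        print(f"training on {len(batch)} games")  # stand-in for the SGD step

if __name__ == "__main__":
    queue = mp.Queue()
    for _ in range(4):  # number of self-play processes
        mp.Process(target=self_play_worker, args=(queue,), daemon=True).start()
    trainer(queue)  # the real trainer runs as its own worker process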

The master executable controls the training and self-play processes. It decides the number of self-play processes, which determines how fast games are generated. master itself always finishes immediately, since it only creates or terminates the worker processes. The worker processes (training/self-play) run as daemons, so additional effort may be needed to deploy the program on Windows.

# start training
python src/master.py start

# stop training
python src/master.py kill
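
A sketch of the detach-and-record pattern such a master can use, assuming a hypothetical pid-file location and stand-in worker entry points (the actual src/master.py may differ); the fork-based detach is Unix-only, which is the source of the Windows caveat.

import os
import signal
import sys
import time

PID_FILE = "/tmp/rl_gobang.pids"  # assumed location, for illustration only

def trainer_main():
    while True:
        time.sleep(1)  # stand-in for the training loop

def self_play_main():
    while True:
        time.sleep(1)  # stand-in for MCTS self-play

def spawn_detached(target):
    # Fork so the worker outlives the master process; os.fork/os.setsid
    # are Unix-only, hence the extra effort needed on Windows.
    pid = os.fork()
    if pid == 0:      # child: detach and become the worker
        os.setsid()
        target()
        os._exit(0)
    return pid        # parent: remember the worker's pid

def start(num_self_play=4):
    pids = [spawn_detached(trainer_main)]
    pids += [spawn_detached(self_play_main) for _ in range(num_self_play)]
    with open(PID_FILE, "w") as f:
        f.write("\n".join(map(str, pids)))

def kill():
    with open(PID_FILE) as f:
        for pid in map(int, f.read().split()):
            os.kill(pid, signal.SIGTERM)

if __name__ == "__main__":
    start() if sys.argv[1] == "start" else kill()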

Playing with Checkpoints

gobang_env is a GUI program for playing against a given checkpoint to gauge its strength. It can also save the game history as images.

python src/gobang_env.py

Available checkpoints:

Achievements

  • Beats tito (an AI that has taken 3 first-place and 2 second-place finishes in the Gomocup) while playing the black stones.

[Figure: VS tito with black stone]

  • Achieves a high rank on the platform "微信小程序-欢乐五子棋腾讯版" (Tencent's "Happy Gomoku" WeChat mini-program) (in progress).

Performance Notes

The following optimizations are extremely useful in the training process.

  1. Moving the MCTS search out of the Python runtime into a native library (C++/Rust) yields a speedup of roughly 30×.

  2. Multiple self-play processes accelerate training roughly in proportion to the process count, since self-play game generation is the bottleneck of the whole system.

  3. Virtual loss is a critical trick for batching neural-network inference during MCTS self-play: it temporarily penalizes the nodes on an in-flight simulation's path so that concurrent simulations pick different leaves, which can then be evaluated in one batched inference call. This alone speeds up self-play by about 6×; see the sketch after this list.
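
A toy sketch of virtual loss in a UCT selection loop; the Node fields, exploration constant, and batch size are illustrative rather than the repository's actual data structures.

import math

class Node:
    def __init__(self):
        self.children = {}       # move -> Node
        self.visits = 0
        self.value_sum = 0.0
        self.virtual_loss = 0    # pending in-flight simulations

    def ucb(self, parent_visits, c=1.4):
        # Virtual loss counts as extra lost visits, so a node already
        # chosen by an in-flight simulation looks worse to the next one.
        n = self.visits + self.virtual_loss
        if n == 0:
            return float("inf")
        q = (self.value_sum - self.virtual_loss) / n
        return q + c * math.sqrt(math.log(parent_visits + 1) / n)

def select_batch(root, batch_size=8):
    # Collect several leaves before a single batched network call.
    paths = []
    for _ in range(batch_size):
        node, path = root, [root]
        while node.children:
            node = max(node.children.values(),
                       key=lambda ch: ch.ucb(node.visits + node.virtual_loss))
            path.append(node)
        for n in path:
            n.virtual_loss += 1  # discourage the next simulation from this path
        paths.append(path)
    return paths                 # evaluate all leaves in one inference batch

def backup(path, value):
    # Walk from the leaf back to the root, removing the temporary penalty.
    for n in reversed(path):
        n.virtual_loss -= 1
        n.visits += 1
        n.value_sum += value
        value = -value           # alternate perspective between plies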

Paper

AlphaZero: D. Silver et al., "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play", Science 362 (6419), 2018.
