This repository documents the training process of the neural nets for the backgammon engine wildbg.
Each folder contains rollout data and the neural net that was trained on that data. The first rollout happened with random moves. Later rollouts used previous nets.
There are some issues with the rollout data:
- Some position IDs in the folders 0009, 0010, 0012, and 0015 were encoded incorrectly. See carsten-wenderdel/wildbg#27
- 0015 contains rollout data for race positions. Due to a poor choice of positions, training on that data leads to worse nets than previous data did.
Data | Remarks |
---|---|
0001 | Trained on rollouts with random moves |
0002 | |
0003 | Use tanh instead of sigmoid for inner layer |
0004 | |
0005 | Increased number of epochs from 10 to 20. Increased learning rate from 0.1 to 4.0 |
0006 | |
0007 | The 7th net is actually a bit worse than the 6th net: fewer wins and fewer backgammon wins, but more gammon wins. |
0008 | No new rollouts were done. Instead, this net was trained on the combined rollout data of iterations 6 and 7. The network topology was changed from one hidden layer with tanh activation to three hidden layers with ReLU activation (see the first sketch below the table). |
0009 | Rollouts were done with the net from iteration 8. We now have different sets of data for contact and race positions, two different networks, and also two different numbers of inputs. Combined they are better than the 8th iteration, but lose a lot of backgammons because the contact network is too optimistic: it avoids going into a race and then loses a backgammon instead. |
0010 | Rollouts were done with the nets from iteration 9. Only a contact network was rolled out and trained. The loss function for training was changed from MSELoss to L1Loss (see the second sketch below the table). |
0011 | No new rollouts; using the same training data as 0010. Instead of the PyTorch optimizer SGD, we used Adam here. When duelling with the previous net, this results in an equity win of roughly 0.02. |
0012 | Increased number of epochs from 20 to 50. The race data was rolled out with race #9 and contact #11 (each the most recent). The contact data was rolled out with race #12 and contact #12. For the contact net we switched from Adam to AdamW, which seems to be a small improvement. Overall a dramatic improvement over the previous nets; they now lose only 0.7% backgammons. |
0013 | No new rollouts. The contact net is using Hardsigmoid instead of ReLU as activation function. It seems to give slightly better results (equity win 0.01), but inference also takes a bit longer. |
0014 | No new rollouts. The contact net is trained on the same data as the two previous ones, just with a few more epochs, a slightly smaller learning rate, and more careful comparisons between different ONNX files. Equity win of 0.01. |
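
The topology change described in 0008 can be illustrated with a small PyTorch sketch. This is not the actual wildbg training code; the layer sizes (`N_INPUTS`, `HIDDEN`, `N_OUTPUTS`) are placeholders, not the real dimensions.

```python
import torch.nn as nn

N_INPUTS, HIDDEN, N_OUTPUTS = 202, 300, 6  # hypothetical sizes

# Up to 0007: one hidden layer with tanh activation.
old_net = nn.Sequential(
    nn.Linear(N_INPUTS, HIDDEN),
    nn.Tanh(),
    nn.Linear(HIDDEN, N_OUTPUTS),
)

# From 0008: three hidden layers with ReLU activation.
# (0013 later swapped nn.ReLU() for nn.Hardsigmoid() in the contact net.)
new_net = nn.Sequential(
    nn.Linear(N_INPUTS, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, N_OUTPUTS),
)
```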
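The loss and optimizer changes in 0010 through 0012 amount to a few lines in the training setup. Again a minimal sketch, assuming PyTorch; the network, data, and hyperparameters below are stand-ins rather than the real training script.

```python
import torch
from torch import nn

# Stand-in network and random training data, just to make the sketch runnable.
net = nn.Sequential(nn.Linear(202, 300), nn.ReLU(), nn.Linear(300, 6))
inputs = torch.randn(64, 202)
targets = torch.rand(64, 6)

criterion = nn.L1Loss()                          # 0010: changed from nn.MSELoss()
optimizer = torch.optim.AdamW(net.parameters())  # SGD up to 0010, Adam in 0011, AdamW from 0012

for epoch in range(50):                          # 0012: epochs increased from 20 to 50
    optimizer.zero_grad()
    loss = criterion(net(inputs), targets)
    loss.backward()
    optimizer.step()
```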