Build a reinforcement learning (RL) agent that learns to play Numerical Tic-Tac-Toe using Q-learning.
reinforcement-learning actions q-learning policy episodes convergence epsilon-greedy states rl rewards hyperparameter-tuning learning-rate model-building q-value markov-decision-process q-learning-algorithm epsilon-decay q-value-iteration mdp-framework
Updated Jul 9, 2021 - Jupyter Notebook
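
The description names Q-learning and the tags mention epsilon-greedy and epsilon-decay, so a minimal tabular sketch of those pieces may help orient readers. This is not taken from the repository's notebook: the function names, the hyperparameter values, and the encoding of an action as a `(position, number)` tuple are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, not from the repository)
GAMMA = 0.9  # discount factor (assumed value, not from the repository)


def make_q_table():
    """Q-table keyed by (state, action); unseen pairs default to 0.0."""
    return defaultdict(float)


def choose_action(q, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=ALPHA, gamma=GAMMA):
    """One Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    An empty next_actions list marks a terminal state (future value 0).
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def decayed_epsilon(episode, eps_max=1.0, eps_min=0.01, decay=0.001):
    """Exponential epsilon decay from eps_max toward eps_min (assumed schedule)."""
    return eps_min + (eps_max - eps_min) * math.exp(-decay * episode)


# Hypothetical usage: states are hashable board encodings, and an action
# is a (position, number) pair, e.g. placing the number 5 in cell 4.
q = make_q_table()
q_update(q, "empty-board", (4, 5), reward=1.0,
         next_state="terminal", next_actions=[])
```

Keying the table by `(state, action)` keeps the sketch independent of board size; a real agent would also need the game's move generator and reward scheme, which are not shown here.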