An educational project with modules for building a POMDP (Partially Observable Markov Decision Process) model and for implementing and running POMDP solver algorithms. This package was developed during my bachelor's thesis to help study POMDPs and their solvers.
Requires Python >= 3.5.2. To install the dependencies, run:
pip install -r requirements.txt
For easier construction of a POMDP environment, the POMDP File Grammar is used to encode the environment dynamics (an illustrative snippet is shown after the list below). Examples of environments can be found in the 'environments' folder. You can also create a new one as long as it complies with the POMDP file conventions.
- RockSample-7x8.POMDP: Semantic explanation can be found here
- Tiger-2D.POMDP: Standard Tiger Problem.
- Tiger-3D.POMDP: 3-door version of the Tiger Problem. The main difference is that if the agent does not want to open a door, it must choose one of the doors to listen at; the observation it receives then depends on how far the tiger is from the door the agent puts its ear against.
- GridWorld.POMDP:
- A simple 2D grid environment where the agent can only move left, move right, or halt at its current position. The rewarding states are the two end states, and attempting to move past the edge of the grid incurs a penalty.
- A much more general 2D grid world environment can be generated using /environments/grid_world_maker.py. Check out /environments/grid_world_example.py for how to use it.
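As a point of reference, here is a minimal sketch of the file grammar, loosely following the classic two-door Tiger encoding in Cassandra's .POMDP format. It is illustrative only; the exact contents of the bundled Tiger-2D.POMDP may differ.

```
# Illustrative two-door Tiger problem in the .POMDP file grammar.
discount: 0.95
values: reward
states: tiger-left tiger-right
actions: listen open-left open-right
observations: hear-left hear-right

start: 0.5 0.5

# Listening leaves the state unchanged; opening a door resets it at random.
T: listen
identity
T: open-left
uniform
T: open-right
uniform

# Listening is 85% accurate; observations after opening carry no information.
O: listen
0.85 0.15
0.15 0.85
O: open-left
uniform
O: open-right
uniform

# R: <action> : <start-state> : <end-state> : <observation> <reward>
R: listen : * : * : * -1
R: open-left : tiger-left : * : * -100
R: open-left : tiger-right : * : * 10
R: open-right : tiger-left : * : * 10
R: open-right : tiger-right : * : * -100
```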
This package implements PBVI (Point-Based Value Iteration) and POMCP (Partially Observable Monte Carlo Planning). Variable names follow the notation used in the original papers, so a read-through of those papers is encouraged.
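To give a flavour of what PBVI computes at each belief point, the following is a minimal NumPy sketch of a single point-based backup. It assumes dense arrays for the transition, observation and reward functions; the function name and array layout are illustrative and do not reflect this package's internal representation.

```python
import numpy as np

def pbvi_backup(b, V, T, O, R, gamma):
    """One point-based backup at belief point b (hypothetical helper).

    b     : (S,)      belief over states
    V     : list of (S,) alpha vectors from the previous iteration
    T     : (A, S, S) T[a, s, s'] = P(s' | s, a)
    O     : (A, S, Z) O[a, s', z] = P(z | s', a)
    R     : (S, A)    immediate reward
    gamma : discount factor
    Returns the new alpha vector (S,) and the greedy action index.
    """
    A, S, _ = T.shape
    Z = O.shape[2]
    best_alpha, best_value, best_action = None, -np.inf, None
    for a in range(A):
        g_a = R[:, a].astype(float)
        for z in range(Z):
            # g[alpha](s) = sum_{s'} T(s, a, s') * O(a, s', z) * alpha(s')
            projections = np.array([T[a] @ (O[a, :, z] * alpha) for alpha in V])
            # Keep the back-projected alpha vector that is best at belief b.
            g_a = g_a + gamma * projections[np.argmax(projections @ b)]
        value = g_a @ b
        if value > best_value:
            best_alpha, best_value, best_action = g_a, value, a
    return best_alpha, best_action
```

POMCP follows a different route (Monte Carlo tree search over histories with particle-filter beliefs); see the original paper for details.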
Solver algorithms extend the blueprint class 'POMDP' and are managed by the PomdpRunner. The runner reads the algorithm configuration from the 'configs' folder, creates the environment model, and uses those elements to build the actual POMDP solver.
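The real classes live in this repository; the sketch below only illustrates the general pattern (an abstract solver blueprint plus a runner that reads a JSON config and builds a solver). All class names, methods and config keys here are hypothetical, not the package's actual API.

```python
import json
from abc import ABC, abstractmethod

class SolverBlueprint(ABC):
    """Hypothetical stand-in for the abstract 'POMDP' blueprint class."""

    def __init__(self, model, **params):
        self.model = model      # environment model parsed from a .POMDP file
        self.params = params    # algorithm hyper-parameters

    @abstractmethod
    def solve(self, belief):
        """Return the chosen action for the current belief."""

class Runner:
    """Hypothetical stand-in for PomdpRunner: wires config, model and solver."""

    SOLVERS = {}  # e.g. {'pbvi': PBVISolver, 'pomcp': POMCPSolver}

    def __init__(self, config_path, model):
        with open(config_path) as f:
            self.config = json.load(f)  # per-algorithm configuration
        self.model = model

    def build_solver(self):
        solver_cls = self.SOLVERS[self.config['algorithm']]
        return solver_cls(self.model, **self.config.get('params', {}))
```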
usage: main.py [-h] [--env ENV] [--budget BUDGET] [--snapshot SNAPSHOT]
[--logfile LOGFILE] [--random_prior RANDOM_PRIOR]
[--max_play MAX_PLAY]
config
Solve pomdp
positional arguments:
config The file name of algorithm configuration (without JSON
extension)
optional arguments:
-h, --help show this help message and exit
--env ENV The name of environment's config file
--budget BUDGET The total action budget (default to inf)
--snapshot SNAPSHOT Whether to snapshot the belief tree after each episode
--logfile LOGFILE Logfile path
--random_prior RANDOM_PRIOR
Whether or not to use a randomly generated
distribution as prior belief, default to False
--max_play MAX_PLAY Maximum number of play steps (episodes)
* Example usage:
> python main.py pomcp --env Tiger-3D.POMDP --budget 10
- Use POMDPX instead of POMDP file grammar. POMDPX is a much more concise grammar for defining a POMDP environment.
- PomdpParser carries too much responsibility and needs to be refactored.
- The configuration implementation still looks a bit messy.