The Collectable Card Identifier project focuses on generating and managing datasets to train image classifiers for identifying individual cards from various Trading Card Games (TCG) and Collectible Card Games (CCG). Starting with "Pokemon" and expanding to "Magic: The Gathering" and "YuGiOh!", this system can be used for applications like inventory management, automated sorting, and card valuation.
The primary goal is to create a dataset generator that produces a diverse and extensive training dataset through various image transformations. This dataset will support the broader objective of developing a card sorting robot and other related applications.
This project uses Poetry to manage dependencies and requires Python 3.10. After cloning the repository, install the project and its dependencies by running:

```shell
poetry install
```

This will create an isolated virtual environment and install all runtime and development dependencies.

After installing dependencies, enable git hooks so style checks and tests run automatically before each commit:

```shell
pre-commit install
```

If you prefer not to use Poetry, install the package with pip and include the `[dev]` extras:

```shell
pip install ".[dev]"
```

Using `[dev]` installs pytest, ruff, pre-commit, and other development tools. Either method makes the `mkdataset` command available in your environment.
Several environment variables control where datasets and images are stored. They all default to subdirectories of `data` if not set.

| Variable | Description | Default |
|---|---|---|
| `CARDIDENT_DATA_ROOT` | Root directory for all data assets. | `data` |
| `CARDIDENT_BACKGROUNDS_DIR` | Location of background images. | `$CARDIDENT_DATA_ROOT/backgrounds` |
| `CARDIDENT_IMAGES_DIR` | Where original card images are downloaded. | `$CARDIDENT_DATA_ROOT/images/originals` |
| `CARDIDENT_DATASETS_DIR` | Destination for generated dataset images. | `$CARDIDENT_DATA_ROOT/images/dataset` |
| `CARDIDENT_DEBUG` | Enable debug logging across multiprocessing workers. | `0` |
First ensure card images are downloaded. For Pokémon cards this can be done with:

```shell
poetry run mkdataset card-data -t pokemon --images
```

Generate a dataset of 500 images:

```shell
poetry run mkdataset create-dataset -t pokemon -n 500
```

All data lives beneath `CARDIDENT_DATA_ROOT` (defaults to `data`).
Important subdirectories are:
```
$CARDIDENT_DATA_ROOT/
  backgrounds/            # background images used for dataset generation
  barrel/<game>/          # pickled state files and RNG snapshots
  images/
    originals/<game>/     # downloaded card scans
    dataset/<game>/       # generated dataset images
```
Generated datasets are stored by set and card ID. For example:

```
$CARDIDENT_DATA_ROOT/images/dataset/pokemon/<set>/<card-id>/*.png
```
Training symlinks produced by `DatasetManager.mk_symlinks` are placed in `dataset/<game>/symlinks/<mode>`, where `<mode>` is `all`, `legal`, or `sets`.
A typical workflow is:

1. Download card metadata and images:

   ```shell
   poetry run mkdataset card-data -t pokemon --refresh --images
   ```

2. Generate randomized dataset images (populate `CARDIDENT_BACKGROUNDS_DIR` with background images first):

   ```shell
   poetry run mkdataset create-dataset -t pokemon -n 500
   ```

3. Trim each card directory to the desired size:

   ```shell
   poetry run mkdataset trim-dataset -t pokemon -n 200
   ```

4. Create symlink trees for training:

   ```python
   from card_identifier.dataset import DatasetManager

   dm = DatasetManager("pokemon")
   dm.mk_symlinks("all")  # or "legal"/"sets"
   ```
Set `CARDIDENT_DEBUG=1` to enable debug messages from all worker processes. The `--debug` flag in the CLI sets this variable automatically.
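A worker process could honor this variable along the lines of the sketch below; this is an illustration of the pattern, not the project's exact implementation:

```python
import logging
import os


def configure_worker_logging() -> int:
    """Pick the log level from CARDIDENT_DEBUG, as a multiprocessing worker initializer might."""
    debug = os.environ.get("CARDIDENT_DEBUG", "0") == "1"
    level = logging.DEBUG if debug else logging.INFO
    # Include the process name so interleaved worker output stays readable.
    logging.basicConfig(level=level, format="%(processName)s %(levelname)s %(message)s")
    return level
```

Because environment variables are inherited by child processes, setting `CARDIDENT_DEBUG=1` in the parent (as the `--debug` flag does) propagates the debug level to every worker.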
Install the development dependencies first:

```shell
pip install -e ".[dev]"
# or
poetry install --with dev
```

Then execute the test suite before committing changes:

```shell
poetry run pytest -n auto
```

The test suite depends on additional packages like pytest-xdist, Pillow, and pokemontcgsdk. These are included when installing with the `[dev]` extras.

You can also run all style checks and tests at once with:

```shell
pre-commit run --all-files
```

Run Ruff to check code style and common errors:

```shell
poetry run ruff check .
```