This overview is primarily written for maintainers and expert users.
Here we detail the logic and structure for the DLC3.* PyTorch code. Furthermore, we provide many practical examples to illustrate the usage of the code for developers.
High-level API methods are implemented in deeplabcut.pose_estimation_pytorch.apis.
This folder includes methods to train and evaluate models on DeepLabCut projects, and
analyze videos or folders (of images). While some of the methods are implemented to work
directly from DeepLabCut projects (i.e. by specifying the path to the project config
file and the shuffle number), internally they call methods that allow more flexibility.
Thus, they are also ideally suited for developers.
We provide state-of-the-art pose estimation models such as DLCRNet, HRNet, DEKR and BUCTD,
with more to come. Object detection models are also available (implemented in
deeplabcut.pose_estimation_pytorch.models.detectors).
The deeplabcut.pose_estimation_pytorch.models package contains all components related
to building a model. Models are flexibly built from modular components: a backbone,
an optional neck and a head (as discussed below).
You can check available models by running:
import deeplabcut.pose_estimation_pytorch
# Available pose estimation models
print(deeplabcut.pose_estimation_pytorch.available_models())
# Available object detection models
print(deeplabcut.pose_estimation_pytorch.available_detectors())

Model architectures are built according to a configuration specified in a YAML file.
This file (named pytorch_cfg.yaml) describes the architecture of the model you want to
train (but also hyperparameters, optimizer, ...). All code to manipulate PyTorch
configuration files is in deeplabcut.pose_estimation_pytorch.config.
To generate a model configuration, you can call make_pytorch_pose_config. Note that
this does not save the configuration to a given filepath - it just returns it as a
dictionary. However, you can save it with write_config.
During a typical DeepLabCut project management workflow, these methods don't need to be
called, as create_training_dataset will create this configuration file and save it to
disk.
from pathlib import Path

import deeplabcut.pose_estimation_pytorch as dlc_torch

project_cfg = { "Task": "mice", ... }  # the configuration for your DLC project
pose_config_path = Path("/path/to/my/config/pytorch_cfg.yaml")
model_cfg = dlc_torch.config.make_pytorch_pose_config(
    project_config=project_cfg,
    pose_config_path=pose_config_path,
    net_type="hrnet_w32",
    top_down=True,
    save=True,
)

If you want to add a novel model, you'll ideally build it from the following implemented parts:
- a backbone (such as a ResNet or HRNet)
- a head (such as a HeatmapHead)
- a predictor (transforming model outputs into keypoint locations)
- a target generator (creating the targets for your head outputs from your labels)
Some models can also define a neck (model components between the backbone and the head). You'll also need some loss criteria, but usually you'll be able to use existing ones.
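To make the roles of the target generator and the predictor more concrete, here is a minimal NumPy sketch. It is not DeepLabCut's implementation: the function names gaussian_heatmap and argmax_keypoint, the sigma parameter and the map size are all hypothetical, chosen only to show the idea of turning a labeled keypoint into a training target and turning a heatmap back into a location.

```python
import numpy as np


def gaussian_heatmap(height, width, keypoint_xy, sigma=2.0):
    """Toy target generator: a 2D Gaussian bump centred on the labeled keypoint."""
    xs = np.arange(width)[None, :]   # column coordinates, shape (1, width)
    ys = np.arange(height)[:, None]  # row coordinates, shape (height, 1)
    x, y = keypoint_xy
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma**2))


def argmax_keypoint(heatmap):
    """Toy predictor: the (x, y) position of the heatmap maximum."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


# Round trip: label -> target heatmap -> predicted location
target = gaussian_heatmap(height=32, width=48, keypoint_xy=(10, 20))
predicted_xy = argmax_keypoint(target)
```

In a real model the head outputs a heatmap at a reduced resolution, so a predictor also has to map grid cells back to image coordinates; that step is left out of this sketch.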
You can either use existing classes and only replace some elements, or rewrite everything you need for your model. We use Model Registries to simplify the process of adding models.
Registries are created for all model building blocks to make it easy to add new models.
All you need to do is add the decorator REGISTRY.register_module to be able to load
your model from a configuration file. Available registries are BACKBONES, NECKS,
HEADS, PREDICTORS and TARGET_GENERATORS. Each building block has a base class
that should be inherited by the class added to the model registry (BaseBackbone,
BaseNeck, BaseHead, BasePredictor and BaseGenerator respectively).
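The registry mechanism itself can be pictured with a few lines of plain Python. The sketch below is an assumption about the general pattern (a decorator that stores classes by name, and a build method that reads the "type" key of a configuration dictionary), not a copy of DeepLabCut's registry code; Registry, TOY_BACKBONES and ToyBackbone are hypothetical names.

```python
from typing import Any


class Registry:
    """Toy registry: maps class names to classes and builds them from dicts."""

    def __init__(self, name: str):
        self.name = name
        self._modules = {}

    def register_module(self, cls: type) -> type:
        # Used as a decorator: store the class under its own name, return it unchanged
        self._modules[cls.__name__] = cls
        return cls

    def build(self, config: dict[str, Any]) -> Any:
        kwargs = dict(config)  # copy, so the caller's config is not modified
        cls_name = kwargs.pop("type")
        if cls_name not in self._modules:
            raise KeyError(f"{cls_name!r} is not registered in {self.name}")
        # Every remaining key is passed to the constructor as a keyword argument
        return self._modules[cls_name](**kwargs)


TOY_BACKBONES = Registry("toy_backbones")


@TOY_BACKBONES.register_module
class ToyBackbone:
    def __init__(self, kernel_size: int = 2):
        self.kernel_size = kernel_size


toy = TOY_BACKBONES.build({"type": "ToyBackbone", "kernel_size": 3})
```

The appeal of this design is that a configuration file only needs to name a class and its arguments, so new building blocks can be added without touching the code that assembles models.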
Let's illustrate that with a small example. We'll create a dummy backbone, which simply applies a max-pool to the input:
import torch
import torch.nn.functional as F

from deeplabcut.pose_estimation_pytorch.models.backbones import BACKBONES, BaseBackbone


@BACKBONES.register_module
class DummyBackbone(BaseBackbone):
    """A dummy backbone, simply max-pooling the input"""

    def __init__(self, kernel_size: int = 2):
        # the pooling kernel size is also the factor by which the input is downsampled
        super().__init__(stride=kernel_size)
        self.kernel_size = kernel_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.max_pool2d(x, kernel_size=self.kernel_size)


backbone_config = dict(type="DummyBackbone", kernel_size=3)
backbone = BACKBONES.build(backbone_config)  # will create a DummyBackbone

Another example would be creating a custom head for our model. In this case, let's make
a head which takes as input the output of a backbone (which has shape (num_channels, H', W')) and puts it through a kernel-size 1 convolution, simply changing the number of
channels.
Heads can output multiple tensors (such as heatmaps and location refinement fields).
Therefore, their forward(...) method outputs a dictionary mapping strings to tensors.
Here, we return the heatmap and locref tensors.
A head must contain two components: a target_generator to generate targets for
its outputs and a predictor to convert model outputs to pose. Make sure that the keys
output by the target_generator and the head match! A criterion also needs to be
defined to compute the loss between the outputs and targets. When more than one output
is specified (such as in this case, where we're generating heatmaps and location
refinement fields), a loss aggregator must also be given to combine all losses into one
(this should simply be a WeightedLossAggregator, indicating the weight for each loss).
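The aggregation step on its own amounts to a weighted sum over the per-output losses. The following sketch uses plain floats and a hypothetical aggregate_losses function to illustrate the arithmetic; it is not the WeightedLossAggregator class, and the loss values are made up.

```python
def aggregate_losses(losses: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-output losses into a single value using one weight per output."""
    missing = set(losses) - set(weights)
    if missing:
        # every output that produces a loss needs a weight
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Illustrative loss values for the two outputs of the head
losses = {"heatmap": 0.8, "locref": 4.0}
weights = {"heatmap": 1.0, "locref": 0.05}
total = aggregate_losses(losses, weights)  # 1.0 * 0.8 + 0.05 * 4.0
```

A small weight on the location refinement term keeps it from dominating the heatmap term when the two losses live on different scales.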
import torch
import torch.nn as nn

from deeplabcut.pose_estimation_pytorch.models.criterions import (
    BaseCriterion,
    BaseLossAggregator,
    WeightedHuberCriterion,
    WeightedLossAggregator,
    WeightedMSECriterion,
)
from deeplabcut.pose_estimation_pytorch.models.heads import HEADS, BaseHead
from deeplabcut.pose_estimation_pytorch.models.predictors import (
    BasePredictor,
    HeatmapPredictor,
)
from deeplabcut.pose_estimation_pytorch.models.target_generators import (
    BaseGenerator,
    HeatmapGaussianGenerator,
)


@HEADS.register_module
class DummyHead(BaseHead):
    """A dummy head, applying 1x1 convolutions to output heatmaps and locref fields"""

    def __init__(
        self,
        num_input_channels: int,
        num_bodyparts: int,
        predictor: BasePredictor,
        target_generator: BaseGenerator,
        criterion: dict[str, BaseCriterion],
        aggregator: BaseLossAggregator,
    ):
        super().__init__(
            stride=1,
            predictor=predictor,
            target_generator=target_generator,
            criterion=criterion,
            aggregator=aggregator,
        )
        # one heatmap channel per bodypart
        self.conv_heatmap = nn.Conv2d(
            in_channels=num_input_channels,
            out_channels=num_bodyparts,
            kernel_size=1,
            stride=1,
        )
        # two location refinement channels (x and y offsets) per bodypart
        self.locref_heatmap = nn.Conv2d(
            in_channels=num_input_channels,
            out_channels=2 * num_bodyparts,
            kernel_size=1,
            stride=1,
        )

    def forward(self, x: torch.Tensor) -> dict[str, torch.Tensor]:
        # the keys must match the keys output by the target generator
        return {
            "heatmap": self.conv_heatmap(x),
            "locref": self.locref_heatmap(x),
        }


head_config = dict(
    type="DummyHead",
    num_input_channels=2048,
    num_bodyparts=5,
    predictor=HeatmapPredictor(location_refinement=True, locref_std=7.2801),
    target_generator=HeatmapGaussianGenerator(
        num_heatmaps=5,
        pos_dist_thresh=17,
        heatmap_mode=HeatmapGaussianGenerator.Mode.KEYPOINT,
        generate_locref=True,
    ),
    criterion={
        "heatmap": WeightedMSECriterion(),
        "locref": WeightedHuberCriterion(),
    },
    aggregator=WeightedLossAggregator(weights={"heatmap": 1, "locref": 0.05}),
)
head = HEADS.build(head_config)

The deeplabcut.pose_estimation_pytorch.data package contains all code for PyTorch
dataset creation and test/train splitting. The DLCLoader class is used to load the
labeled data for a specific shuffle.
import deeplabcut.pose_estimation_pytorch as dlc_torch

loader = dlc_torch.DLCLoader(
    config="/path/to/my/project/config.yaml",
    trainset_index=0,
    shuffle=1,
)

# print the path to the model folder (where the config file is stored)
print(loader.model_folder)

# print the path to the evaluation folder
print(loader.evaluation_folder)

# display the DataFrame containing the dataset
print(loader.df)

# display the DataFrames containing the train/test data respectively
print(loader.df_train)
print(loader.df_test)

The PoseDataset class is a subclass of torch.utils.data.Dataset, which converts raw
images and keypoints to a tensor dataset for training and evaluation. You can create
training and test datasets with your DLCLoader:
import deeplabcut.pose_estimation_pytorch as dlc_torch

loader = dlc_torch.DLCLoader(
    config="/path/to/my/project/config.yaml",
    trainset_index=0,
    shuffle=1,
)
# training dataset, using the training data augmentation pipeline
train_dataset = loader.create_dataset(
    transform=dlc_torch.build_transforms(loader.model_cfg["data"]["train"]),
    mode="train",
    task=loader.pose_task,
)
# test dataset, using the inference transforms
valid_dataset = loader.create_dataset(
    transform=dlc_torch.build_transforms(loader.model_cfg["data"]["inference"]),
    mode="test",
    task=loader.pose_task,
)

A COCOLoader is also available, and allows you to train models in DeepLabCut on
COCO-format datasets. This essentially consists of having a folder containing your
dataset in the format:
COCOProject
└───annotations
│ │ train.json
│ │ test.json
│
└───images
│ img0000.png
│ img0001.png
│ ...
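As a rough orientation, the sketch below builds what a minimal annotation file could look like, following the general COCO keypoint conventions (keypoints stored as flattened x, y, visibility triplets, bounding boxes as x, y, width, height). The body part names, coordinates and image size are invented for illustration, and whether COCOLoader expects further fields is not covered here.

```python
import json

# Hypothetical content for annotations/train.json (one image, one annotated animal)
coco = {
    "images": [
        {"id": 1, "file_name": "img0000.png", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,  # refers to the "id" of an entry in "images"
            "category_id": 1,  # refers to the "id" of an entry in "categories"
            # flattened (x, y, visibility) triplets, one per keypoint
            "keypoints": [100.0, 120.0, 2, 150.0, 130.0, 2],
            "num_keypoints": 2,
            "bbox": [90.0, 110.0, 70.0, 30.0],  # x, y, width, height
            "area": 2100.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "mouse", "keypoints": ["nose", "tail"], "skeleton": []},
    ],
}

train_json = json.dumps(coco, indent=2)  # text that could be written to train.json
```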
In your train.json and test.json files, you can either specify your image
"file_name" with a relative path or with an absolute path. If a relative path is
used (e.g. img0000.png or subfolder/img0000.png), it will be resolved to the
images folder in your project (i.e. /path/to/COCOProject/images/img0000.png or
/path/to/COCOProject/images/subfolder/img0000.png).
If you specify an absolute path, the path to the image will not be resolved, and the image will be loaded from the specified path. This allows you to keep data on different disks, or reuse the same images in different projects without having to duplicate them.
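The resolution rule described above can be sketched with pathlib. This is only an illustration of the rule, not the loader's actual code: resolve_image_path is a hypothetical helper, the shared-disk path is made up, and PurePosixPath is used so that the example behaves the same on every operating system.

```python
from pathlib import PurePosixPath


def resolve_image_path(project_root, file_name):
    """Absolute paths are kept as-is; relative ones are looked up in <root>/images."""
    path = PurePosixPath(file_name)
    if path.is_absolute():
        return path
    return PurePosixPath(project_root) / "images" / path


root = "/path/to/COCOProject"
relative = resolve_image_path(root, "subfolder/img0000.png")
absolute = resolve_image_path(root, "/data/shared/img0000.png")

print(relative)  # /path/to/COCOProject/images/subfolder/img0000.png
print(absolute)  # /data/shared/img0000.png
```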
To train a DeepLabCut model on a COCO-format dataset, you'll need to specify a model configuration file (the pytorch_cfg.yaml file described above).
from pathlib import Path

import deeplabcut.pose_estimation_pytorch as dlc_torch

# Specify project paths
project_root = Path("/path/to/my/COCOProject")
train_json_filename = "train.json"
test_json_filename = "test.json"

# Parse information about the project
train_dict = dlc_torch.COCOLoader.load_json(project_root, filename=train_json_filename)
max_num_individuals, bodyparts = dlc_torch.COCOLoader.get_project_parameters(train_dict)

# Generate a configuration file for your PyTorch model
# In this case, it's for a Top-Down HRNet_w32
experiment_path = project_root / "experiments" / "hrnet_w32"
model_cfg_path = experiment_path / "train" / "pytorch_cfg.yaml"
# make sure the folder for the configuration file exists before saving to it
model_cfg_path.parent.mkdir(parents=True, exist_ok=True)
model_cfg = dlc_torch.config.make_pytorch_pose_config(
    project_config=dlc_torch.config.make_basic_project_config(
        dataset_path=str(project_root.resolve()),
        bodyparts=bodyparts,
        max_individuals=max_num_individuals,
        multi_animal=True,
    ),
    pose_config_path=model_cfg_path,
    net_type="hrnet_w32",
    top_down=True,
    save=True,
)

# Create the loader for the COCO dataset, pointing it to the configuration saved above
loader = dlc_torch.COCOLoader(
    project_root=project_root,
    model_config_path=model_cfg_path,
    train_json_filename=train_json_filename,
    test_json_filename=test_json_filename,
)
train_dataset = loader.create_dataset(
    transform=dlc_torch.build_transforms(loader.model_cfg["data"]["train"]),
    mode="train",
    task=loader.pose_task,
)
valid_dataset = loader.create_dataset(
    transform=dlc_torch.build_transforms(loader.model_cfg["data"]["inference"]),
    mode="test",
    task=loader.pose_task,
)

The deeplabcut.pose_estimation_pytorch.runners package contains code to get models, load
pretrained weights, and either train them or run inference with them.
from pathlib import Path

import deeplabcut.pose_estimation_pytorch as dlc_torch

# Specify project paths
project_root = Path("/path/to/my/COCOProject")
train_json_filename = "train.json"
test_json_filename = "test.json"

loader = dlc_torch.COCOLoader(
    project_root=project_root,
    model_config_path="/path/to/my/project/experiments/pytorch_config.yaml",
    train_json_filename=train_json_filename,
    test_json_filename=test_json_filename,
)
dlc_torch.train(
    loader=loader,
    run_config=loader.model_cfg,
    task=dlc_torch.Task(loader.model_cfg["method"]),
    device="cuda:2",  # adapt to the devices available on your machine
    logger_config=dict(
        type="WandbLogger",
        project_name="MyWandbProject",
        tags=["model=hrnet_w32"],
    ),
    snapshot_path=None,
)

DeepLabCut provides high-level APIs (via the GUI or the Python package) to analyze your
data. The usage of this API assumes the existence of a DLC project (with config.yaml
file, etc.).
Sometimes it might be more convenient to just run a model on your data via a low-level API. We also use this API under the hood, in particular for the Model Zoo. Check out the example below:
from pathlib import Path

import deeplabcut.pose_estimation_pytorch as dlc_torch
from deeplabcut.core.config import read_config_as_dict

train_dir = Path("/Users/Jaylen/my-dlc-models/train")
pytorch_config_path = train_dir / "pytorch_config.yaml"
snapshot_path = train_dir / "snapshot-100.pt"

# for top-down models, otherwise None
detector_snapshot_path = train_dir / "detector-snapshot-100.pt"

# video and inference parameters
video_path = Path("/Users/Jaylen/my-dlc-models/videos/test-video.mp4")
max_num_animals = 5
batch_size = 16
detector_batch_size = 8

# read model configuration
model_cfg = read_config_as_dict(pytorch_config_path)
pose_task = dlc_torch.Task(model_cfg["method"])
pose_runner = dlc_torch.get_pose_inference_runner(
    model_config=model_cfg,
    snapshot_path=snapshot_path,
    max_individuals=max_num_animals,
    batch_size=batch_size,
)

# a detector is only needed for top-down models
detector_runner = None
if pose_task == dlc_torch.Task.TOP_DOWN:
    detector_runner = dlc_torch.get_detector_inference_runner(
        model_config=model_cfg,
        snapshot_path=detector_snapshot_path,
        max_individuals=max_num_animals,
        batch_size=detector_batch_size,
    )

predictions = dlc_torch.video_inference(
    video=video_path,
    pose_runner=pose_runner,
    detector_runner=detector_runner,
)

When deeplabcut.pose_estimation_pytorch.apis.videos.video_inference is called
with a top-down model, it is assumed that a detector snapshot is given as well to obtain
bounding boxes with which to run pose estimation. It's possible that you've already
obtained bounding boxes for your video (with another object detector or through some
other means), and you want to reuse those bounding boxes instead of running an object
detector again.
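Boxes coming from another detector are often stored as corner coordinates (x_min, y_min, x_max, y_max), whereas the boxes used here are expected in xywh format. The NumPy sketch below shows one way to convert them and to assemble one dictionary per frame; xyxy_to_xywh and the sample values are hypothetical and would need adapting to however your detections are stored.

```python
import numpy as np


def xyxy_to_xywh(boxes):
    """Convert (x_min, y_min, x_max, y_max) boxes to (x_top_left, y_top_left, w, h)."""
    boxes = np.asarray(boxes, dtype=float)
    if boxes.size == 0:
        return np.zeros((0, 4))
    boxes = boxes.reshape(-1, 4)
    xywh = boxes.copy()
    xywh[:, 2] = boxes[:, 2] - boxes[:, 0]  # width
    xywh[:, 3] = boxes[:, 3] - boxes[:, 1]  # height
    return xywh


# Hypothetical detections: one list of corner-format boxes per video frame
detections_xyxy = [
    [[12, 37, 132, 115]],                         # frame 0: one box
    [[17, 45, 145, 118], [532, 34, 649, 121]],    # frame 1: two boxes
]

# One context dictionary per frame, in the same order as the frames
bounding_boxes = [dict(bboxes=xyxy_to_xywh(frame)) for frame in detections_xyxy]
max_individuals = max(len(context["bboxes"]) for context in bounding_boxes)
```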
You can easily do so by writing a bit of custom code, as shown in the example below:
from pathlib import Path

import numpy as np
from tqdm import tqdm

import deeplabcut.pose_estimation_pytorch as dlc_torch
from deeplabcut.core.config import read_config_as_dict

# create an iterator for your video
video = dlc_torch.VideoIterator("/Users/Jayson/my-cool-video.mp4")

# dummy bboxes - you can load yours from a file or in another way
# the bboxes should be in `xywh` format, i.e. (x_top_left, y_top_left, width, height)
bounding_boxes = [
    dict(  # frame 0 bounding boxes
        bboxes=np.array([[12, 37, 120, 78]]),
    ),
    dict(  # frame 1 bounding boxes
        bboxes=np.array([[17, 45, 128, 73], [532, 34, 117, 87]]),
    ),
    # ...
    dict(  # last frame: the list must contain one entry per frame in the video!
        bboxes=np.array([[17, 45, 128, 73], [532, 34, 117, 87]]),
    ),
]
video.set_context(bounding_boxes)

# the largest number of boxes in any single frame
max_individuals = max(len(context["bboxes"]) for context in bounding_boxes)

# run inference!
model_cfg = read_config_as_dict("/Users/Jayson/pytorch_config.yaml")
pose_runner = dlc_torch.get_pose_inference_runner(
    model_config=model_cfg,
    snapshot_path=Path("/Users/Jayson/model-snapshot.pt"),
    max_individuals=max_individuals,
    batch_size=32,
)

# your predictions will be a list, containing the predictions made for each frame
# as a dict (with keys for "bodyparts" but also "bboxes")!
predictions = pose_runner.inference(images=tqdm(video))