C++ framework for real-time object detection, supporting multiple deep learning backends and input sources. Run state-of-the-art object detection models on video streams, video files, or images with configurable hardware acceleration.
🚧 Status: Under Development — expect frequent updates.
- Multiple Object Detection Models: Supported via vision-core library (YOLOv4-v12, RT-DETR v1/v2/v4, D-FINE, DEIM v1/v2, RF-DETR)
- Switchable Inference Backends: OpenCV DNN, ONNX Runtime, TensorRT, Libtorch, OpenVINO, Libtensorflow (via neuriplo library)
- Real-time Video Processing: Multiple video backends via VideoCapture library (OpenCV, GStreamer, FFmpeg)
- Docker Deployment Ready: Multi-backend container support
- CMake (≥ 3.15)
- C++17 compiler (GCC ≥ 8.0)
- OpenCV (≥ 4.6)
```bash
apt install libopencv-dev
```
- Google Logging (glog)
```bash
apt install libgoogle-glog-dev
```
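On Debian/Ubuntu, all prerequisites can be installed in one step (a sketch; `build-essential` supplies a suitable GCC, and the package names assume the stock repositories):

```bash
# Install compiler, CMake, OpenCV, and glog in one go (Debian/Ubuntu)
sudo apt update
sudo apt install build-essential cmake libopencv-dev libgoogle-glog-dev
```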
This project automatically fetches:
- vision-core - Contains pre/post-processing and model logic.
- neuriplo - Provides inference backend abstractions and version management.
- videocapture - Handles video I/O.
For the selected inference backends, set up the required dependencies first:
- ONNX Runtime:
  ```bash
  ./scripts/setup_dependencies.sh --backend onnx_runtime
  ```
- TensorRT:
  ```bash
  ./scripts/setup_dependencies.sh --backend tensorrt
  ```
- LibTorch (CPU only):
  ```bash
  ./scripts/setup_dependencies.sh --backend libtorch --compute-platform cpu
  ```
- LibTorch with GPU support:
  ```bash
  ./scripts/setup_dependencies.sh --backend libtorch --compute-platform cuda
  # Note: the CUDA version is set automatically from `versions.neuriplo.env`
  ```
- OpenVINO:
  ```bash
  ./scripts/setup_dependencies.sh --backend openvino
  ```
- TensorFlow:
  ```bash
  ./scripts/setup_dependencies.sh --backend tensorflow
  ```
- All backends:
  ```bash
  ./scripts/setup_dependencies.sh --backend all
  ```
```bash
mkdir build && cd build
cmake -DDEFAULT_BACKEND=<backend> -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
```

The VideoCapture library supports multiple video processing backends with the following priority:

- FFmpeg (if `USE_FFMPEG=ON`) - Maximum format/codec compatibility
- GStreamer (if `USE_GSTREAMER=ON`) - Advanced pipeline capabilities
- OpenCV (default) - Simple and reliable
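Building with the optional backends requires their development packages on the system. On Debian/Ubuntu, something like the following should work (the package names are an assumption, not taken from this project's documentation):

```bash
# System packages for the optional video backends (Debian/Ubuntu; names assumed)
sudo apt install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev  # for USE_GSTREAMER=ON
sudo apt install libavcodec-dev libavformat-dev libswscale-dev         # for USE_FFMPEG=ON
```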
```bash
# Enable GStreamer support
cmake -DDEFAULT_BACKEND=<backend> -DUSE_GSTREAMER=ON -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

# Enable FFmpeg support
cmake -DDEFAULT_BACKEND=<backend> -DUSE_FFMPEG=ON -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

# Enable both (FFmpeg takes priority)
cmake -DDEFAULT_BACKEND=<backend> -DUSE_GSTREAMER=ON -DUSE_FFMPEG=ON -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
```

Replace `<backend>` with one of the supported options. See the Dependency Management Guide for the complete list and details.
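For instance, a complete ONNX Runtime build could look like this (the identifier `ONNX_RUNTIME` is an assumption; check the Dependency Management Guide for the exact spelling):

```bash
# Example end-to-end build against ONNX Runtime (backend identifier assumed)
./scripts/setup_dependencies.sh --backend onnx_runtime
mkdir build && cd build
cmake -DDEFAULT_BACKEND=ONNX_RUNTIME -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
```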
To build with the application tests enabled, add `ENABLE_APP_TESTS` at configure time:

```bash
cmake -DENABLE_APP_TESTS=ON ..
```
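If the tests are registered with CTest (an assumption about this project's test setup), a typical build-and-test sequence is:

```bash
# Configure with tests enabled, build, then run the suite via CTest (assumed)
cmake -DENABLE_APP_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
ctest --output-on-failure
```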
The application is invoked as follows:

```bash
./object-detection-inference \
    [--help | -h] \
    --type=<model_type> \
    --source=<input_source> \
    --labels=<labels_file> \
    --weights=<model_weights> \
    [--min_confidence=<threshold>] \
    [--batch|-b=<batch_size>] \
    [--input_sizes|-is='<input_sizes>'] \
    [--use-gpu] \
    [--warmup] \
    [--benchmark] \
    [--iterations=<number>]
```
- `--type=<model_type>`: Specifies the type of object detection model to use. Possible values:
  - `yolov4`: YOLOv4/YOLOv4-tiny models
  - `yolo`: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLO11, YOLOv12 models
  - `yolov10`: YOLOv10 models (different postprocessing)
  - `yolonas`: YOLO-NAS models
  - `rtdetr`: RT-DETR, RT-DETRv2, RT-DETRv4, D-FINE, DEIM, DEIMv2 models
  - `rtdetrul`: RT-DETR Ultralytics implementation
  - `rfdetr`: RF-DETR models
- `--source=<input_source>`: Defines the input source for the object detection. It can be:
  - A live feed URL, e.g., `rtsp://cameraip:port/stream`
  - A path to a video file, e.g., `path/to/video.format`
  - A path to an image file, e.g., `path/to/image.format`
- `--labels=<path/to/labels/file>`: Specifies the path to the file containing the class labels. This file should list the labels used by the model, one label per line.
- `--weights=<path/to/model/weights>`: Defines the path to the file containing the model weights.
- `[--min_confidence=<confidence_value>]`: Sets the minimum confidence threshold for detections. Detections with a confidence score below this value are discarded. The default value is `0.25`.
- `[--batch | -b=<batch_size>]`: Specifies the batch size for inference. The default value is `1`; inference with a batch size greater than 1 is not currently supported.
- `[--input_sizes | -is=<input_sizes>]`: Input sizes for each model input, needed when a model has dynamic axes or the backend cannot retrieve input layer information (as with the OpenCV DNN module). Format: `CHW;CHW;...`. For example:
  - `'3,224,224'` for a single input
  - `'3,224,224;3,224,224'` for two inputs
  - `'3,640,640;2'` for RT-DETR/RT-DETRv2/D-FINE/DEIM/DEIMv2 models
- `[--use-gpu]`: Activates GPU support for inference. This can significantly speed up the inference process if a compatible GPU is available. Default is `false`.
- `[--warmup]`: Enables GPU warmup. Warming up the GPU before performing the actual inference can help achieve more consistent and optimized performance. This parameter is relevant only if the inference is being performed on an image source. Default is `false`.
- `[--benchmark]`: Enables benchmarking mode, in which the application runs multiple iterations of inference to measure and report the average inference time. This is useful for evaluating the performance of the model and the inference setup, and is relevant only if the inference is being performed on an image source. Default is `false`. A combined benchmarking invocation is sketched after this list.
- `[--iterations=<number>]`: Specifies the number of iterations for benchmarking. The default value is `10`.
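Putting the benchmarking options together, a run might look like the following sketch (the weights and labels paths are placeholders, and `--input_sizes` reuses the RT-DETR shape shown above):

```bash
# Benchmark an RT-DETR-family model on a single image (paths are placeholders)
./object-detection-inference \
    --type=rtdetr \
    --source=image.png \
    --weights=models/rtdetr.onnx \
    --labels=data/coco.names \
    --input_sizes='3,640,640;2' \
    --use-gpu --warmup --benchmark --iterations=50
```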
Print the help message:

```bash
./object-detection-inference --help
```

```bash
# YOLOv8 ONNX Runtime image processing
./object-detection-inference \
    --type=yolo \
    --source=image.png \
    --weights=models/yolov8s.onnx \
    --labels=data/coco.names
```
```bash
# YOLOv8 TensorRT video processing
./object-detection-inference \
    --type=yolo \
    --source=video.mp4 \
    --weights=models/yolov8s.engine \
    --labels=data/coco.names \
    --min_confidence=0.4
```
```bash
# RTSP stream processing using the RT-DETR Ultralytics implementation
./object-detection-inference \
    --type=rtdetrul \
    --source="rtsp://camera:554/stream" \
    --weights=models/rtdetr-l.onnx \
    --labels=data/coco.names \
    --use-gpu
```

Check the `.vscode` folder for other examples.
Inside the project, the Dockerfiles folder contains a Dockerfile for each inference backend (currently `onnxruntime`, `libtorch`, `tensorrt`, `openvino`).
```bash
# Build for a specific backend
docker build --rm -t object-detection-inference:<backend_tag> \
    -f docker/Dockerfile.<backend> .
```
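For example, to build the ONNX Runtime image (the Dockerfile name `docker/Dockerfile.onnxruntime` is an assumption; use whichever Dockerfile matches your backend):

```bash
# Example: build the ONNX Runtime image (Dockerfile name assumed)
docker build --rm -t object-detection-inference:onnxruntime \
    -f docker/Dockerfile.onnxruntime .
```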
Replace the wildcards with your desired options and paths:

```bash
docker run --rm \
    -v <path_host_data_folder>:/app/data \
    -v <path_host_weights_folder>:/weights \
    -v <path_host_labels_folder>:/labels \
    object-detection-inference:<backend_tag> \
    --type=<model_type> \
    --weights=<weight_according_your_backend> \
    --source=/app/data/<image_or_video> \
    --labels=/labels/<labels_file>
```

For GPU support, add `--gpus all` to the `docker run` command.
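A filled-in invocation might look like this (host paths, image tag, and file names are placeholders for illustration):

```bash
# Example: YOLOv8 inference in the ONNX Runtime container with GPU support
# (host paths and file names are placeholders)
docker run --rm --gpus all \
    -v $HOME/data:/app/data \
    -v $HOME/weights:/weights \
    -v $HOME/labels:/labels \
    object-detection-inference:onnxruntime \
    --type=yolo \
    --weights=/weights/yolov8s.onnx \
    --source=/app/data/video.mp4 \
    --labels=/labels/coco.names \
    --use-gpu
```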
- Detector Architectures Guide
- Supported Models
- Model Export Guide
- Vision-Core Export Tools - Comprehensive export utilities for all supported models
- Windows builds not currently supported
- Some model/backend combinations may require specific export configurations
- https://paperswithcode.com/sota/real-time-object-detection-on-coco (no longer available)
- https://leaderboard.roboflow.com/
- Open an issue for bug reports or feature requests; contributions, corrections, and suggestions are welcome to keep this repository relevant and useful.
- Check existing issues for solutions to common problems