Description
This implementation is outdated; please follow the new repo instead:
https://github.com/levipereira/yolov9-qat
For the original implementation, see #327.
I have developed the initial version of YOLOv9-QAT using the Q/DQ method, tailored specifically for YOLOv9 models intended to run exclusively on TensorRT.
This implementation currently supports only the inference models (Converted and GELAN models).
The source code is available on the yolov9-qat branch.
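For context, the Q/DQ method inserts fake-quantization (TensorQuantizer) nodes into the PyTorch graph and calibrates their scales before any fine-tuning. The sketch below shows the typical flow with NVIDIA's pytorch-quantization toolkit; `build_yolov9` and `calib_loader` are placeholders, not this repository's API.

```python
# Minimal sketch: insert Q/DQ (fake-quant) modules and calibrate their scales
# with NVIDIA's pytorch-quantization toolkit. Placeholders are marked below.
import torch
from pytorch_quantization import calib, quant_modules
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

# Calibrate input activations with a histogram method; weights keep the
# default per-channel max calibration.
quant_nn.QuantConv2d.set_default_quant_desc_input(QuantDescriptor(calib_method="histogram"))

# Monkey-patch torch.nn layers so the model is built with Quant* modules.
quant_modules.initialize()
model = build_yolov9().cuda().eval()        # placeholder: build YOLOv9 as usual

# 1) Collect activation statistics on a few calibration batches.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()
with torch.no_grad():
    for i, (imgs, _) in enumerate(calib_loader):   # placeholder dataloader
        model(imgs.cuda())
        if i >= 32:
            break

# 2) Turn the statistics into amax (scale) values and re-enable quantization.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is None:
            continue
        if isinstance(module._calibrator, calib.HistogramCalibrator):
            module.load_calib_amax("percentile", percentile=99.99)
        else:
            module.load_calib_amax()
        module.disable_calib()
        module.enable_quant()
# From here the model can be fine-tuned (QAT) or exported directly (PTQ).
```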
Challenges
Quantizing all layers can, in some cases, decrease accuracy and increase latency, primarily due to the complexity of the last layer. To mitigate this, use the qat.py quantize --no-last-layer flag to exclude the last layer from quantization, as sketched below.
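Internally, skipping a layer amounts to disabling its fake-quantization nodes so that layer keeps running in FP16. A minimal sketch of the idea, assuming the pytorch-quantization toolkit; the helper and the layer-name argument are illustrative, not the repo's API:

```python
from pytorch_quantization import nn as quant_nn

def skip_layer_quantization(model, layer_name):
    """Illustrative helper: disable every TensorQuantizer inside the module
    whose qualified name contains `layer_name` (e.g. the detection head),
    so that layer stays in FP16 while the rest of the model is quantized."""
    for name, module in model.named_modules():
        if layer_name in name and isinstance(module, quant_nn.TensorQuantizer):
            module.disable()   # the quantizer becomes a pass-through
```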
In this version, the unoptimized scaling of the Quantize/Dequantize (Q/DQ) nodes can lead to unnecessary data-format conversions. Restricting the Q/DQ scales in models/quantize.py so that they match the surrounding data format is essential to reduce latency.
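A common restriction of this kind is forcing the Q/DQ nodes that feed the same element-wise Add or Concat to share one scale, so TensorRT does not have to insert extra reformat/requantize steps between INT8 tensors. A minimal sketch of the idea, using pytorch-quantization's QuantConv2d internals; this is illustrative only, not the actual rules in models/quantize_rules.py:

```python
from pytorch_quantization import nn as quant_nn

def share_input_quantizer(reference_conv, *other_convs):
    """Illustrative rule: make several QuantConv2d layers that consume the
    same tensor (e.g. both branches of a residual add or a concat) reuse one
    TensorQuantizer, so their Q/DQ scales are identical by construction and
    TensorRT sees a single INT8 format for that tensor."""
    for conv in other_convs:
        conv._input_quantizer = reference_conv._input_quantizer
```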
Contributions from the community are welcome, as their knowledge is essential for a correct implementation of this functionality.
Files Added / Modified
qat.py - Main script.

    usage: qat.py [-h] {quantize,sensitive,eval} ...

    positional arguments:
      {quantize,sensitive,eval}
        quantize            PTQ/QAT finetune ...
        sensitive           Sensitive layer analysis
        eval                Do evaluate

models/quantize.py - Quantization module.
models/quantize_rules.py - Quantization rules.
export.py - Changed to automatically detect QAT models and export them when using the --include onnx / onnx_end2end flags (see the export sketch below).
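For reference, exporting a QAT checkpoint usually just means enabling ONNX-friendly fake-quant ops before calling torch.onnx.export, so the resulting graph contains explicit QuantizeLinear/DequantizeLinear nodes that TensorRT can turn into an INT8 engine. A hedged sketch; the checkpoint loader, input shape and file names below are placeholders:

```python
import torch
from pytorch_quantization import nn as quant_nn

# Emit QuantizeLinear/DequantizeLinear (Q/DQ) ONNX nodes instead of the
# toolkit's custom fake-quant ops, so TensorRT can build an INT8 engine.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

model = load_qat_checkpoint("yolov9-c-qat.pt").cuda().eval()   # placeholder loader
dummy = torch.zeros(1, 3, 640, 640, device="cuda")             # assumed input shape

torch.onnx.export(
    model, dummy, "yolov9-c-qat.onnx",      # placeholder output path
    opset_version=13,                       # opset 13+ supports per-channel Q/DQ
    input_names=["images"], output_names=["outputs"],
)
```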
Accuracy Report
QAT - YOLOV9-C - ALL LAYERS

| Eval Model | AP | AP50 | Precision | Recall |
|---|---|---|---|---|
| Origin | 0.5297 | 0.699 | 0.7432 | 0.634 |
| PTQ | 0.5295 | 0.6978 | 0.7455 | 0.6306 |
| QAT-Best | 0.5291 | 0.6978 | 0.7449 | 0.632 |

QAT - YOLOV9-C - NO QAT LAST LAYER

| Eval Model | AP | AP50 | Precision | Recall |
|---|---|---|---|---|
| Origin | 0.5297 | 0.699 | 0.7432 | 0.634 |
| PTQ | 0.529 | 0.698 | 0.7459 | 0.6297 |
| QAT-Best | 0.5299 | 0.6984 | 0.7469 | 0.6305 |

QAT - YOLOV9-E - ALL LAYERS

| Eval Model | AP | AP50 | Precision | Recall |
|---|---|---|---|---|
| Origin | 0.5576 | 0.7246 | 0.7547 | 0.6649 |
| PTQ | 0.5565 | 0.7241 | 0.7499 | 0.6649 |
| QAT-Best | 0.5566 | 0.7232 | 0.7538 | 0.6637 |

QAT - YOLOV9-E - NO QAT LAST LAYER

| Eval Model | AP | AP50 | Precision | Recall |
|---|---|---|---|---|
| Origin | 0.5576 | 0.7246 | 0.7547 | 0.6649 |
| PTQ | 0.5569 | 0.7242 | 0.7497 | 0.6646 |
| QAT-Best | 0.5569 | 0.7239 | 0.7486 | 0.6657 |
Results using TensorRT engine models on Triton Inference Server
Tool: https://github.com/levipereira/triton-client-yolo
========================= EVALUATION SUMMARY - YOLOV9-C ========================
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.701
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.577
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.361
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.582
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.392
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.652
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.701
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.538
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.759
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.848
================================================================================
mAP@0.5:0.95: 0.528
mAP@0.5: 0.701
mAP@0.75: 0.577
================================================================================
========================= EVALUATION SUMMARY - YOLOV9-C-QAT ========================
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.699
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.576
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.581
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.392
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.651
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.699
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.534
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.758
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.845
================================================================================
mAP@0.5:0.95: 0.528
mAP@0.5: 0.699
mAP@0.75: 0.576
================================================================================
Latency Report
- Device Properties:
  - Selected Device: NVIDIA GeForce RTX 4090
  - Compute Capability: 8.9
  - SMs: 128
  - Compute Clock Rate: 2.58 GHz
  - Device Global Memory: 24207 MiB
  - Shared Memory per SM: 100 KiB
  - Memory Bus Width: 384 bits
  - Memory Clock Rate: 10.501 GHz
Table Info:
- "Average time" refers to the sum of the per-layer latencies when profiling layers separately.
- "Throughput" is measured in inferences per second (IPS); "Total Throughput" is "Throughput" multiplied by the batch size.
Origin
| Model | Precision Type | Batch Size | Layers | Weights (MB) | Activations (MB) | Throughput (IPS) | Total Throughput (IPS) | Average time (ms) |
|---|---|---|---|---|---|---|---|---|
| yolov9-c | FP16 | 1 | 271 | 48.2 | 611.7 | 792 | 792 | 2.1 |
| yolov9-c | FP16 | 8 | 273 | 48.2 | 4809.1 | 151 | 1209 | 7.3 |
| yolov9-e | FP16 | 8 | 477 | 109.3 | 13461.3 | 57 | 457 | 18.8 |
| yolov9-e | FP16 | 1 | 487 | 109.3 | 1706.5 | 353 | 353 | 4.3 |
Last Layer not Quantized
| Model | Precision Type | Batch Size | Layers | Weights (MB) | Activations (MB) | Throughput (IPS) | Total Throughput (IPS) | Average time (ms) |
|---|---|---|---|---|---|---|---|---|
| yolov9-c-qat | FP16 INT8 | 1 | 288 | 29.4 | 534.7 | 951 | 951 | 1.9 |
| yolov9-c-qat | FP16 INT8 | 8 | 287 | 29.4 | 4190.2 | 181 | 1447 | 6.4 |
| yolov9-e-qat | FP16 INT8 | 1 | 526 | 63.1 | 1757.0 | 405 | 405 | 4.1 |
| yolov9-e-qat | FP16 INT8 | 8 | 526 | 63.1 | 13407.7 | 60 | 482 | 18.2 |
All Layers Quantized
| Model | Precision Type | Batch Size | Layers | Weights (MB) | Activations (MB) | Throughput (IPS) | Total Throughput (IPS) | Average time (ms) |
|---|---|---|---|---|---|---|---|---|
| yolov9-c-qat | FP16 INT8 | 1 | 295 | 24.2 | 540.1 | 957 | 957 | 1.9 |
| yolov9-c-qat | FP16 INT8 | 8 | 293 | 24.2 | 4216.7 | 193 | 1547 | 6.1 |
| yolov9-e-qat | FP16 INT8 | 1 | 532 | 57.8 | 1779.5 | 396 | 396 | 4.1 |
| yolov9-e-qat | FP16 INT8 | 8 | 532 | 57.8 | 13431.8 | 62 | 493 | 17.8 |