YOLOv9-QAT TensorRT Q/DQ: Improved Speed and Zero Loss Accuracy

This issue is outdated. Please follow the new repository:
https://github.com/levipereira/yolov9-qat

Please follow the original implementation in https://github.com/WongKinYiu/yolov9/issues/327

@WongKinYiu

I have developed the initial version of YOLOv9-QAT using the Q/DQ method, tailored specifically for YOLOv9 models intended to run solely on TensorRT. <br>
This implementation currently supports only the inference models (converted and GELAN models).

The source code is available in the [yolov9-qat](https://github.com/levipereira/yolov9/tree/yolov9-qat) branch.


## Challenges
Quantizing all layers can, in some cases, decrease accuracy and increase latency, primarily because of the complexity of the last layer. To mitigate this, pass the `--no-last-layer` flag to `qat.py quantize` to exclude the last layer from quantization.

In this version, the scales of the Quantize/Dequantize (Q/DQ) nodes are not yet optimized, which can lead to unnecessary data formats being generated. Restricting the Q/DQ scales in [models/quantize.py](https://github.com/levipereira/yolov9/blob/yolov9-qat/models/quantize.py) so that they match the data format is essential to reduce latency.
Contributions from the community are welcome, as their knowledge is essential for implementing this functionality correctly.
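
To illustrate the scale issue described above, here is a minimal pure-Python sketch of symmetric INT8 Q/DQ (fake quantization). It is not the code in `models/quantize.py`: the helper names (`compute_scale`, `quantize`, `dequantize`, `fake_quant`) and the sample values are hypothetical, and sharing the larger scale between two branches that feed the same op is only one assumed way of restricting scales.

```python
def compute_scale(values, num_bits=8):
    """Symmetric per-tensor scale: the largest |x| maps to the largest INT8 magnitude."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    amax = max(abs(v) for v in values)
    return amax / qmax if amax > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Q step: divide by the scale, round, and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """DQ step: map the integers back to floating point."""
    return [q * scale for q in q_values]


def fake_quant(values, scale):
    """Q followed by DQ, as a Q/DQ node pair behaves in the exported graph."""
    return dequantize(quantize(values, scale), scale)


# Hypothetical activations of two branches feeding the same Concat/Add op.
branch_a = [0.1, -0.5, 2.0]
branch_b = [0.05, 0.3, -8.0]

# Calibrated independently, the two branches end up with different scales,
# so an INT8 op consuming both may need an extra requantization step.
scale_a = compute_scale(branch_a)
scale_b = compute_scale(branch_b)
scales_match = scale_a == scale_b

# Assumed restriction: both branches share the larger scale.
shared_scale = max(scale_a, scale_b)
out_a = fake_quant(branch_a, shared_scale)
out_b = fake_quant(branch_b, shared_scale)

print(f"scale_a={scale_a:.5f} scale_b={scale_b:.5f} match={scales_match}")
print(f"shared_scale={shared_scale:.5f}")
print("branch_a after Q/DQ:", [round(v, 4) for v in out_a])
print("branch_b after Q/DQ:", [round(v, 4) for v in out_b])
```

The trade-off in this sketch is that the branch with the smaller range loses some resolution when it adopts the shared scale, which is why such restrictions are best applied only where the inputs really meet in the same op.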


## Files Added / Modified 
[qat.py](https://github.com/levipereira/yolov9/blob/yolov9-qat/qat.py) - Main 
```
usage: qat.py [-h] {quantize,sensitive,eval} ...
positional arguments:
  {quantize,sensitive,eval}
    quantize            PTQ/QAT finetune ...
    sensitive           Sensitive layer analysis
    eval                Do evaluate
```

[models/quantize.py](https://github.com/levipereira/yolov9/blob/yolov9-qat/models/quantize.py) - Quantize Module <br>
[models/quantize_rules.py](https://github.com/levipereira/yolov9/blob/yolov9-qat/models/quantize_rules.py) - Quantize Rules <br>
[export.py](https://github.com/levipereira/yolov9/blob/yolov9-qat/export.py) - Changed to automatically detect QAT models and export them when using the flag `--include onnx` or `--include onnx_end2end`
 
  
# Accuracy Report
 
 ```
QAT - YOLOV9-C - ALL LAYERS
Eval Model | AP       | AP50     | Precision  | Recall
-------------------------------------------------------
Origin     | 0.5297   | 0.699    | 0.7432     | 0.634
PTQ        | 0.5295   | 0.6978   | 0.7455     | 0.6306
QAT - Best | 0.5291   | 0.6978   | 0.7449     | 0.632

QAT - YOLOV9-C - NO QAT LAST LAYER
Eval Model | AP       | AP50     | Precision  | Recall
-------------------------------------------------------
Origin     | 0.5297   | 0.699    | 0.7432     | 0.634
PTQ        | 0.529    | 0.698    | 0.7459     | 0.6297
QAT - Best | 0.5299   | 0.6984   | 0.7469     | 0.6305

QAT - YOLOV9-E - ALL LAYERS
Eval Model | AP       | AP50     | Precision  | Recall
-------------------------------------------------------
Origin     | 0.5576   | 0.7246   | 0.7547     | 0.6649
PTQ        | 0.5565   | 0.7241   | 0.7499     | 0.6649
QAT - Best | 0.5566   | 0.7232   | 0.7538     | 0.6637

QAT - YOLOV9-E - NO QAT LAST LAYER
Eval Model | AP       | AP50     | Precision  | Recall
-------------------------------------------------------
Origin     | 0.5576   | 0.7246   | 0.7547     | 0.6649
PTQ        | 0.5569   | 0.7242   | 0.7497     | 0.6646
QAT - Best | 0.5569   | 0.7239   | 0.7486     | 0.6657
 ```
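
To make the size of the accuracy change explicit, the following sketch computes the AP difference of each quantized model against its original. It is only arithmetic on the AP column reported above; the dictionary layout and the `ap_delta` helper are hypothetical names chosen for this example.

```python
# AP (IoU=0.50:0.95) values copied from the accuracy report above.
reported_ap = {
    "yolov9-c all layers":    {"origin": 0.5297, "ptq": 0.5295, "qat": 0.5291},
    "yolov9-c no last layer": {"origin": 0.5297, "ptq": 0.529,  "qat": 0.5299},
    "yolov9-e all layers":    {"origin": 0.5576, "ptq": 0.5565, "qat": 0.5566},
    "yolov9-e no last layer": {"origin": 0.5576, "ptq": 0.5569, "qat": 0.5569},
}


def ap_delta(row, key):
    """AP of the quantized model minus AP of the original, rounded to 4 decimals."""
    return round(row[key] - row["origin"], 4)


deltas = {
    name: {"ptq": ap_delta(row, "ptq"), "qat": ap_delta(row, "qat")}
    for name, row in reported_ap.items()
}

# Most negative delta across all configurations (largest AP drop).
worst_drop = min(d for row in deltas.values() for d in row.values())

for name, row in deltas.items():
    print(f"{name:24s} PTQ {row['ptq']:+.4f}  QAT {row['qat']:+.4f}")
print(f"largest AP drop: {worst_drop:+.4f}")
```

With these numbers, every quantized variant stays within roughly 0.001 AP of its original.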
 
Results using TensorRT engine models on Triton Server <br>
Tool: https://github.com/levipereira/triton-client-yolo
 ```
 ========================= EVALUATION SUMMARY - YOLOV9-C ========================
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.528
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.701
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.577
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.361
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.582
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.392
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.652
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.701
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.538
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.759
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.848
================================================================================
mAP@0.5:0.95: 0.528
mAP@0.5:      0.701
mAP@0.75:     0.577
================================================================================


========================= EVALUATION SUMMARY - YOLOV9-C-QAT ========================
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.528
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.699
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.576
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.581
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.392
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.651
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.699
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.534
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.758
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.845
================================================================================
mAP@0.5:0.95: 0.528
mAP@0.5:      0.699
mAP@0.75:     0.576
================================================================================
```

# Latency Report
- Device Properties:
  * Selected Device: NVIDIA GeForce RTX 4090
    - Compute Capability: 8.9
    - SMs: 128
    - Compute Clock Rate: 2.58 GHz
    - Device Global Memory: 24207 MiB
    - Shared Memory per SM: 100 KiB
    - Memory Bus Width: 384 bits
    - Memory Clock Rate: 10.501 GHz

Table Info:
* "Average time" refers to the sum of the layer latencies when the layers are profiled separately.
* "Throughput" is measured in inferences per second (IPS).

## Origin
| Model    | Precision Type | Batch Size | Layers | Weights (MB) | Activations (MB) | Throughput (IPS) | Total Throughput (IPS) | Average time (ms) |
|----------|----------------|------------|--------|--------------|------------------|------------------|------------------------|-------------------|
| yolov9-c | FP16           | 1          | 271    | 48.2         | 611.7            | 792              | 792                    | 2.1               |
|          |                | 8          | 273    | 48.2         | 4809.1           | 151              | 1209                   | 7.3               |
|          |                |            |        |              |                  |                  |                        |                   |
| yolov9-e | FP16           | 1          | 487    | 109.3        | 1706.5           | 353              | 353                    | 4.3               |
|          |                | 8          | 477    | 109.3        | 13461.3          | 57               | 457                    | 18.8              |




## Last Layer not Quantized

| Model        | Precision Type | Batch Size | Layers | Weights (MB) | Activations (MB) | Throughput (IPS) | Total Throughput (IPS) | Average time (ms) |
|--------------|----------------|------------|--------|---------------|-------------------|------------------|------------------------|-------------------|
| yolov9-c-qat | FP16 INT8      | 1          | 288    | 29.4          | 534.7             | 951              | 951                    | 1.9               |
|              |                | 8          | 287    | 29.4          | 4190.2            | 181              | 1447                   | 6.4               |
|              |                |            |        |               |                   |                  |                        |                   |
| yolov9-e-qat | FP16 INT8      | 1          | 526    | 63.1          | 1757.0            | 405              | 405                    | 4.1               |
|              |                | 8          | 526    | 63.1          | 13407.7           | 60               | 482                    | 18.2              |



## All Layers Quantized

| Model        | Precision Type | Batch Size | Layers | Weights (MB) | Activations (MB) | Throughput (IPS) | Total Throughput (IPS) | Average time (ms) |
|--------------|----------------|------------|--------|--------------|------------------|------------------|------------------------|-------------------|
| yolov9-c-qat | FP16 INT8      | 1          | 295    | 24.2         | 540.1            | 957              | 957                    | 1.9               |
|              |                | 8          | 293    | 24.2         | 4216.7           | 193              | 1547                   | 6.1               |
|              |                |            |        |              |                  |                  |                        |                   |
| yolov9-e-qat | FP16 INT8      | 1          | 532    | 57.8         | 1779.5           | 396              | 396                    | 4.1               |
|              |                | 8          | 532    | 57.8         | 13431.8          | 62               | 493                    | 17.8              |
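
The latency tables above can be summarized as relative speedups over the FP16 baseline. The sketch below only divides the reported "Throughput (IPS)" values; the dictionary names and the `speedup` helper are hypothetical and not part of `qat.py`.

```python
# Throughput (IPS) per (model, batch size), copied from the tables above.
fp16 = {("yolov9-c", 1): 792, ("yolov9-c", 8): 151,
        ("yolov9-e", 1): 353, ("yolov9-e", 8): 57}
qat_no_last_layer = {("yolov9-c", 1): 951, ("yolov9-c", 8): 181,
                     ("yolov9-e", 1): 405, ("yolov9-e", 8): 60}
qat_all_layers = {("yolov9-c", 1): 957, ("yolov9-c", 8): 193,
                  ("yolov9-e", 1): 396, ("yolov9-e", 8): 62}


def speedup(quantized, baseline):
    """Throughput ratio quantized / baseline for each (model, batch size) key."""
    return {key: round(quantized[key] / baseline[key], 2) for key in baseline}


speedup_no_last = speedup(qat_no_last_layer, fp16)
speedup_all = speedup(qat_all_layers, fp16)

for key in fp16:
    model, batch = key
    print(f"{model} batch={batch}: "
          f"no-last-layer x{speedup_no_last[key]:.2f}, "
          f"all-layers x{speedup_all[key]:.2f}")
```

On these measurements, quantizing all layers is faster than excluding the last layer in every configuration except yolov9-e at batch size 1.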

