- Transfer learning
- YoloV4 configuration
- YoloV4 training
- YoloV4 loss function adjustments
- Live plot losses
- Command line options
- YoloV3 tiny
- Raspberry Pi support
```sh
pip install git+https://github.com/unsignedrant/yolo-tf2
```
Verify installation:

```sh
% yolotf2
Yolo-tf2 1.0

Usage:
    yolotf2 <command> [options] [args]

Available commands:
    train    Create new or use existing dataset and train a model
    detect   Detect a folder of images or a video

Use yolotf2 <command> -h to see more info about a command
Use yolotf2 -h to display all command line options
```
yolo-tf2 was initially an implementation of yolov3 (you only look once) training and inference; support for all yolo versions was added in db2f889. Yolo is a state-of-the-art, real-time object detection system that is extremely fast and accurate. The official repo is here. There are many implementations that support TensorFlow, but only a few that support TensorFlow 2.x; since I did not find one that suited my needs, I decided to create this version, which is very flexible and customizable. It requires Python 3.10+, is not platform specific, and is MIT licensed.
General options:

flags | help | required | default |
---|---|---|---|
--anchors | Path to anchors .txt file | True | - |
--batch-size | Training/detection batch size | - | 8 |
--classes | Path to classes .txt file | True | - |
--input-shape | Input shape, e.g. (m, m, c) | - | (416, 416, 3) |
--iou-threshold | iou (intersection over union) threshold | - | 0.5 |
--masks | Path to masks .txt file | True | - |
--max-boxes | Maximum boxes per image | - | 100 |
--model-cfg | Yolo DarkNet configuration .cfg file | True | - |
--quiet | If specified, verbosity is set to False | - | - |
--score-threshold | Confidence score threshold | - | 0.5 |
--v4 | If yolov4 configuration is used, this should be specified | - | - |
Training options:

flags | help | default |
---|---|---|
--dataset-name | Checkpoint/dataset prefix | - |
--delete-images | If specified, dataset images will be deleted upon being saved to tfrecord. | - |
--epochs | Number of training epochs | 100 |
--es-patience | Early stopping patience | - |
--image-dir | Path to folder containing images referenced by .xml labels | - |
--labeled-examples | Path to labels .csv file | - |
--learning-rate | Training learning rate | 0.001 |
--output-dir | Path to folder where training dataset / checkpoints / other data will be saved | . |
--shuffle-buffer-size | Dataset shuffle buffer | 512 |
--train-shards | Total number of .tfrecord files to split training dataset into | 1 |
--train-tfrecord | Path to training .tfrecord file | - |
--valid-frac | Validation dataset fraction | 0.1 |
--valid-shards | Total number of .tfrecord files to split validation dataset into | 1 |
--valid-tfrecord | Path to validation .tfrecord file | - |
--weights | Path to trained weights .tf or .weights file | - |
--xml-dir | Path to folder containing .xml labels in VOC format | - |
Detection options:

flags | help | required | default |
---|---|---|---|
--codec | Codec to use for predicting videos | - | mp4v |
--display-vid | Display video during detection | - | - |
--evaluation-examples | Path to .csv file with ground truth for evaluation of the trained model and mAP score calculation. | - | - |
--image-dir | A directory that contains images to predict | - | - |
--images | Paths of images to detect | - | - |
--output-dir | Path to directory for saving results | - | - |
--video | Path to video to predict | - | - |
--weights | Path to trained weights .tf or .weights file | True | - |
This feature was introduced to replace the old hard-coded model: models are loaded directly from DarkNet .cfg files for convenience. As of db2f889, DarkNet .cfg files are automatically converted to Keras models. The current code leverages features introduced in TensorFlow 2.x, including Keras models and tfrecord datasets.
Both options are available. Note that when using DarkNet weights, you must keep the same number of COCO classes (80), since transfer learning to models with a different number of classes is not currently supported.
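For example, a hedged sketch of running detection with the official pretrained DarkNet weights (all paths here are placeholders; the call mirrors the detection API documented later in this README):

```py
import yolo_tf2

# Placeholder paths; yolov3.weights are the official DarkNet weights
# trained on COCO, so the classes file must list the same 80 COCO classes.
yolo_tf2.detect(
    input_shape=(416, 416, 3),
    classes='/path/to/coco_classes.txt',  # must contain all 80 COCO classes
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    model_cfg='/path/to/yolov3.cfg',
    weights='/path/to/yolov3.weights',    # DarkNet .weights file
    images=['/path/to/image1'],
    output_dir='detection-output',
)
```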
There are 3 input options accepted by the API:
A .csv file similar to the one below is supported. Note that `x0`, `y0`, `x1`, `y1` are x and y coordinates relative to their corresponding image width and height. For example:

- image width = 1000
- image height = 500
- x0, y0 = 100, 300
- x1, y1 = 120, 320
- x0, y0, x1, y1 = 0.1, 0.6, 0.12, 0.64 respectively
image | object_name | object_index | x0 | y0 | x1 | y1 |
---|---|---|---|---|---|---|
/path/to/368.jpg | Car | 0 | 0.478423 | 0.57672 | 0.558036 | 0.699735 |
/path/to/368.jpg | Car | 0 | 0.540923 | 0.583333 | 0.574405 | 0.626984 |
/path/to/368.jpg | Car | 0 | 0.389881 | 0.574074 | 0.470982 | 0.683862 |
/path/to/368.jpg | Car | 0 | 0.447173 | 0.555556 | 0.497024 | 0.638889 |
/path/to/368.jpg | Street Sign | 1 | 0.946429 | 0.40873 | 0.991815 | 0.510582 |
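To make the conversion concrete, here is a minimal sketch (not part of yolo-tf2) that turns absolute pixel coordinates into the relative values shown above:

```py
def to_relative(x0, y0, x1, y1, image_width, image_height):
    """Convert absolute pixel box corners to width/height-relative values."""
    return (
        x0 / image_width,
        y0 / image_height,
        x1 / image_width,
        y1 / image_height,
    )

# Reproduces the example above: a 1000x500 image with a box
# from (100, 300) to (120, 320).
print(to_relative(100, 300, 120, 320, 1000, 500))  # (0.1, 0.6, 0.12, 0.64)
```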
Alternatively, a folder of .xml labels in VOC format (passed via `--xml-dir`) is supported:

```xml
<annotation>
    <folder>VOC2012</folder>
    <filename>2007_000033.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
    </source>
    <size>
        <width>500</width>
        <height>366</height>
        <depth>3</depth>
    </size>
    <segmented>1</segmented>
    <object>
        <name>aeroplane</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>9</xmin>
            <ymin>107</ymin>
            <xmax>499</xmax>
            <ymax>263</ymax>
        </bndbox>
    </object>
    <object>
        <name>aeroplane</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>421</xmin>
            <ymin>200</ymin>
            <xmax>482</xmax>
            <ymax>226</ymax>
        </bndbox>
    </object>
    <object>
        <name>aeroplane</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>325</xmin>
            <ymin>188</ymin>
            <xmax>411</xmax>
            <ymax>223</ymax>
        </bndbox>
    </object>
</annotation>
```
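For illustration, a hedged sketch (not the repo's internal parser; `class_indices` is a hypothetical name-to-index mapping) of converting such a file into rows matching the .csv format above:

```py
import os
import xml.etree.ElementTree as ET

def voc_to_rows(xml_path, image_dir, class_indices):
    """Yield [image, object_name, object_index, x0, y0, x1, y1] rows."""
    root = ET.parse(xml_path).getroot()
    image = os.path.join(image_dir, root.findtext('filename'))
    width = float(root.findtext('size/width'))
    height = float(root.findtext('size/height'))
    for obj in root.iter('object'):
        name = obj.findtext('name')
        yield [
            image,
            name,
            class_indices[name],
            # VOC boxes are absolute pixels; divide to get relative values.
            float(obj.findtext('bndbox/xmin')) / width,
            float(obj.findtext('bndbox/ymin')) / height,
            float(obj.findtext('bndbox/xmax')) / width,
            float(obj.findtext('bndbox/ymax')) / height,
        ]
```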
.tfrecord files previously generated by the code can be reused. A typical feature map looks like:
```py
{
    'image': tf.io.FixedLenFeature([], tf.string),
    'x0': tf.io.VarLenFeature(tf.float32),
    'y0': tf.io.VarLenFeature(tf.float32),
    'x1': tf.io.VarLenFeature(tf.float32),
    'y1': tf.io.VarLenFeature(tf.float32),
    'object_name': tf.io.VarLenFeature(tf.string),
    'object_index': tf.io.VarLenFeature(tf.int64),
}
```
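A minimal sketch of consuming such a file with this feature map (assuming `image` holds encoded image bytes; the repo's actual pipeline may differ):

```py
import tensorflow as tf

FEATURES = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'x0': tf.io.VarLenFeature(tf.float32),
    'y0': tf.io.VarLenFeature(tf.float32),
    'x1': tf.io.VarLenFeature(tf.float32),
    'y1': tf.io.VarLenFeature(tf.float32),
    'object_name': tf.io.VarLenFeature(tf.string),
    'object_index': tf.io.VarLenFeature(tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_image(example['image'], channels=3)
    # Stack the sparse coordinate features into an (n_boxes, 4) tensor.
    boxes = tf.stack(
        [tf.sparse.to_dense(example[key]) for key in ('x0', 'y0', 'x1', 'y1')],
        axis=-1,
    )
    return image, boxes

dataset = tf.data.TFRecordDataset('/path/to/train.tfrecord').map(parse_example)
```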
A k-means algorithm finds the optimal anchor sizes and generates the anchors, visualizing the process along the way (a minimal sketch of the idea follows the list below).
Including:

- k-means visualization
- Generated anchors
- Precision and recall curves
- Evaluation bar charts
- Actual vs. detections
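As a rough illustration of anchor generation (not the repo's implementation; yolo anchor k-means commonly uses an IoU-based distance, while Euclidean distance is used here for brevity):

```py
import numpy as np

def kmeans_anchors(box_wh, k=9, iterations=50, seed=0):
    """box_wh: (n, 2) array of box widths and heights. Returns k anchors."""
    box_wh = np.asarray(box_wh, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k random boxes.
    centroids = box_wh[rng.choice(len(box_wh), size=k, replace=False)]
    for _ in range(iterations):
        # Assign every box to its nearest centroid, then recompute the means.
        distances = np.linalg.norm(box_wh[:, None] - centroids[None], axis=-1)
        labels = distances.argmin(axis=1)
        for i in range(k):
            if (labels == i).any():
                centroids[i] = box_wh[labels == i].mean(axis=0)
    # Sort by area; yolo anchors are conventionally listed small to large.
    return centroids[np.argsort(centroids.prod(axis=1))]
```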
You can always visualize different stages of the program using my other repo labelpix, a tool for drawing bounding boxes that can also be used to visualize bounding boxes over images using .csv files in the format mentioned here.
Evaluation is available through the detection API, which supports mAP score calculation. A typical evaluation result looks like:
 | object_name | average_precision | actual | detections | true_positives | false_positives | combined |
---|---|---|---|---|---|---|---|
1 | Car | 0.825907 | 298 | 338 | 275 | 63 | 338 |
12 | Bus | 0.666667 | 3 | 2 | 2 | 0 | 2 |
6 | Palm Tree | 0.627774 | 122 | 93 | 82 | 11 | 93 |
7 | Trash Can | 0.555556 | 9 | 7 | 5 | 2 | 7 |
8 | Flag | 0.480867 | 14 | 8 | 7 | 1 | 8 |
2 | Traffic Lights | 0.296155 | 122 | 87 | 58 | 29 | 87 |
5 | Street Lamp | 0.289578 | 73 | 41 | 28 | 13 | 41 |
3 | Street Sign | 0.287331 | 93 | 52 | 35 | 17 | 52 |
9 | Fire Hydrant | 0.194444 | 6 | 3 | 2 | 1 | 3 |
4 | Pedestrian | 0.183942 | 130 | 56 | 35 | 21 | 56 |
0 | Delivery Truck | 0 | 1 | 0 | 0 | 0 | 0 |
10 | Road Block | 0 | 2 | 7 | 0 | 7 | 7 |
11 | Minivan | 0 | 3 | 0 | 0 | 0 | 0 |
13 | Bicycle | 0 | 4 | 1 | 0 | 1 | 1 |
14 | Pickup Truck | 0 | 2 | 0 | 0 | 0 | 0 |
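As a rough illustration of how average_precision relates to the counts above, here is a hedged sketch (not the repo's exact evaluation code) that step-integrates precision over recall for one class:

```py
import numpy as np

def average_precision(true_positives, n_actual):
    """true_positives: 1/0 per detection, sorted by descending confidence."""
    tp = np.asarray(true_positives, dtype=float)
    cumulative_tp = np.cumsum(tp)
    precision = cumulative_tp / np.arange(1, len(tp) + 1)
    recall = cumulative_tp / n_actual
    # Sum precision weighted by each step's recall increase.
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - previous_recall)
        previous_recall = r
    return ap

# 'Bus' above: 2 detections, both true positives, 3 actual boxes.
print(average_precision([1, 1], 3))  # 0.666...
```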
You can check my other repo labelpix, a labeling tool that you can use to produce small datasets for experimentation. It supports .csv files in the format mentioned here and/or .xml files as shown here.
Detections can be performed on photos or videos using the detection API.
The following files are expected:

- Object classes .txt file (one class per line):

  ```
  person
  bicycle
  car
  motorbike
  aeroplane
  bus
  train
  truck
  boat
  traffic light
  fire hydrant
  ```

- DarkNet model .cfg file
- Anchors .txt file (a loading sketch for anchors and masks follows this list):

  ```
  10,13 16,30 33,23 30,61 62,45 59,119 116,90 156,198 373,326
  ```

- Masks .txt file:

  ```
  6,7,8 3,4,5 0,1,2
  ```

- Labeled examples, ONE of the input options described above (.csv file, .xml folder, or .tfrecord files)
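A minimal loading sketch for the anchors and masks files, assuming the whitespace/comma layout shown above:

```py
import numpy as np

def load_anchors(path):
    """'10,13 16,30 ...' -> (n, 2) array of (width, height) pairs."""
    with open(path) as f:
        pairs = f.read().split()
    return np.array([[int(v) for v in pair.split(',')] for pair in pairs])

def load_masks(path):
    """'6,7,8 3,4,5 0,1,2' -> one list of anchor indices per output scale."""
    with open(path) as f:
        groups = f.read().split()
    return [[int(i) for i in group.split(',')] for group in groups]
```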
Training is available through the yolo_tf2.train API. For more info about the other parameters, check the docstrings, available through help():
```py
import yolo_tf2

yolo_tf2.train(
    input_shape=(608, 608, 3),
    classes='/path/to/classes.txt',
    model_cfg='/path/to/darknet/file.cfg',
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    labeled_examples='/path/to/labeled_examples.csv',
    output_dir='/path/to/training-output-dir',
)
```
Or from the command line:

```sh
yolotf2 train --input-shape 608 608 3 --classes /path/to/classes.txt --model-cfg /path/to/darknet/file.cfg --anchors /path/to/anchors.txt --masks /path/to/masks.txt --labeled-examples /path/to/labeled_examples.csv --output-dir /path/to/training-output-dir
```
The following files are expected:

- Object classes .txt file (one class per line):

  ```
  person
  bicycle
  car
  motorbike
  aeroplane
  bus
  train
  truck
  boat
  traffic light
  fire hydrant
  ```

- DarkNet model .cfg file
- Anchors .txt file:

  ```
  10,13 16,30 33,23 30,61 62,45 59,119 116,90 156,198 373,326
  ```

- Masks .txt file:

  ```
  6,7,8 3,4,5 0,1,2
  ```

- Trained .tf or .weights file
- Whatever is to be detected, any of:
  - A list of image paths
  - An image directory
  - A video
Note: For yolov4 configuration, `v4=True` or `--v4` should be specified.
Detection is available through the yolo_tf2.detect API. For more info about the other parameters, check the docstrings, available through help():
```py
import yolo_tf2

yolo_tf2.detect(
    input_shape=(608, 608, 3),
    classes='/path/to/classes.txt',
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    model_cfg='/path/to/darknet/file.cfg',
    weights='/path/to/trained_weights.tf',
    images=['/path/to/image1', '/path/to/image2', ...],
    output_dir='detection-output',
)
```
Or from the command line:

```sh
yolotf2 detect --input-shape 608 608 3 --classes /path/to/classes.txt --model-cfg /path/to/darknet/file.cfg --anchors /path/to/anchors.txt --masks /path/to/masks.txt --weights /path/to/trained_weights.tf --images /path/to/image1 /path/to/image2 --output-dir /path/to/detection-output-dir
```
Notes:

- To detect a video, `video` or `--video` needs to be passed instead (a video detection sketch follows these notes).
- For yolov4 configuration, `v4=True` or `--v4` should be specified.
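A hedged sketch of video detection via the Python API, mirroring the image example and assuming the command line flags above map one-to-one to parameters:

```py
import yolo_tf2

yolo_tf2.detect(
    input_shape=(608, 608, 3),
    classes='/path/to/classes.txt',
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    model_cfg='/path/to/darknet/file.cfg',
    weights='/path/to/trained_weights.tf',
    video='/path/to/video.mp4',   # instead of images / image_dir
    codec='mp4v',                 # output codec, per the flags table
    display_vid=True,             # show frames during detection
    output_dir='detection-output',
)
```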
Evaluation is available through the very same detection API described in the previous section. The only difference is an additional parameter, `evaluation_examples` (or `--evaluation-examples` on the command line), which is a .csv file containing the actual labels of the images being detected. The names of the passed images are looked up in the actual labels, and if any filename is not found, an error is raised. This means that if you do:
```py
import yolo_tf2

yolo_tf2.detect(
    input_shape=(608, 608, 3),
    classes='/path/to/classes.txt',
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    model_cfg='/path/to/darknet/file.cfg',
    weights='/path/to/trained_weights.tf',
    images=['/path/to/image1', '/path/to/image2', ...],
    output_dir='detection-output',
    evaluation_examples='/path/to/actual/examples',
)
```
The `evaluation_examples` .csv file should look like:
image | object_name | object_index | x0 | y0 | x1 | y1 |
---|---|---|---|---|---|---|
/path/to/image1 | Car | 0 | 0.478423 | 0.57672 | 0.558036 | 0.699735 |
/path/to/image1 | Car | 0 | 0.540923 | 0.583333 | 0.574405 | 0.626984 |
/path/to/image1 | Car | 0 | 0.389881 | 0.574074 | 0.470982 | 0.683862 |
/path/to/image2 | Car | 0 | 0.447173 | 0.555556 | 0.497024 | 0.638889 |
/path/to/image2 | Street Sign | 1 | 0.946429 | 0.40873 | 0.991815 | 0.510582 |
Because `images=['/path/to/image1', '/path/to/image2', ...]` were passed, their actual labels must be provided. The same applies to the images contained in a directory if `image_dir` was passed instead.
Contributions are what make the open source community such an amazing place to
learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
There are relevant issues that will be addressed and irrelevant ones that will be closed. The following issues will be addressed:
- Bugs.
- Performance issues.
- Installation issues.
- Documentation issues.
- Feature requests.
- Dependency issues that can be solved.
The following issues will not be addressed and will be closed:
- Issues without context / clear and concise explanation.
- Issues without standalone code (minimum reproducible example), or a jupyter notebook link to reproduce errors.
- Issues that are improperly formatted.
- Issues that are dataset / label specific without a dataset sample link.
- Issues that are the result of doing something that is unsupported by the existing features.
- Issues that are not considered an improvement or otherwise useful.
Distributed under the MIT License. See LICENSE for more information.
Give a ⭐️ if this project helped you!