
Add option to pass RGB-formatted images directly #14092

Closed

ohinds wants to merge 25 commits into ultralytics:main from ohinds:optional_bgr2rgb_conversion

Conversation

@ohinds ohinds commented Jun 28, 2024

This eliminates the need for the user to convert RGB-formatted images to BGR format before inference when working with numpy array images that are already in RGB format.

See
#2575 and #9912

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Added support for specifying whether input images are already in RGB format, improving flexibility and accuracy in image processing workflows. 🎨🖼️

📊 Key Changes

  • Introduced a new rgb_input option in configuration files and model arguments.
  • Updated image preprocessing logic to skip BGR-to-RGB conversion if rgb_input=True.
  • Added a test to verify correct behavior when handling different image channel orders.

🎯 Purpose & Impact

  • Prevents color misinterpretation when users provide images already in RGB format.
  • Reduces unnecessary processing steps, potentially improving performance.
  • Enhances user control and reliability for custom pipelines and advanced use cases.
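
The change described above can be sketched in isolation. This is my own illustration of the idea behind the `rgb_input` flag, not the PR's actual diff; `preprocess_channels` is a hypothetical stand-in for the conversion step inside the predictor's preprocessing:

```python
import numpy as np

def preprocess_channels(im: np.ndarray, rgb_input: bool = False) -> np.ndarray:
    """Return the image in RGB channel order.

    Numpy inputs are assumed BGR by default and flipped to RGB; with
    rgb_input=True the flip is skipped because the caller already
    supplies RGB data.
    """
    if rgb_input:
        return im  # already RGB, nothing to do
    # Reverse the channel axis (BGR -> RGB); copy so the result is contiguous.
    return np.ascontiguousarray(im[..., ::-1])

# A BGR frame and its RGB counterpart yield the same tensor-ready image:
bgr = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
rgb = bgr[..., ::-1]
assert np.array_equal(preprocess_channels(bgr), preprocess_channels(rgb, rgb_input=True))
```

With `rgb_input=False` as the default, existing numpy/cv2 callers see no behavior change.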


ohinds commented Jun 28, 2024

I have read the CLA Document and I sign the CLA


codecov bot commented Jun 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅



glenn-jocher commented Jul 1, 2024

@ohinds current setup is in place to allow direct image input from both cv2 (BGR native) and PIL (RGB native) without any additional preprocessing:

import cv2
from PIL import Image

from ultralytics import YOLO

# Load the YOLO model
model = YOLO('yolov8n.pt')  # or use your custom trained model

# Example with cv2 image (BGR)
im = cv2.imread('path/to/your/image.jpg')
cv2_results = model(im)

# Example with PIL image (RGB)
im = Image.open('path/to/your/image.jpg')
pil_results = model(im)

See https://docs.ultralytics.com/modes/predict/#key-features-of-predict-mode

[Screenshot attached, 2024-07-01]


ohinds commented Jul 2, 2024

> @ohinds current setup is in place to allow direct image input from both cv2 (BGR native) and PIL (RGB native) without any additional preprocessing: […]

@glenn-jocher thanks for the reply.

I'm familiar with the internal preprocessing of bare numpy arrays (cv2 BGR) and the numpy arrays backing PIL images (RGB) in the LoadPilAndNumpy class. This PR adds flexibility and reduces unnecessary color-channel conversions when working with numpy arrays as input, which may not be in the BGR format the detector's preprocessing assumes. For example, when processing a PIL image encoded as RGB, the image is read into memory, then ultralytics.data.loaders.LoadPilAndNumpy._single_check converts it to BGR, and ultralytics.engine.BasePredictor.preprocess converts it back to RGB before inference. Yet another BGR-to-RGB conversion occurs if the PIL image is read in BGR mode.

My goal is to eliminate required color channel conversion when running inference on numpy array images that are already encoded as RGB (e.g., for systems that read RGB-encoded frames from gstreamer). I think this change offers flexibility for users with an advanced understanding of their image encoding, improves performance by eliminating redundant channel swaps, and adds clarity by making the default assumption of BGR encoding explicit through the default value of rgb_input=False.

Code snippet demonstrating my issue:

import cv2
import numpy as np
from PIL import Image
import torch
from ultralytics import YOLO
from ultralytics.utils import ASSETS

source = str(ASSETS / "bus.jpg")

# Load the YOLO model
model = YOLO('yolov8n.pt')  # or use your custom trained model

# Example with cv2 image (BGR)
cv2_im = cv2.imread(source)
cv2_results = model(cv2_im)

# Example with PIL image (RGB)
pil_im = Image.open(source)
pil_results = model(pil_im)

assert torch.equal(cv2_results[0].boxes.data, pil_results[0].boxes.data)

# Example with RGB np array, simulating obtaining frame data from an rtsp stream, etc.
rgb_im = np.array(pil_im)
rgb_results = model(rgb_im)

# Fails
assert torch.equal(cv2_results[0].boxes.data, rgb_results[0].boxes.data)
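
Until an option like this exists, a caller-side workaround (my own sketch, independent of the PR) is to hand the model a BGR copy of the RGB frame, e.g. `model(np.ascontiguousarray(rgb_im[..., ::-1]))`. The channel swap itself is cheap and self-inverse:

```python
import numpy as np

# Hypothetical RGB frame, standing in for data read from a gstreamer/rtsp source.
rgb_im = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

# Reverse the last (channel) axis to get the BGR layout the numpy loader expects.
# np.ascontiguousarray copies the negative-stride view into a normal array.
bgr_im = np.ascontiguousarray(rgb_im[..., ::-1])

# The swap is an involution: reversing again restores the original RGB frame.
assert np.array_equal(bgr_im[..., ::-1], rgb_im)
```

The cost of this workaround is exactly the extra copy per frame that the proposed `rgb_input` flag would avoid.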

@glenn-jocher
Member

Thank you for the detailed explanation and the code snippet demonstrating the issue. Your goal to eliminate unnecessary color channel conversions when working with numpy arrays in RGB format is well understood and appreciated.

To address your concern, we can introduce an option to handle RGB-formatted images directly, thus avoiding redundant conversions and improving performance. This change will provide flexibility for advanced users while maintaining the default BGR assumption for backward compatibility.

Here's a proposed solution to incorporate an rgb_input parameter:

  1. Add rgb_input Parameter: Introduce a new parameter rgb_input in the model's preprocessing steps.
  2. Modify Preprocessing Logic: Adjust the preprocessing logic to account for both RGB and BGR image inputs based on the rgb_input parameter.

Below is an example of how this can be implemented:

import cv2
import numpy as np
from PIL import Image
import torch
from ultralytics import YOLO
from ultralytics.utils import ASSETS

source = str(ASSETS / "bus.jpg")

# Load the YOLO model
model = YOLO('yolov8n.pt')  # or use your custom trained model

# Example with cv2 image (BGR)
cv2_im = cv2.imread(source)
cv2_results = model(cv2_im, rgb_input=False)  # Default BGR

# Example with PIL image (RGB)
pil_im = Image.open(source)
pil_results = model(pil_im, rgb_input=True)  # PIL images are RGB

assert torch.equal(cv2_results[0].boxes.data, pil_results[0].boxes.data)

# Example with RGB np array, simulating obtaining frame data from an rtsp stream, etc.
rgb_im = np.array(pil_im)
rgb_results = model(rgb_im, rgb_input=True)  # Explicitly specify RGB input

# Should pass now
assert torch.equal(cv2_results[0].boxes.data, rgb_results[0].boxes.data)

By adding the rgb_input parameter, users can specify the color format of their input images, thus avoiding unnecessary conversions and ensuring accurate predictions.

Please verify if this approach aligns with your requirements. If you encounter any issues or have further suggestions, feel free to share them. Your contributions are invaluable to enhancing the flexibility and performance of the Ultralytics YOLO models. 🚀

Thank you for your engagement and support!


ohinds commented Jul 3, 2024

> To address your concern, we can introduce an option to handle RGB-formatted images directly, thus avoiding redundant conversions and improving performance. […] Below is an example of how this can be implemented: […]

Thank you for your engagement and support!

Hi @glenn-jocher, perfect. Just one thing to note about the code snippet you provided: I did not modify ultralytics.data.loaders.LoadPilAndNumpy._single_check as part of this PR, so rgb_input=True should not be passed when the inference source is a PIL Image instance. So the line

 pil_results = model(pil_im, rgb_input=True)  # PIL images are RGB

should be

pil_results = model(pil_im)  # PIL images are RGB

@glenn-jocher
Member

Hi @ohinds,

Thank you for the clarification! Your attention to detail is much appreciated. You're absolutely right; the rgb_input=True parameter should not be passed when the inference source is a PIL Image instance, as the internal handling already accounts for the RGB format.

Here's the corrected code snippet:

import cv2
import numpy as np
from PIL import Image
import torch
from ultralytics import YOLO
from ultralytics.utils import ASSETS

source = str(ASSETS / "bus.jpg")

# Load the YOLO model
model = YOLO('yolov8n.pt')  # or use your custom trained model

# Example with cv2 image (BGR)
cv2_im = cv2.imread(source)
cv2_results = model(cv2_im, rgb_input=False)  # Default BGR

# Example with PIL image (RGB)
pil_im = Image.open(source)
pil_results = model(pil_im)  # PIL images are RGB

assert torch.equal(cv2_results[0].boxes.data, pil_results[0].boxes.data)

# Example with RGB np array, simulating obtaining frame data from an rtsp stream, etc.
rgb_im = np.array(pil_im)
rgb_results = model(rgb_im, rgb_input=True)  # Explicitly specify RGB input

# Should pass now
assert torch.equal(cv2_results[0].boxes.data, rgb_results[0].boxes.data)

This adjustment ensures that the PIL image handling remains as expected while providing the flexibility to handle RGB numpy arrays directly.

Please let us know if this aligns with your requirements or if there are any further adjustments needed. Your contributions are invaluable in enhancing the usability and performance of the Ultralytics YOLO models. 🚀

Thank you for your engagement and support!


ohinds commented Jul 3, 2024

> You're absolutely right; the rgb_input=True parameter should not be passed when the inference source is a PIL Image instance, as the internal handling already accounts for the RGB format. Here's the corrected code snippet: […]

Thank you for your engagement and support!

Hi @glenn-jocher, yes this example looks good now. Thank you.

@glenn-jocher
Member

Hi @ohinds,

Thank you for confirming that the updated example aligns with your requirements. We're glad to hear that it addresses your needs effectively. Your feedback and contributions are highly valued and play a crucial role in enhancing the flexibility and performance of the Ultralytics YOLO models. 🚀

If you encounter any further issues or have additional suggestions, please don't hesitate to share them. We're here to help and continuously improve based on user insights.

Thank you for your engagement and support! 😊

@github-actions

👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap.

We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved.

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label May 21, 2025
@github-actions github-actions bot closed this Jun 21, 2025