Text-Guided-Image-Colorization

This project utilizes the power of Stable Diffusion (SDXL/SDXL-Light) and the BLIP (Bootstrapping Language-Image Pre-training) captioning model to provide an interactive image colorization experience. Users can influence the generated colors of objects within images, making the colorization process more personalized and creative.

Framework overview: framework.jpg

Table of Contents

  • News
  • Features
  • Installation
  • Quick Start
  • Dataset Usage
  • Training
  • Evaluation
  • Results
  • Read More
  • License

News

  • (2024/11/23) The project is now available on Hugging Face Spaces 🎉 Big thanks to @fffiloni!

Features

  • Interactive Colorization: Users can specify desired colors for different objects in the image.
  • ControlNet Approach: Enhanced colorization capabilities through retraining with ControlNet, allowing SDXL to better adapt to the image colorization task.
  • High-Quality Outputs: Leverage the latest advancements in diffusion models to generate vibrant and realistic colorizations.

Installation

To set up the project locally, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/nick8592/text-guided-image-colorization.git
    cd text-guided-image-colorization
  2. Install Dependencies: Make sure you have Python 3.7 or higher installed. Then, install the required packages:

    pip install -r requirements.txt

    Install torch and torchvision matching your CUDA version:

    pip install torch torchvision --index-url https://download.pytorch.org/whl/cuXXX

    Replace XXX with your CUDA version (e.g., 118 for CUDA 11.8). For more info, see PyTorch Get Started.

  3. Download Pre-trained Models (a loading sketch follows this list):

    Models                                        Hugging Face
    SDXL-Lightning Caption                        link
    SDXL-Lightning Custom Caption (Recommended)   link

    Place the downloaded checkpoint so that the folder structure matches:

    text-guided-image-colorization/sdxl_light_caption_output
    └── checkpoint-30000
        ├── controlnet
        │   ├── diffusion_pytorch_model.safetensors
        │   └── config.json
        ├── optimizer.bin
        ├── random_states_0.pkl
        ├── scaler.pt
        └── scheduler.bin
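
The controlnet folder inside the checkpoint uses the standard diffusers layout (config.json plus diffusion_pytorch_model.safetensors), so it should load with the regular diffusers API. The snippet below is only a minimal loading sketch, not the project's own inference code (see gradio_ui.py and the eval scripts for that); using the plain SDXL base model and float16 here are assumptions.

    # Minimal sketch: load the downloaded ControlNet checkpoint with diffusers.
    # Assumptions: the path matches the folder layout above, and plain SDXL base
    # is used as the backbone (the project itself targets SDXL-Lightning).
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "sdxl_light_caption_output/checkpoint-30000/controlnet",
        torch_dtype=torch.float16,
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")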

Quick Start

  1. Run the gradio_ui.py script:

    python gradio_ui.py

  2. Open the provided URL in your web browser to access the Gradio-based user interface.

  3. Upload an image and use the interface to specify colors for particular objects in it. A prompt is optional; the model can also colorize the image without one (a captioning sketch for this prompt-free mode follows this list).

  4. The model generates a colorized version of the image based on your input, or automatically if no prompt is given. See the demo video (Gradio UI).
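
In prompt-free mode a caption still has to come from somewhere, and this is where BLIP comes in. The snippet below is a minimal sketch of that captioning step using the transformers BLIP API; the checkpoint name and the "a photography of" prefix (which matches the captions shown in the Results section) are assumptions, not necessarily what gradio_ui.py does internally.

    # Minimal BLIP captioning sketch for the prompt-free mode.
    # Assumptions: the standard transformers BLIP checkpoint below and the
    # "a photography of" prefix; the repository may differ in both.
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    image = Image.open("example_gray.jpg").convert("RGB")  # hypothetical input file
    inputs = processor(images=image, text="a photography of", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(out[0], skip_special_tokens=True)
    print(caption)  # e.g. "a photography of a cat wearing a hat on his head"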

Dataset Usage

You can find more details about the dataset usage in the Dataset-for-Image-Colorization repository.

Training

For training, you can use the training scripts provided in the repository.

Note that although the training code for SDXL is provided, I was not able to train the model myself due to limited GPU resources, so you may encounter errors when trying to train.

Evaluation

For evaluation, you can use one of the following scripts (a rough batch-colorization sketch follows the list):

  • eval_controlnet.sh: Evaluates the model using Stable Diffusion v2 for a folder of images.
  • eval_controlnet_sdxl_light.sh: Evaluates the model using SDXL-Lightning for a folder of images.
  • eval_controlnet_sdxl_light_single.sh: Evaluates the model using SDXL-Lightning for a single image.
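
These scripts wrap the full evaluation. Purely as an illustration of what colorizing a folder of images roughly involves (this is not the scripts' actual code), a batch loop could look like the sketch below; the pipeline, captioning helper, paths, and step count are all placeholders.

    # Rough illustration of batch colorization over a folder (not the eval
    # scripts' actual code; paths and step count are placeholders).
    from pathlib import Path
    from PIL import Image

    def colorize_folder(pipe, caption_image, in_dir="test_images", out_dir="outputs"):
        """Colorize every .jpg in in_dir with a diffusers ControlNet pipeline.

        `pipe` is a loaded pipeline (see the installation sketch) and
        `caption_image` is a captioning callable (see the BLIP sketch).
        """
        out = Path(out_dir)
        out.mkdir(exist_ok=True)
        for path in sorted(Path(in_dir).glob("*.jpg")):
            gray = Image.open(path).convert("L").convert("RGB")  # grayscale condition image
            prompt = caption_image(gray)
            result = pipe(prompt=prompt, image=gray, num_inference_steps=8).images[0]
            result.save(out / f"{path.stem}_color.jpg")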

Results

Prompt-Guided

Caption: a photography of a woman in a soccer uniform kicking a soccer ball
  Input:       000000022935_gray.jpg
  Condition 1: + "green shirt"  → 000000022935_green_shirt_on_right_girl.jpeg
  Condition 2: + "purple shirt" → 000000022935_purple_shirt_on_right_girl.jpeg
  Condition 3: + "red shirt"    → 000000022935_red_shirt_on_right_girl.jpeg

Caption: a photography of a photo of a truck
  Input:       000000041633_gray.jpg
  Condition 1: + "bright red car" → 000000041633_bright_red_car.jpeg
  Condition 2: + "dark blue car"  → 000000041633_dark_blue_car.jpeg
  Condition 3: + "black car"      → 000000041633_black_car.jpeg

Caption: a photography of a cat wearing a hat on his head
  Input:       000000286708_gray.jpg
  Condition 1: + "orange hat" → 000000286708_orange_hat.jpeg
  Condition 2: + "pink hat"   → 000000286708_pink_hat.jpeg
  Condition 3: + "yellow hat" → 000000286708_yellow_hat.jpeg
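
As these examples suggest, the user's color condition is applied by appending a short phrase to the automatically generated caption. A trivial sketch of that prompt construction follows; the exact template the project uses for joining caption and condition may differ.

    # Toy sketch of combining the BLIP caption with a user color condition.
    # The simple comma join is an assumption; the project's template may differ.
    def build_prompt(caption: str, condition: str = "") -> str:
        return f"{caption}, {condition}" if condition else caption

    print(build_prompt(
        "a photography of a woman in a soccer uniform kicking a soccer ball",
        "green shirt",
    ))
    # -> a photography of a woman in a soccer uniform kicking a soccer ball, green shirt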

Prompt-Free

Ground truth images are provided solely for reference purposes in the image colorization task.

Grayscale Image           Colorized Result           Ground Truth
000000025560_gray.jpg     000000025560_color.jpg     000000025560_gt.jpg
000000065736_gray.jpg     000000065736_color.jpg     000000065736_gt.jpg
000000091779_gray.jpg     000000091779_color.jpg     000000091779_gt.jpg
000000092177_gray.jpg     000000092177_color.jpg     000000092177_gt.jpg
000000166426_gray.jpg     000000166426_color.jpg     000000166426_gt.jpg

Read More

Here are some related articles you might find interesting:

License

This project is licensed under the MIT License. See the LICENSE file for more details.
