This repository is dedicated to research on a light-weight image generation AI. The primary goal is to develop a model that can be rapidly trained on a specific dataset for customized purposes, moving away from the heavy architectures that are common in the field.
We combine the models and generate every learned image from its textual embedding alone; the embeddings are then processed by our specialised mechanism so that the generated images can be viewed and analysed by similarity under PCA or other dimensionality-reduction methods such as t-SNE.
The results can be viewed easily, after installing all requirements, by running:

```
tensorboard --logdir ./resources/output/combined_projection
```
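For readers new to the TensorBoard projector, below is a minimal sketch of how such a projection can be logged with `torch.utils.tensorboard`; the embedding tensor, labels, and tag are illustrative placeholders, not the repository's actual variables:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Hypothetical example: log per-image embeddings so they can be inspected
# in TensorBoard's Projector tab (which offers PCA and t-SNE views).
embeddings = torch.randn(100, 128)            # placeholder: 100 samples, 128-d
labels = [f"class_{i % 10}" for i in range(100)]  # placeholder class names

writer = SummaryWriter("./resources/output/combined_projection")
writer.add_embedding(embeddings, metadata=labels, tag="generated_images")
writer.close()
```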
We have taken inspiration from the Stable Diffusion architecture by Stability AI. However, we've introduced several customizations:
- A decoder (up-scaler) inspired solely by super-resolution models, which results in a lighter architecture.
- A modified U-Net architecture that serves the diffusion process, adapted to generate lower-resolution images in order to save computational resources and budget.
It is worth mentioning that semantic mapping is not the focus of this research. Instead, we've implemented a simplified version of class embedding using PyTorch's `torch.nn.Embedding` for demonstration purposes.
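As a rough illustration, such a class-conditioning embedding can be as small as the sketch below; the class count and embedding width are made-up values, not the project's settings:

```python
import torch
import torch.nn as nn

class ClassEmbedding(nn.Module):
    """Map integer class labels to dense vectors used for conditioning."""

    def __init__(self, num_classes: int, embed_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(num_classes, embed_dim)

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        # labels: (batch,) int64 class indices -> (batch, embed_dim) vectors
        return self.embedding(labels)

# Illustrative usage: 10 classes embedded into 64-dimensional vectors.
embed = ClassEmbedding(num_classes=10, embed_dim=64)
vectors = embed(torch.tensor([0, 3, 7]))  # shape: (3, 64)
```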
The model has been developed and tested on a Linux server running Python 3.8.10. Please ensure that you have a compatible version of Python installed on your system. If you need to install or update Python, you can download it from the official Python website.
Once you have the correct Python version installed, you can proceed to install the necessary libraries by running the following command in your terminal or command prompt:
```
pip install -r src/requirements.txt
```
The project is structured into several modules, each serving a distinct purpose in the overall architecture:
- The initial dataset includes a directory of images and an annotation CSV.
- Preprocessing of this dataset has been completed, and the results can be viewed in `preprocessing.ipynb`.
- After the preprocessing steps, a customized PyTorch dataset class has been constructed (see the sketch after this list).
- The implementation of this class can be found in the `entities` directory.
- A simple class embedding has been implemented using PyTorch's `Embedding` module.
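For orientation, a customized dataset of this shape might look like the following sketch; the CSV column names (`filename`, `label`) and the 64 px target size are assumptions for illustration, not the actual schema in `entities`:

```python
from pathlib import Path

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class AnnotatedImageDataset(Dataset):
    """Pairs each image file with its class label from the annotation CSV."""

    def __init__(self, image_dir: str, annotation_csv: str, image_size: int = 64):
        self.image_dir = Path(image_dir)
        # Assumed CSV schema: one row per image with columns filename, label.
        self.annotations = pd.read_csv(annotation_csv)
        self.transform = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),
        ])

    def __len__(self) -> int:
        return len(self.annotations)

    def __getitem__(self, idx: int):
        row = self.annotations.iloc[idx]
        image = Image.open(self.image_dir / row["filename"]).convert("RGB")
        return self.transform(image), torch.tensor(row["label"], dtype=torch.long)
```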
- The VAE module encompasses the network, the trainer class, training customizations, and the trigger code (a background sketch follows this list).
- The relevant code and details can be found in `vae_training_app.py`.
- Note that we no longer proceed with this model due to its low generative performance.
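For background only, since this path was abandoned: a VAE of the general shape described above pairs an encoder that predicts a latent distribution with a decoder that reconstructs the image. The following minimal sketch is illustrative and is not the network in `vae_training_app.py`:

```python
import torch
import torch.nn as nn

class SmallVAE(nn.Module):
    """A deliberately tiny VAE sketch: encoder -> (mu, logvar) -> decoder."""

    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
        )
        self.to_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.to_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar
```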
- The diffusion module includes the components, different variations of the U-Net, and the diffuser (trainer).
- In this repository, we implemented a few runs of the diffusion models (a scheduler sketch follows this list):
  - Hyper-parameter search with a WandB server: `diffusion_hyper_tuning.py`
  - Final training for the linear and cosine variance schedulers: `diffusion_linear.py`, `diffusion_cosine.py`
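For context, the linear and cosine variance schedulers differ only in how the per-step noise variance `beta_t` evolves over the timesteps. Below is a minimal sketch of both, using the standard DDPM-style formulas with illustrative default constants, which are not necessarily the repository's values:

```python
import math
import torch

def linear_beta_schedule(timesteps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> torch.Tensor:
    """Noise variances increase linearly from beta_start to beta_end (DDPM)."""
    return torch.linspace(beta_start, beta_end, timesteps)

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Cosine schedule (Nichol & Dhariwal, 2021): betas come from ratios of
    a cosine-shaped decay curve, giving a gentler start than the linear one."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    decay = torch.cos(((steps / timesteps) + s) / (1 + s) * math.pi / 2) ** 2
    betas = 1 - decay[1:] / decay[:-1]
    return torch.clip(betas, 0, 0.999).float()

# Both return a (timesteps,) tensor of per-step noise variances beta_t.
betas_linear = linear_beta_schedule(1000)
betas_cosine = cosine_beta_schedule(1000)
```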
- The up-scaler module trains a model that increases image resolution from 64×64 px to 256×256 px (a minimal sketch follows below).
- The training trigger can be found at `up_scaler_training.py`.
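As an illustration of the idea rather than the repository's actual network, a minimal 4× up-scaler (64 px to 256 px) can be built from convolutions and `nn.PixelShuffle`:

```python
import torch
import torch.nn as nn

class SimpleUpscaler(nn.Module):
    """Minimal 4x super-resolution sketch: 64x64 RGB in, 256x256 RGB out."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1),
            nn.PixelShuffle(2),   # rearranges channels into pixels: 64 -> 128
            nn.ReLU(),
            nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1),
            nn.PixelShuffle(2),   # 128 -> 256
            nn.ReLU(),
            nn.Conv2d(channels, 3, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Illustrative usage:
up = SimpleUpscaler()
low_res = torch.randn(1, 3, 64, 64)
high_res = up(low_res)  # shape: (1, 3, 256, 256)
```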
We would like to express our gratitude to AITIS for providing the original dataset used in this research. Their contribution has been invaluable in advancing our understanding and development of light-weight image generation AI.