Author: @FerryHuang
This is an implementation of the papers:

- Prompt-to-Prompt Image Editing with Cross Attention Control ([arXiv:2208.01626](https://arxiv.org/abs/2208.01626))
- Null-text Inversion for Editing Real Images using Guided Diffusion Models ([arXiv:2211.09794](https://arxiv.org/abs/2211.09794))
Task: Text2Image, diffusion, inversion, editing
In diffusion models, inversion means feeding an image (with or without a prompt) into a method that returns a latent code, which can later be turned back into an image closely resembling the original. We want this latent code for editing purposes, which is why inversion methods are usually implemented together with editing methods.
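For intuition, DDIM inversion runs the deterministic DDIM sampler backwards: at each step the current latent is pushed to a higher noise level instead of being denoised. Below is a minimal sketch of one such step (a hypothetical helper for illustration, not this project's API):

```python
import torch


def ddim_inversion_step(latent: torch.Tensor, noise_pred: torch.Tensor,
                        alpha_t: torch.Tensor, alpha_next: torch.Tensor) -> torch.Tensor:
    """One reversed DDIM update: move `latent` from noise level alpha_t
    to the next (noisier) level alpha_next, deterministically (eta = 0)."""
    # predict the clean latent x_0 from the current latent and the noise estimate
    pred_x0 = (latent - (1 - alpha_t).sqrt() * noise_pred) / alpha_t.sqrt()
    # re-noise x_0 to the next timestep using the same noise estimate
    return alpha_next.sqrt() * pred_x0 + (1 - alpha_next).sqrt() * noise_pred
```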
This project contains two inversion methods and one editing method.

A walkthrough of the project is provided here, or you can simply run the following scripts to get the results:
```python
# load the mmagic SD1.5
from mmengine import MODELS, Config
from mmengine.registry import init_default_scope

# set the default registry scope so MODELS.build resolves mmagic classes
init_default_scope('mmagic')

config = 'configs/stable_diffusion/stable-diffusion_ddim_denoisingunet.py'
config = Config.fromfile(config).copy()

StableDiffuser = MODELS.build(config.model)
StableDiffuser = StableDiffuser.to('cuda')
```
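Before inverting anything, a quick sanity check that the loaded model generates at all can be useful. This is only a sketch that reuses the project helpers imported in the next snippet, with an arbitrary prompt of our choosing:

```python
# sanity check: generate one image from a prompt with a no-op controller
from models import ptp_utils
from models.ptp import EmptyControl

images, _ = ptp_utils.text2image_ldm_stable(
    StableDiffuser, ['a photo of a cat'], EmptyControl(),
    num_inference_steps=50, guidance_scale=7.5)
ptp_utils.view_images(images)
```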
```python
# inversion
from inversions.null_text_inversion import NullTextInversion
from models import ptp_utils
from models.ptp import EmptyControl

image_path = 'projects/prompt_to_prompt/assets/gnochi_mirror.jpeg'
prompt = "a cat sitting next to a mirror"
image_tensor = ptp_utils.load_512(image_path).to('cuda')

null_inverter = NullTextInversion(StableDiffuser)
null_inverter.init_prompt(prompt)

# DDIM inversion: record the latent trajectory of the real image
ddim_latents = null_inverter.ddim_inversion(image_tensor)
x_t = ddim_latents[-1]

# optimize the null-text embeddings along the recorded trajectory
uncond_embeddings = null_inverter.null_optimization(
    ddim_latents, num_inner_steps=10, epsilon=1e-5)

# reconstruct the image from x_t with the optimized embeddings
null_text_rec, _ = ptp_utils.text2image_ldm_stable(
    StableDiffuser, [prompt], EmptyControl(), latent=x_t,
    uncond_embeddings=uncond_embeddings)
ptp_utils.view_images(null_text_rec)
```
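For intuition, here is roughly what the `null_optimization` call above does at each timestep: it tunes the unconditional ("null-text") embedding so that one classifier-free-guided denoising step lands back on the DDIM-inversion trajectory. This is only a sketch; the function and argument names below are hypothetical, not this project's API:

```python
import torch
import torch.nn.functional as F


def optimize_null_embedding(noise_model, ddim_step, latent_cur, latent_prev,
                            t, cond_embed, uncond_embed,
                            guidance_scale=7.5, num_inner_steps=10):
    """Tune the unconditional embedding at timestep t so that one guided
    denoising step from latent_cur lands on latent_prev (the inversion latent)."""
    uncond = uncond_embed.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([uncond], lr=1e-2)
    for _ in range(num_inner_steps):
        # classifier-free guidance with the current null embedding
        eps_uncond = noise_model(latent_cur, t, uncond)
        eps_cond = noise_model(latent_cur, t, cond_embed)
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        # one denoising step should reproduce the inversion trajectory
        loss = F.mse_loss(ddim_step(eps, t, latent_cur), latent_prev)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return uncond.detach()
```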
```python
# prompt-to-prompt editing
import torch

from models.ptp import LocalBlend, AttentionReplace
from models.ptp_utils import text2image_ldm_stable

prompts = ["A cartoon of spiderman",
           "A cartoon of ironman"]

g = torch.Generator().manual_seed(2023616)

# blend the edit only inside the regions attended by these words
lb = LocalBlend(prompts, ("spiderman", "ironman"), model=StableDiffuser)

# inject the source prompt's attention maps into the target generation
controller = AttentionReplace(prompts, 50,
                              cross_replace_steps={"default_": 1., "ironman": .2},
                              self_replace_steps=0.4,
                              local_blend=lb, model=StableDiffuser)

images, x_t = text2image_ldm_stable(StableDiffuser, prompts, controller, latent=None,
                                    num_inference_steps=50, guidance_scale=7.5,
                                    uncond_embeddings=None, generator=g)
```
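Here `cross_replace_steps` and `self_replace_steps` set the fraction of the 50 denoising steps during which the source prompt's cross- and self-attention maps are injected into the target generation (per-word entries such as `"ironman": .2` override the default for that token), while `LocalBlend` confines the pixel changes to the region those words attend to. The results can then be viewed with the same helper used above:

```python
# display the source/edited pair produced by the editing call above
from models import ptp_utils

ptp_utils.view_images(images)
```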
Citation:

```bibtex
@article{hertz2022prompt,
  title   = {Prompt-to-Prompt Image Editing with Cross Attention Control},
  author  = {Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  journal = {arXiv preprint arXiv:2208.01626},
  year    = {2022},
}

@article{mokady2022null,
  title   = {Null-text Inversion for Editing Real Images using Guided Diffusion Models},
  author  = {Mokady, Ron and Hertz, Amir and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  journal = {arXiv preprint arXiv:2211.09794},
  year    = {2022},
}
```