Author: @FerryHuang
This is an implementation of the papers:

- Prompt-to-Prompt Image Editing with Cross Attention Control ([arXiv:2208.01626](https://arxiv.org/abs/2208.01626))
- Null-text Inversion for Editing Real Images using Guided Diffusion Models ([arXiv:2211.09794](https://arxiv.org/abs/2211.09794))
Task: Text2Image, diffusion, inversion, editing
In diffusion models, inversion means feeding an image (with or without a prompt) into a method that returns a latent code, which can later be turned back into an image closely resembling the original. We want this latent code for editing purposes, which is why inversion methods are usually implemented together with editing methods.
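For intuition, DDIM inversion runs the deterministic DDIM sampler backwards: at each step the current latent is pushed to a higher noise level instead of being denoised. Below is a minimal sketch of one such step (a hypothetical helper for illustration, not this project's API):

```python
import torch


def ddim_inversion_step(latent: torch.Tensor, noise_pred: torch.Tensor,
                        alpha_t: torch.Tensor, alpha_next: torch.Tensor) -> torch.Tensor:
    """One reversed DDIM update: move `latent` from noise level alpha_t
    to the next (noisier) level alpha_next, deterministically (eta = 0)."""
    # predict the clean latent x_0 from the current latent and the noise estimate
    pred_x0 = (latent - (1 - alpha_t).sqrt() * noise_pred) / alpha_t.sqrt()
    # re-noise x_0 to the next timestep using the same noise estimate
    return alpha_next.sqrt() * pred_x0 + (1 - alpha_next).sqrt() * noise_pred
```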
This project contains two inversion methods and one editing method.

A walkthrough of the project is provided here, or you can simply run the following scripts to get the results:
```python
# load the mmagic SD1.5
from mmengine import MODELS, Config
from mmengine.registry import init_default_scope

# set the default registry scope so MODELS.build resolves mmagic classes
init_default_scope('mmagic')

config = 'configs/stable_diffusion/stable-diffusion_ddim_denoisingunet.py'
config = Config.fromfile(config).copy()

StableDiffuser = MODELS.build(config.model)
StableDiffuser = StableDiffuser.to('cuda')
```
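Before inverting anything, a quick sanity check that the loaded model generates at all can be useful. This is only a sketch that reuses the project helpers imported in the next snippet, with an arbitrary prompt of our choosing:

```python
# sanity check: generate one image from a prompt with a no-op controller
from models import ptp_utils
from models.ptp import EmptyControl

images, _ = ptp_utils.text2image_ldm_stable(
    StableDiffuser, ['a photo of a cat'], EmptyControl(),
    num_inference_steps=50, guidance_scale=7.5)
ptp_utils.view_images(images)
```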
```python
# inversion
from inversions.null_text_inversion import NullTextInversion
from models import ptp_utils
from models.ptp import EmptyControl

image_path = 'projects/prompt_to_prompt/assets/gnochi_mirror.jpeg'
prompt = "a cat sitting next to a mirror"
image_tensor = ptp_utils.load_512(image_path).to('cuda')

null_inverter = NullTextInversion(StableDiffuser)
null_inverter.init_prompt(prompt)

# DDIM inversion: record the latent trajectory of the real image
ddim_latents = null_inverter.ddim_inversion(image_tensor)
x_t = ddim_latents[-1]

# optimize the null-text embeddings along the recorded trajectory
uncond_embeddings = null_inverter.null_optimization(
    ddim_latents, num_inner_steps=10, epsilon=1e-5)

# reconstruct the image from x_t with the optimized embeddings
null_text_rec, _ = ptp_utils.text2image_ldm_stable(
    StableDiffuser, [prompt], EmptyControl(), latent=x_t,
    uncond_embeddings=uncond_embeddings)
ptp_utils.view_images(null_text_rec)
```
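For intuition, here is roughly what the `null_optimization` call above does at each timestep: it tunes the unconditional ("null-text") embedding so that one classifier-free-guided denoising step lands back on the DDIM-inversion trajectory. This is only a sketch; the function and argument names below are hypothetical, not this project's API:

```python
import torch
import torch.nn.functional as F


def optimize_null_embedding(noise_model, ddim_step, latent_cur, latent_prev,
                            t, cond_embed, uncond_embed,
                            guidance_scale=7.5, num_inner_steps=10):
    """Tune the unconditional embedding at timestep t so that one guided
    denoising step from latent_cur lands on latent_prev (the inversion latent)."""
    uncond = uncond_embed.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([uncond], lr=1e-2)
    for _ in range(num_inner_steps):
        # classifier-free guidance with the current null embedding
        eps_uncond = noise_model(latent_cur, t, uncond)
        eps_cond = noise_model(latent_cur, t, cond_embed)
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        # one denoising step should reproduce the inversion trajectory
        loss = F.mse_loss(ddim_step(eps, t, latent_cur), latent_prev)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return uncond.detach()
```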
```python
# prompt-to-prompt editing
import torch

from models.ptp import LocalBlend, AttentionReplace
from models.ptp_utils import text2image_ldm_stable

prompts = ["A cartoon of spiderman",
           "A cartoon of ironman"]

g = torch.Generator().manual_seed(2023616)

# blend the edit only inside the regions attended by these words
lb = LocalBlend(prompts, ("spiderman", "ironman"), model=StableDiffuser)

# inject the source prompt's attention maps into the target generation
controller = AttentionReplace(prompts, 50,
                              cross_replace_steps={"default_": 1., "ironman": .2},
                              self_replace_steps=0.4,
                              local_blend=lb, model=StableDiffuser)

images, x_t = text2image_ldm_stable(StableDiffuser, prompts, controller, latent=None,
                                    num_inference_steps=50, guidance_scale=7.5,
                                    uncond_embeddings=None, generator=g)
```
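Here `cross_replace_steps` and `self_replace_steps` set the fraction of the 50 denoising steps during which the source prompt's cross- and self-attention maps are injected into the target generation (per-word entries such as `"ironman": .2` override the default for that token), while `LocalBlend` confines the pixel changes to the region those words attend to. The results can then be viewed with the same helper used above:

```python
# display the source/edited pair produced by the editing call above
from models import ptp_utils

ptp_utils.view_images(images)
```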
Citation:

```bibtex
@article{hertz2022prompt,
  title   = {Prompt-to-Prompt Image Editing with Cross Attention Control},
  author  = {Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  journal = {arXiv preprint arXiv:2208.01626},
  year    = {2022},
}

@article{mokady2022null,
  title   = {Null-text Inversion for Editing Real Images using Guided Diffusion Models},
  author  = {Mokady, Ron and Hertz, Amir and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  journal = {arXiv preprint arXiv:2211.09794},
  year    = {2022},
}
```