
# Inversions & Editing (DDIM Inversion & Null-Text Inversion & Prompt-to-Prompt Editing)

Author: @FerryHuang

This is an implementation of the following papers:

- Prompt-to-Prompt Image Editing with Cross-Attention Control
- Null-text Inversion for Editing Real Images using Guided Diffusion Models

Task: Text2Image, diffusion, inversion, editing

## Abstract

Diffusion inversion takes an image (with or without a prompt) and returns a latent code that can later be turned back into an image highly similar to the original. We want this latent code for editing purposes, which is why inversion methods are usually implemented together with editing methods.
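As a concrete illustration of the first inversion method, here is a minimal sketch of one DDIM inversion step, using assumed helper names rather than this repo's actual API: the deterministic DDIM update is run in reverse, re-noising the predicted clean latent one timestep at a time.

```python
import torch

@torch.no_grad()
def ddim_inversion_step(x_t, t, t_next, eps, alphas_cumprod):
    # eps: the UNet's noise prediction at timestep t (assumed input);
    # alphas_cumprod: the scheduler's cumulative alpha-bar products.
    a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
    # Recover the predicted clean latent x_0 from the DDIM update rule.
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    # Step "forward in noise": re-noise x_0 toward the next, noisier timestep.
    return a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps
```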

This project contains two inversion methods (DDIM inversion and null-text inversion) and one editing method (prompt-to-prompt editing).

From right to left: original image, DDIM inversion, null-text inversion.

## Prompt-to-Prompt Editing

- cat -> dog
- spider man -> iron man (attention replace, sketched below)
- Eiffel Tower -> Eiffel Tower at night (attention refine)
- blossom sakura tree -> blossom(-3) sakura tree (attention reweight)
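All of these operations hinge on controlling the edited branch's cross-attention maps during denoising. Below is a minimal sketch of attention replace and attention reweight, using hypothetical names rather than this project's classes:

```python
import torch

def control_cross_attention(attn_src, attn_edit, step, total_steps,
                            replace_frac=0.8, reweight=None):
    # Attention replace: for the first `replace_frac` of the denoising
    # steps, the edited branch reuses the source prompt's cross-attention
    # maps, preserving layout while swapped tokens change appearance.
    attn = attn_src if step < replace_frac * total_steps else attn_edit
    # Attention reweight: rescale the maps of chosen token indices,
    # e.g. {token_idx: -3.0} to suppress a word as in "blossom(-3)".
    if reweight:
        attn = attn.clone()
        for token_idx, scale in reweight.items():
            attn[..., token_idx] = attn[..., token_idx] * scale
    return attn

# Toy usage with random maps of shape (heads, pixels, text tokens):
attn = control_cross_attention(torch.rand(8, 64, 77), torch.rand(8, 64, 77),
                               step=10, total_steps=50, reweight={2: -3.0})
```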

## Quick Start

A walkthrough of the project is provided here, or you can just run the following scripts to get the results:

```python
# Load the mmagic Stable Diffusion 1.5 model.
from mmengine import Config
from mmengine.registry import MODELS, init_default_scope

init_default_scope('mmagic')

config = 'configs/stable_diffusion/stable-diffusion_ddim_denoisingunet.py'
config = Config.fromfile(config).copy()

StableDiffuser = MODELS.build(config.model)
StableDiffuser = StableDiffuser.to('cuda')
```
```python
# Inversion: DDIM inversion followed by null-text optimization.
from inversions.null_text_inversion import NullTextInversion
from models import ptp_utils
from models.ptp import EmptyControl

image_path = 'projects/prompt_to_prompt/assets/gnochi_mirror.jpeg'
prompt = "a cat sitting next to a mirror"
image_tensor = ptp_utils.load_512(image_path).to('cuda')

null_inverter = NullTextInversion(StableDiffuser)
null_inverter.init_prompt(prompt)
ddim_latents = null_inverter.ddim_inversion(image_tensor)
x_t = ddim_latents[-1]  # the fully noised latent that seeds reconstruction
uncond_embeddings = null_inverter.null_optimization(ddim_latents, num_inner_steps=10, epsilon=1e-5)
null_text_rec, _ = ptp_utils.text2image_ldm_stable(
    StableDiffuser, [prompt], EmptyControl(), latent=x_t, uncond_embeddings=uncond_embeddings)
ptp_utils.view_images(null_text_rec)
```
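Under the hood, `null_optimization` follows the null-text inversion paper: at each timestep, the unconditional ("null") embedding is optimized so that the classifier-free-guided sampling trajectory stays on the recorded DDIM inversion latents. A minimal sketch of that inner loop, where `cfg_step` is an assumed closure for one guided denoising step (not this repo's API):

```python
import torch
import torch.nn.functional as F

def optimize_null_embedding(cfg_step, latent_cur, latent_prev_target,
                            uncond_emb, num_inner_steps=10, lr=1e-2):
    # `cfg_step(latent, uncond_emb)` is a hypothetical closure that runs one
    # classifier-free-guided DDIM step with the given null embedding.
    uncond_emb = uncond_emb.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([uncond_emb], lr=lr)
    for _ in range(num_inner_steps):
        latent_prev = cfg_step(latent_cur, uncond_emb)
        # Pull the guided trajectory back onto the recorded DDIM latent.
        loss = F.mse_loss(latent_prev, latent_prev_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return uncond_emb.detach()
```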
Then run prompt-to-prompt editing:

```python
# Prompt-to-prompt editing: turn spiderman into ironman via attention replace.
import torch

from models.ptp import LocalBlend, AttentionReplace
from models.ptp_utils import text2image_ldm_stable

prompts = ["A cartoon of spiderman",
           "A cartoon of ironman"]
g = torch.Generator().manual_seed(2023616)
lb = LocalBlend(prompts, ("spiderman", "ironman"), model=StableDiffuser)
controller = AttentionReplace(prompts, 50,
                              cross_replace_steps={"default_": 1., "ironman": .2},
                              self_replace_steps=0.4,
                              local_blend=lb, model=StableDiffuser)
images, x_t = text2image_ldm_stable(StableDiffuser, prompts, controller, latent=None,
                                    num_inference_steps=50, guidance_scale=7.5,
                                    uncond_embeddings=None, generator=g)
```
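Roughly speaking, `cross_replace_steps` and `self_replace_steps` set the fraction of the 50 denoising steps during which cross- and self-attention maps are injected from the source branch; per-word entries such as `"ironman": .2` let the new token's own attention take over earlier, and `LocalBlend` confines the edit to the region those words attend to.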

## Citation

```bibtex
@article{hertz2022prompt,
  title   = {Prompt-to-Prompt Image Editing with Cross Attention Control},
  author  = {Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  journal = {arXiv preprint arXiv:2208.01626},
  year    = {2022},
}

@article{mokady2022null,
  title   = {Null-text Inversion for Editing Real Images using Guided Diffusion Models},
  author  = {Mokady, Ron and Hertz, Amir and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  journal = {arXiv preprint arXiv:2211.09794},
  year    = {2022},
}
```