
Conversation

@tom-jerr
Contributor

Motivation

Supporting Wan Animate (https://huggingface.co/Wan-AI/Wan2.2-Animate-14B-Diffusers)

See issue #13867

Based on PR #15419

Modifications

This PR introduces a customized orchestration pipeline for the Wan Animate model. By implementing a Segment Loop mechanism, it overcomes VRAM limitations for long video generation. Additionally, it integrates a Data Preprocessing stage that supports direct extraction of pose and face features from raw videos, significantly lowering the barrier to entry for end-users.

Pipeline Design

WanAnimatePipeline:

Preprocessing -> Validation -> Text/Image Encoding -> VAE Encoding -> Video Processing -> Condition Construction -> Segment Loop
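
As a rough illustration of this flow, the stages could be composed linearly as in the sketch below. The Stage base class, the concrete stage class names, and the execute() interface are assumptions for illustration only, not the actual SGLang Diffusion API.

# A minimal sketch of the WanAnimatePipeline stage sequence above.
# The Stage base class and the execute() contract are illustrative
# assumptions, not the actual SGLang implementation.
class Stage:
    def execute(self, batch):
        # Each stage consumes the batch, augments it, and hands it on.
        return batch

class PreprocessingStage(Stage): pass          # optional pose/face extraction
class ValidationStage(Stage): pass             # sanity-check inputs
class TextImageEncodingStage(Stage): pass      # text + reference-image encoders
class VAEEncodingStage(Stage): pass            # encode frames into latents
class VideoProcessingStage(Stage): pass        # resize/normalize condition videos
class ConditionConstructionStage(Stage): pass  # assemble pose/face conditions
class SegmentLoopStage(Stage): pass            # per-segment denoise/decode loop

class WanAnimatePipeline:
    def __init__(self):
        self.stages = [
            PreprocessingStage(), ValidationStage(), TextImageEncodingStage(),
            VAEEncodingStage(), VideoProcessingStage(),
            ConditionConstructionStage(), SegmentLoopStage(),
        ]

    def execute(self, batch):
        for stage in self.stages:
            batch = stage.execute(batch)
        return batch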

SegmentLoopStage

  • The SegmentLoopStage manages the cur_segment index and orchestrates a sub-pipeline of five stages: Conditioning → Timestep → Latent → Denoising → Decoding (a minimal sketch follows after this list).
  • The WanAnimateConditioningStage dynamically slices the pose/face conditions for the current segment.
  • The DecodingStage triggers postprocess_decoded_frames for incremental frame stitching, then either returns a Req object to drive the next iteration or an OutputBatch to terminate.
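A rough sketch of that control flow: Req and OutputBatch are the objects named above, while the field names and loop structure here are illustrative assumptions, not the actual implementation in this PR.

# Rough sketch of the SegmentLoopStage control flow described above.
# Req and OutputBatch are the objects mentioned in this PR; the field
# names and loop structure are illustrative assumptions.
class Req: pass          # carries state into the next segment iteration
class OutputBatch: pass  # final stitched frames; terminates the loop

class SegmentLoopStage:
    def __init__(self, sub_stages):
        # Sub-pipeline: Conditioning -> Timestep -> Latent -> Denoising -> Decoding
        self.sub_stages = sub_stages

    def execute(self, batch):
        batch.cur_segment = 0
        while True:
            result = batch
            for stage in self.sub_stages:
                # WanAnimateConditioningStage slices pose/face conditions
                # for batch.cur_segment; DecodingStage stitches frames via
                # postprocess_decoded_frames and picks the return type.
                result = stage.execute(result)
            if isinstance(result, OutputBatch):
                return result        # all segments decoded and stitched
            batch = result           # a Req drives the next iteration
            batch.cur_segment += 1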

Data Preprocessing Stage

We provide a Data Preprocessing Stage. If users provide the ONNX model path of the preprocessing model, the pipeline can process the original video passed in by the user directly, instead of requiring preprocessed pose and face videos as input.
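
For reference, per-frame condition extraction with an ONNX model might look roughly like the sketch below; the model's input/output layout (one image in, pose and face maps out) is an assumption for illustration, not the actual preprocessor in this PR.

# Rough sketch of ONNX-based pose/face extraction from a raw video.
# The model's I/O layout is an illustrative assumption.
import cv2
import numpy as np
import onnxruntime as ort

def extract_conditions(video_path, preprocess_model_path):
    session = ort.InferenceSession(preprocess_model_path)
    input_name = session.get_inputs()[0].name
    cap = cv2.VideoCapture(video_path)
    pose_frames, face_frames = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # HWC uint8 BGR -> NCHW float32 RGB in [0, 1]
        x = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
        x = np.transpose(x, (2, 0, 1))[None]
        pose, face = session.run(None, {input_name: x})  # assumed two outputs
        pose_frames.append(pose)
        face_frames.append(face)
    cap.release()
    return pose_frames, face_frames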

With preprocess_model_path:

sglang generate --model-path Wan-AI/Wan2.2-Animate-14B-Diffusers \
 --prompt='People in the video are doing actions.' --image-path [ref image path] \
 --video-path [video path] --preprocess-model-path [preprocess model path] \
 --width 1280 --height 720 --save-output

Without preprocess_model_path:

sglang generate --model-path Wan-AI/Wan2.2-Animate-14B-Diffusers \
 --prompt='People in the video are doing actions.' --image-path [ref image path] \
 --pose-video-path [processed pose video path] --face-video-path [processed face video path] \
 --width 1280 --height 720 --save-output

TODO

  • replace mode for Wan Animate
  • retargeting, and using flux for data preprocessing

Accuracy Tests

Replicate.com wan animate animation:

replicate-prediction-h3jf38znwnrma0csgty9ejwmbg.mp4

Our results:

People_in_the_video_are_doing_actions._20251229-141238_91e44799.mp4



github-actions bot added the npu and diffusion (SGLang Diffusion) labels on Dec 30, 2025
@tom-jerr
Contributor Author

I have a question: should we do data preprocessing inside the inference pipeline?


# 2. send requests to scheduler, one at a time
# TODO: send batch when supported
# import debugpy
Collaborator
remove
