
Orchestrating AI for stunning lip-synced videos. Effortless workflow, exceptional results, all in one place.


pawansharmaaaa/Lip_Wise


Open in Google Colab

Important

Please help by starring the repo. 😁


Lip-Wise leverages Wav2Lip for audio-to-lip generation, seamlessly integrating with cutting-edge face restoration models (CodeFormer, GFPGAN, RestoreFormer) for added realism. MediaPipe ensures precise facial landmark detection, while RealESRGAN enhances background quality. Simply provide an audio clip and a reference video, and Lip-Wise orchestrates the process to deliver stunning results.

Here's what makes Lip-Wise stand out:

  • Effortless Workflow: Unleash your creativity with an intuitive and user-friendly interface.
  • Unleash Your Vision: No more limitations - use any video, even those without a face in every frame.
  • Precision Meets Efficiency: Combining enhanced face detection, landmark recognition, and streamlined processing delivers superior results with significantly faster performance.
  • Simplified Setup: Get started quickly with minimal technical hassle - a breeze even for beginners.

🖼️ UI Screenshots:

👓 INSTALLATION

⚡ QUICK INFERENCE

Open in Google Colab

💡Tip: Make sure to use GPU runtime for faster processing.
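To confirm that a GPU runtime is actually active before starting a long job, a quick check along these lines may help. This is a minimal sketch, not part of Lip-Wise: the helper name gpu_available is hypothetical, and it assumes PyTorch is installed (as it normally is on Colab), falling back to looking for the nvidia-smi tool otherwise.

```python
import shutil


def gpu_available() -> bool:
    """Return True if a CUDA-capable GPU appears to be usable.

    Hypothetical helper for illustration; not a Lip-Wise function.
    """
    try:
        import torch  # assumed to be installed, as on a standard Colab runtime
    except ImportError:
        # Without PyTorch, fall back to checking for the NVIDIA driver tool.
        return shutil.which("nvidia-smi") is not None
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("GPU runtime detected" if gpu_available() else "No GPU detected; processing will be slow")
```

If this reports no GPU on Colab, switching the runtime type to a GPU and re-running the cell should change the result.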


💿 SETUP AND INFERENCE

Windows (NVIDIA)

  • Clone this repository:
    • git clone https://github.com/pawansharmaaaa/Lip_Wise
  • Install Python > 3.10 from the official site or from the Microsoft Store.
  • Install winget from the Microsoft Store.
  • Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
  • Run setup.bat
  • Run launch.bat

Debian / Ubuntu / Pop!_OS (NVIDIA)

  • Clone this repository:
    • git clone https://github.com/pawansharmaaaa/Lip_Wise
  • Make sure python --version reports a version greater than 3.10.
  • Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
  • Make setup.sh executable:
    • chmod +x ./setup.sh
  • Run setup.sh by double-clicking it, or by running ./setup.sh from a terminal in the same folder.
  • Make launch.sh executable:
    • chmod +x ./launch.sh
  • Run launch.sh the same way.
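Before running the setup script on either platform, the two prerequisites named above (Python version and CUDA Toolkit) can be checked with a short script. This is a sketch under stated assumptions rather than anything shipped with the repository: the function names are hypothetical, "> 3.10" is read literally as 3.11 or newer (pass strict=False if 3.10 itself turns out to be acceptable), and the CUDA Toolkit is detected only by looking for its nvcc compiler on the PATH.

```python
import shutil
import sys
from typing import Dict, Tuple


def python_version_ok(version: Tuple[int, int],
                      minimum: Tuple[int, int] = (3, 10),
                      strict: bool = True) -> bool:
    """Compare a (major, minor) version against the stated requirement.

    strict=True reads "> 3.10" literally; strict=False also accepts 3.10.
    """
    return version > minimum if strict else version >= minimum


def check_prerequisites() -> Dict[str, bool]:
    """Report whether the interpreter and the CUDA Toolkit look usable."""
    return {
        "python_ok": python_version_ok(sys.version_info[:2]),
        # nvcc ships with the CUDA Toolkit; finding it on PATH is only a heuristic.
        "cuda_toolkit_found": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```

A missing nvcc does not always mean the toolkit is absent; it may simply not be on the PATH.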

🎛️ FEATURES

Lip-Wise empowers you to create stunningly realistic and natural results, combining the power of AI with user-friendly features:

Media Versatility:

  • Process both images and videos: Breathe life into your visuals, regardless of format.
  • Advanced image and video preprocessing: Ensure optimal quality for exceptional results.

Cutting-edge Restoration:

  • Harness the power of leading models: GFPGAN, RestoreFormer, and CodeFormer work in tandem to deliver exceptional detail and clarity.
  • RealESRGAN integration: Enhance the background quality of your visuals effortlessly.

Image Processing:

  • 3D alignment during image processing: Achieve unparalleled realism with precise facial landmark detection.

Video Processing:

  • No need for a face in every frame: Lip-Wise intelligently interpolates missing frames, ensuring smooth transitions and realistic lip movements.
  • Fast inference: Enjoy a fluid experience with rapid video processing.
  • Video looping: Create seamless looping videos with consistent results.
  • RealESRGAN integration: Elevate the background quality of your videos effortlessly.
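To give a feel for what handling frames without a detected face can involve, here is one possible approach: linearly interpolating face bounding boxes across the gap. This is an illustrative sketch only. The README does not describe how Lip-Wise interpolates, so the function name fill_missing_boxes, the (x1, y1, x2, y2) box format, and the choice of linear interpolation are all assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def fill_missing_boxes(boxes: List[Optional[Box]]) -> List[Optional[Box]]:
    """Fill frames with no detection (None) from neighbouring detections.

    Gaps between two detections are linearly interpolated; gaps at the
    start or end copy the nearest detection. With no detections at all,
    the input is returned unchanged.
    """
    known = [i for i, box in enumerate(boxes) if box is not None]
    filled = list(boxes)
    if not known:
        return filled

    # Leading and trailing gaps: reuse the nearest detected box.
    for i in range(known[0]):
        filled[i] = boxes[known[0]]
    for i in range(known[-1] + 1, len(boxes)):
        filled[i] = boxes[known[-1]]

    # Interior gaps: blend the boxes on either side of the gap.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = tuple(
                (1 - t) * a + t * b for a, b in zip(boxes[left], boxes[right])
            )
    return filled


if __name__ == "__main__":
    detections = [None, (0, 0, 10, 10), None, (20, 20, 30, 30), None]
    for frame, box in enumerate(fill_missing_boxes(detections)):
        print(frame, box)
```

Linear interpolation is simple and keeps the face crop moving smoothly, but it assumes roughly steady motion across the gap; long gaps or fast head movement would need something more robust.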

📜 LICENSE AND ACKNOWLEDGEMENT

Lip-Wise is released under Apache License Version 2.0.

Citations

    @inproceedings{10.1145/3394171.3413532,
        author = {Prajwal, K R and Mukhopadhyay, Rudrabha and Namboodiri, Vinay P. and Jawahar, C.V.},
        title = {A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild},
        year = {2020},
        isbn = {9781450379885},
        publisher = {Association for Computing Machinery},
        address = {New York, NY, USA},
        url = {https://doi.org/10.1145/3394171.3413532},
        doi = {10.1145/3394171.3413532},
        booktitle = {Proceedings of the 28th ACM International Conference on Multimedia},
        pages = {484–492},
        numpages = {9},
        keywords = {lip sync, talking face generation, video generation},
        location = {Seattle, WA, USA},
        series = {MM '20}
    }
    @inproceedings{zhou2022codeformer,
        author = {Zhou, Shangchen and Chan, Kelvin C.K. and Li, Chongyi and Loy, Chen Change},
        title = {Towards Robust Blind Face Restoration with Codebook Lookup TransFormer},
        booktitle = {NeurIPS},
        year = {2022}
    }
    @InProceedings{wang2021gfpgan,
        author = {Xintao Wang and Yu Li and Honglun Zhang and Ying Shan},
        title = {Towards Real-World Blind Face Restoration with Generative Facial Prior},
        booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
        year = {2021}
    }
    @InProceedings{wang2021realesrgan,
        author    = {Xintao Wang and Liangbin Xie and Chao Dong and Ying Shan},
        title     = {Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data},
        booktitle = {International Conference on Computer Vision Workshops (ICCVW)},
        date      = {2021}
    }

📧 Contact

Reach out to me @ [email protected]