ComfyUI workflow adds lip-sync dubs in six languages
This document describes a ComfyUI workflow for creating lip-synced video dubs and rephrases with the LTX-2.3-22b-IC-LoRA-LipDub model from Hugging Face. Given a video and a new line of text, the workflow either translates the speech or rephrases the dialogue, then generates new lip movements and audio to match that text. It supports multiple languages and stresses that the new line should be close in length to the original dialogue so the output sounds natural.
Key Takeaways
- The workflow uses LTX-2.3-22b-IC-LoRA-LipDub from Hugging Face inside ComfyUI.
- It supports both dubbing, which translates speech, and rephrasing, which changes dialogue without changing language.
- The model generates new lip movements and audio to match the provided text.
- The notes say the current LoRA supports only one speaker.
- The document recommends writing the new line in the target language's native script and keeping it close to the original line's length, to avoid skipped words or unnatural pacing.
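The two input rules above (native script, similar length) can be checked before a line is sent to the workflow. The sketch below is a hypothetical pre-flight helper, not part of the ComfyUI workflow or the LoRA. It makes several assumptions. The language-to-script table `NATIVE_SCRIPT_HINTS` and the helpers `native_script_share`, `length_ratio` and `check_line` are illustrative names. The thresholds (`min_share`, `low`, `high`) are arbitrary placeholders. Length is measured in characters, which is only a rough proxy for spoken duration and is not comparable across scripts.

```python
import unicodedata

# HYPOTHETICAL mapping: target language code -> substrings expected in the
# Unicode character names of that language's native script. The source does
# not say which languages the workflow supports; extend or edit as needed.
NATIVE_SCRIPT_HINTS = {
    "en": ("LATIN",),
    "es": ("LATIN",),
    "fr": ("LATIN",),
    "de": ("LATIN",),
    "ru": ("CYRILLIC",),
    "ja": ("HIRAGANA", "KATAKANA", "CJK UNIFIED IDEOGRAPH"),
    "zh": ("CJK UNIFIED IDEOGRAPH",),
    "ko": ("HANGUL",),
}


def native_script_share(text: str, lang: str) -> float:
    """Return the fraction of letters in `text` that belong to the target script."""
    hints = NATIVE_SCRIPT_HINTS.get(lang)
    if hints is None:
        raise ValueError(f"no script hint for language {lang!r}")
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    matched = sum(
        1 for ch in letters if any(h in unicodedata.name(ch, "") for h in hints)
    )
    return matched / len(letters)


def length_ratio(original: str, new_line: str) -> float:
    """Return len(new_line) / len(original), ignoring whitespace."""
    orig_len = len("".join(original.split()))
    if orig_len == 0:
        raise ValueError("original line is empty")
    return len("".join(new_line.split())) / orig_len


def check_line(original, new_line, lang, min_share=0.9, low=0.8, high=1.25):
    """Return a list of warnings for a dub or rephrase line (empty list = no warnings).

    The thresholds are arbitrary placeholders, not values from the source.
    """
    warnings = []
    if native_script_share(new_line, lang) < min_share:
        warnings.append("new line is not mostly in the target language's native script")
    # Character counts are only roughly comparable within one script, so the
    # length check runs only when the original is already in the target script
    # (e.g. rephrasing, or dubbing between two Latin-script languages).
    if native_script_share(original, lang) >= min_share:
        ratio = length_ratio(original, new_line)
        if not low <= ratio <= high:
            warnings.append(f"length ratio {ratio:.2f} is outside [{low}, {high}]")
    return warnings
```

For example, `check_line("Good morning", "ohayou", "ja")` flags romanized Japanese, while `check_line("Hello", "Hello there my friend how are you", "en")` flags a rephrase that is far longer than the original. For cross-script dubs such as English to Japanese, only the script check runs. Judging length there would need a duration-based measure, which this sketch does not attempt.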
Why It Matters
This is a practical pipeline for localized video edits. The workflow does not just swap the audio; it also regenerates lip motion to fit the new line. That matters for dubbing and rephrasing, where visible mouth movement is part of the quality bar. The document is also clear about the operating constraints: one speaker only, native script, and similar line length. For teams evaluating AI video tooling, those limits matter as much as the generation itself. Open questions are how the workflow behaves across the listed language prompts and whether translated lines that are noticeably longer or shorter than the original degrade the output.
Read full article at drive.google.com