
Dubbing and localization
Swap the audio to another language and re-sync the mouth — localize a clip without re-shooting.
Pixverse's lip-sync model — re-sync a video clip to new audio.
Pay once for credits and use them for any model on ZOOOP. · Top up when you need to, with no monthly burn.
Powered by Pixverse AI's API on ZOOOP
Provide a video clip and a new audio track, and Pixverse Lipsync re-aligns the subject's mouth to the new audio — dubbing, re-voicing, and language swaps.
A video and an audio track — no extra parameters to manage.
A low-cost way to re-voice and localize clips.
Built for existing video — talking-head clips and recorded performances.

Swap the audio to another language and re-sync the mouth — localize a clip without re-shooting.

Replace dialogue with a new take or cleaner recording, lips re-aligned.

Re-align mouths on footage where the original audio drifted.

Add a synced spoken track to a video you generated on ZOOOP.
Pick the right tool. Your credits work everywhere on ZOOOP.
Open Pixverse Lipsync from this page or pick it in the Video tools.
Upload the video clip and the new audio track.
Confirm the inputs.
Generate, then download or send the clip to your canvas.
Pixverse Lipsync is a re-sync tool: give it a video clip and a new audio track, and it re-aligns the subject's mouth to the new audio. The starting point is footage you already have — a talking-head clip, a recorded performance, or a video generated elsewhere on ZOOOP — which makes it the model for dubbing, re-voicing, language swaps, and fixing audio that drifted out of sync. Producing several localized cuts of the same clip is affordable.
The flow is deliberately simple: a video and an audio track, nothing else to manage. The natural pairing is a TTS model — generate the new voice in any supported language, then re-sync your clip to it for a localized version with no re-shoot.
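The TTS pairing described above can be sketched as a small planning step: one source clip, one generated audio track per language, one re-sync job per pair. This is only an illustration. `LipsyncJob`, `plan_localized_cuts`, and the file names are hypothetical and not part of any ZOOOP or Pixverse API; the sketch only organizes the inputs you would upload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipsyncJob:
    """One re-sync request: a video clip plus the audio to match it to.

    Hypothetical structure for illustration, not a ZOOOP or Pixverse type.
    """
    language: str
    video_path: str
    audio_path: str


def plan_localized_cuts(video_path, audio_by_language):
    """Pair one source clip with each per-language audio track."""
    if not video_path:
        raise ValueError("a video clip is required")
    jobs = []
    # Sort by language code so the plan is stable between runs.
    for language, audio_path in sorted(audio_by_language.items()):
        if not audio_path:
            raise ValueError(f"missing audio track for {language!r}")
        jobs.append(LipsyncJob(language, video_path, audio_path))
    return jobs


if __name__ == "__main__":
    # Assumed file names: the audio tracks would come from a TTS model.
    for job in plan_localized_cuts(
        "talk.mp4", {"nl": "talk_nl.wav", "de": "talk_de.wav"}
    ):
        print(job.language, job.video_path, job.audio_path)
```

Each job carries exactly the two inputs the tool asks for, a video and an audio track, so every localized cut reuses the same footage.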
Where it's the wrong tool: if you're starting from a single still image rather than video, you want Kling Avatar V2, which generates a talking performance from one image. Kling Lipsync is another re-sync option, from the separate Kling model line. Pixverse Lipsync's lane is re-syncing existing video footage.
A reasonable mental model: default to Pixverse Lipsync when you have a video clip and want its mouth matched to new audio. To start from a still image instead, use Kling Avatar V2.
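That rule of thumb can be written as a tiny lookup. This is a sketch of the decision only: `choose_tool` and the `"video"` / `"image"` labels are hypothetical names chosen for illustration, and the returned strings are just the model names mentioned on this page.

```python
def choose_tool(source_kind: str) -> str:
    """Return the model suggested for a given starting material.

    Hypothetical helper: it encodes the rule of thumb above, nothing more.
    """
    kind = source_kind.strip().lower()
    if kind == "video":
        # Existing footage: re-sync its mouth movement to the new audio.
        return "Pixverse Lipsync"
    if kind == "image":
        # A single still: generate a talking performance instead.
        return "Kling Avatar V2"
    raise ValueError(f"unsupported source kind: {source_kind!r}")


if __name__ == "__main__":
    print(choose_tool("video"))
    print(choose_tool("image"))
```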
It re-syncs the mouth of an existing video clip to a new audio track — for dubbing, re-voicing, language swaps, or fixing sync drift.
Pixverse Lipsync re-syncs an existing video to new audio. Kling Avatar V2 generates a talking video from a single still image plus audio. Pick Lipsync when you already have footage.
Yes — it's a low-cost way to re-voice clips, so producing several localized cuts of the same clip is realistic.
Yes — generate the new voice with a TTS model first, then re-sync your clip to it.