Seedance 2.0

ByteDance's flagship multi-modal video model — text, image, audio, and video reference inputs in one shot.

No subscription

Credits never expire

Pay once for credits and use them across every model on ZOOOP. Top up when you need to, no monthly burn.

Key features

Native multi-modal audio + video

Single architecture generates synchronized audio and video in one pass — dialogue, ambient sound, beat-aware music, no post-sync step. Supports up to 3 video clips, 9 images, and 3 audio clips as combined reference inputs.

Role-based asset tagging

Tag each reference image as a specific character, prop, or location. Seedance keeps each subject visually consistent across cuts, so the same actor shows up in every shot wearing the same wardrobe.

Reference-guided motion

Provide a video clip as a motion reference and Seedance transfers its choreography onto your character image — useful for dance, sports action, and stylized camera moves.

4-to-15-second clips, up to 1080p

Native output up to 1080p in 16:9, 9:16, 4:3, 3:4, 21:9, and 1:1. Cinema-aspect 21:9 and vertical 9:16 ship from the same prompt, no cropping needed.

Use cases

Character-driven shorts

Tag the protagonist once with a reference shot and Seedance keeps face, hair, and wardrobe identical across every cut in the sequence.

Product demos with synced narration

Drop in product stills + a script audio clip and the model generates a video where the lighting, motion, and voiceover beat all land together.

Music video stems

Beat-aware sync means visuals cut on the downbeat. Feed a 15-second audio clip and the model edits camera motion to match.

Storyboard animation

Animate static storyboard panels with motion-reference video for blocking — faster than commissioning a previz pass.

Multi-shot cinematics

Sequence shots in one prompt with role tags. Saves the manual cut-and-stitch that other models force on you.

Stylized music + dance

Provide a dance reference video plus an audio bed; the character image performs choreographed motion on-beat.

Pick the right model

Seedance 2.0 is the strongest all-rounder for native audio + multi-modal references — but every model has a sweet spot. Your credits work across all of them on ZOOOP.

Top-tier reference + audio sync: Seedance 2.0 (this model)

Highest visual fidelity, 1080p+: Veo 3.1

Multi-shot storyboarding: Kling V3

Anime / micro-expressions / cost-effective: Hailuo 2.3

Open-weight model, instruction edits: Wan 2.7

Photoreal motion, smooth camera: Luma Ray 2

How to use

Open Seedance 2.0 from this page or pick it in the Video Generator.

Drop in your reference images and tag each one (character / prop / scene).

Write the scene prompt — Seedance honors camera moves, lighting cues, and dialogue lines.

Choose duration (4–15s), aspect ratio, and resolution, then hit Generate.
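If you script your settings before opening the generator, the limits quoted on this page can be checked up front. The sketch below is illustrative only: `GenerationSettings` and `validate` are hypothetical names, not a ZOOOP or ByteDance API, and the numbers are simply the ones stated above (4 to 15 seconds, up to 1080p, six aspect ratios, up to 9 images, 3 videos, and 3 audio clips).

```python
from dataclasses import dataclass, field

# Limits as quoted on this page. All names here are hypothetical, not a real API.
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIOS = 9, 3, 3
MIN_SECONDS, MAX_SECONDS = 4, 15
MAX_RESOLUTION_P = 1080
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "21:9", "1:1"}


@dataclass
class GenerationSettings:
    prompt: str
    aspect_ratio: str
    resolution_p: int  # vertical resolution, e.g. 1080 for 1080p
    duration_s: int
    images: list = field(default_factory=list)
    videos: list = field(default_factory=list)
    audios: list = field(default_factory=list)
    generate_audio: bool = True


def validate(settings):
    """Return a list of problems; an empty list means the settings fit the stated limits."""
    problems = []
    if not settings.prompt.strip():
        problems.append("prompt is required")
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    if not 0 < settings.resolution_p <= MAX_RESOLUTION_P:
        problems.append(f"resolution must be at most {MAX_RESOLUTION_P}p")
    if not MIN_SECONDS <= settings.duration_s <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    reference_limits = (
        ("images", settings.images, MAX_IMAGES),
        ("videos", settings.videos, MAX_VIDEOS),
        ("audios", settings.audios, MAX_AUDIOS),
    )
    for name, items, limit in reference_limits:
        if len(items) > limit:
            problems.append(f"too many {name}: {len(items)} > {limit}")
    return problems
```

Calling `validate` on a 15-second, 21:9, 1080p request with no references returns an empty list. A 20-second request, or one with ten images, returns one message per broken limit.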

Deep dive

What Seedance 2.0 is good at — and what it's not

Seedance 2.0 is the model you reach for when the scene needs more than a text prompt — when a director would hand the DP a stack of mood boards, a wardrobe sheet, an audio scratch track, and a stunt reference, and expect all of it to land in the same shot. The earlier Seedance 1.5 Pro could take some of those inputs separately. Seedance 2.0 takes them together: up to 9 reference images, 3 video clips, and 3 audio clips fed into one unified multi-modal architecture, and the model decides how to weight them per shot.

The capability that sells the model is role-based asset tagging. Drop in a reference image of your protagonist and label it character_a; drop in a product still and label it product_x; reference a stunt-double video and label it motion_ref. Seedance keeps the tagged character visually consistent — same face, same hair, same wardrobe — across every cut in the generated clip, while the motion reference dictates how they move. This is the single thing other models still struggle with: you generate a 5-second clip and the protagonist's hair color drifts halfway through. Seedance 2.0 locks the role.
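One way to keep role tags organized before a session is a small lookup from label to asset. This is a minimal sketch, assuming one label per asset: the labels `character_a`, `product_x`, and `motion_ref` come from the paragraph above, while `TaggedReference`, `build_role_index`, and the file names are hypothetical placeholders and say nothing about how Seedance stores tags internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedReference:
    label: str  # role label, e.g. "character_a"
    media: str  # "image", "video", or "audio"
    path: str   # placeholder file name, not a real asset


def build_role_index(references):
    """Map each role label to its reference, rejecting labels that are reused."""
    index = {}
    for ref in references:
        if ref.label in index:
            raise ValueError(f"role label used twice: {ref.label}")
        index[ref.label] = ref
    return index


references = [
    TaggedReference("character_a", "image", "protagonist.png"),
    TaggedReference("product_x", "image", "product_still.png"),
    TaggedReference("motion_ref", "video", "stunt_double.mp4"),
]
roles = build_role_index(references)
```

Rejecting a reused label is a design choice of this sketch: if two assets share `character_a`, it is ambiguous which face the role should lock to.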

The second thing it does well is beat-aware audio sync. Feed a 15-second music bed and the model edits camera moves, character motion, and visual cuts to land on the downbeat. This is native, not a post-process. The same model also generates dialogue with lip-sync and ambient sound (footsteps, room tone, weather) without a separate TTS pass. As of March 2026, Seedance 2.0 sits at Elo 1,269 for text-to-video and Elo 1,351 for image-to-video on public leaderboards, first in both categories and ahead of Kling 3.0, Veo 3, and Runway Gen-4.5.

Where it's weaker: on raw capability, almost nowhere. Seedance 2.0 is the strongest all-rounder of the current flagships: top of the public Elo boards, full 1080p, native audio, the deepest multi-modal reference set, and Kling-V3-style multi-shot when you script it. The limit is fit, not quality: it's a finish-tier model, so reach for it when quality has to win, not for running twenty quick draft variations. Use Grok Imagine when you need to iterate on direction at speed, then graduate the winning prompt to Seedance for the finish.

A reasonable mental model: Seedance 2.0 is the default whenever quality has to win, meaning reference-heavy shots, finished cuts, and premium deliverables. For rapid iteration to find the direction, use Grok Imagine. For a dedicated 4K upscale path on the finish, switch to Veo 3.1. For multi-shot storyboarding with hard cuts in one prompt, use Kling V3.

Frequently asked questions

What's new in Seedance 2.0 versus 1.5 Pro?

A unified multi-modal architecture — Seedance 2.0 takes text, image, audio, and video as a combined input, while 1.5 Pro handled them separately. The biggest practical wins are role-tagged reference images for character consistency, beat-aware audio sync, and native audio that doesn't need a separate TTS pass.

Does Seedance 2.0 generate audio natively?

Yes. Dialogue, ambient sound, and music score are produced alongside the video in the same generation pass, lip-synced to the visuals. You can also pass an audio reference and the visuals will cut to the beat.

What clip length and resolution does Seedance 2.0 support?

4 to 15 seconds, up to native 1080p. Aspect ratios include 16:9, 9:16, 4:3, 3:4, 21:9, and 1:1, so you can ship a cinema-aspect master and a vertical social cut from the same prompt without cropping.

How does Seedance 2.0 compare to Veo 3.1 and Kling V3?

Seedance 2.0 leads public Elo rankings for both text-to-video and image-to-video, with Kling 3.0, Veo 3, and Runway Gen-4.5 behind it. It also matches Veo 3.1 at 1080p. Veo's remaining differentiator is its dedicated 4K upscaler; Kling V3 has stronger explicit multi-shot storyboarding. Seedance has no capability weak link — it's the strongest all-rounder of the current flagships.

Can Seedance 2.0 do image-to-video?

Yes — it tops the public Elo leaderboards for both text-to-video and image-to-video. Seed it with a reference frame and it carries your subject, framing, and style into motion, with role-tagged references keeping characters consistent across shots.