ElevenLabs Multilingual V3

ElevenLabs' top-tier TTS — 74 languages, multi-speaker dialogue, emotion tags, audiobook-grade narration.

No subscription

Credits never expire

Pay once for credits and use them across every model on ZOOOP. Top up when you need to, with no monthly burn.

Key features

74 languages, one model

V3 supports 74 languages, up from ~29 in V2, covering the vast majority of the world's population. The same voice characteristics carry across languages.

Multi-speaker dialogue

The new Text-to-Dialogue API generates natural, lifelike dialogue with multiple distinct speakers in a single render: character interactions across languages, with emotional consistency.

Audio tags for direction

Inline tags like [whispering], [sad], [laughs], [shouting] direct the read across languages — a [sad] tag in Spanish lands the same way it does in English.

Hundreds of multilingual voices

Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill — and many more. Each works across all 74 languages.

Use cases

Audiobook production

Long-form narration with audiobook-grade emotional delivery, including subtle tonal shifts across chapters and characters.

Character dialogue

Multi-speaker Text-to-Dialogue handles full scenes with distinct characters who interact emotionally — useful for animation, games, and audio drama.

Multilingual campaigns

Generate the same script in 74 languages with consistent voice characteristics. One brand voice, every market, no separate cast per language.

E-learning narration

Calm explanatory tone with emphasis on key terms — tags let you direct pacing and stress without re-recording.

Podcast intros and ads

Audiobook-grade fidelity at podcast-ad lengths that drops into existing podcast pipelines with no loss of quality.

Game character voice

Use audio tags to deliver context-specific reads ([angry], [whispering], [tired]) for in-game lines without a voice cast.

Choose the right model

Pick the right TTS model for the work. Your credits work everywhere on ZOOOP.

Top quality, 74 languages, multi-speaker: ElevenLabs V3 (this model)

Full song with vocals + structure: Lyria 3 Pro

How to use

Open ElevenLabs Multilingual V3 from this page or pick it in the Audio Generator.

Pick a voice from the library — each works across all 74 languages.

Write the script in your target language. Add inline tags like [whispering] or [sad] to direct emotion.

Generate. For multi-speaker, switch to Text-to-Dialogue and assign lines per voice.
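
The last two steps above can be sketched as plain data before anything is sent for generation. This is a minimal sketch, assuming a dialogue is simply an ordered list of turns, each pairing a voice with a line. The `Turn` class, the `build_dialogue` helper, and the `voice` / `text` field names are hypothetical and are not taken from the ZOOOP or ElevenLabs API; the voice names come from the library list on this page.

```python
from dataclasses import dataclass, asdict


@dataclass
class Turn:
    voice: str  # a voice name from the library, e.g. "George"
    text: str   # the line, with optional inline tags such as [whispering]


def build_dialogue(cast: dict[str, str], lines: list[tuple[str, str]]) -> list[dict]:
    """Map each (character, line) pair to the voice assigned to that character."""
    turns = []
    for character, text in lines:
        if character not in cast:
            raise KeyError(f"no voice assigned to {character!r}")
        turns.append(asdict(Turn(voice=cast[character], text=text)))
    return turns


# Hypothetical scene: two characters, one voice each.
cast = {"Detective": "George", "Witness": "Lily"}
scene = [
    ("Detective", "Where were you last night?"),
    ("Witness", "[whispering] I can't say."),
]

dialogue = build_dialogue(cast, scene)
for turn in dialogue:
    print(turn)
```

Keeping the cast mapping separate from the script means a character can be recast by changing one entry, without touching the lines.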

Deep dive

What ElevenLabs Multilingual V3 is good at — and what it's not

ElevenLabs Multilingual V3 is the model that made multilingual TTS production-ready. For most of TTS history, "multilingual" was a checkbox feature — five languages, ten if you were lucky, with the non-English options noticeably stilted. V3 ships with 74 languages — covering the vast majority of the world's population — and the non-English reads hold the same emotional fidelity, pacing, and naturalism as the English ones. Practical effect: a single brand voice now ships across global markets without a separate cast per language and without the off-brand local read that always crept in.

The capability that gets less attention but matters more for production work is audio tags as performance direction. Inline marks like [whispering], [sad], [laughs], [shouting], [angry], [tired] placed directly in the text are read by V3 as directorial instructions and applied across whichever language you're generating in. A [sad] tag in Spanish lands the same way it does in English; a [whispering] instruction in Japanese reads as a hush rather than a quiet baseline. For audiobook narration, character dialogue, and audio drama, this collapses the back-and-forth between "write the line" and "describe how it should sound" — the direction lives in the text itself.
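
As a rough illustration of keeping the direction in the text itself, the sketch below prepends an audio tag to a line and then lists the tags found in a finished script for review. The helper names (`direct`, `find_tags`) and the `KNOWN_TAGS` set are hypothetical and not part of any ElevenLabs or ZOOOP API; only the bracketed tag syntax and the tag names come from the examples above, and the model may accept tags not listed here.

```python
import re

# Tags named on this page; this is not a verified, exhaustive list.
KNOWN_TAGS = {"whispering", "sad", "laughs", "shouting", "angry", "tired"}

# Matches any [tag] written inline in the script.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def direct(line: str, tag: str) -> str:
    """Prefix a line with an inline audio tag, e.g. [sad]."""
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unrecognised tag: {tag!r}")
    return f"[{tag}] {line}"


def find_tags(script: str) -> list[str]:
    """Return every bracketed tag in the script, in order of appearance."""
    return TAG_PATTERN.findall(script)


# The same tags are used regardless of the language of the line.
script = " ".join([
    direct("No puedo creerlo.", "sad"),
    direct("Ven aquí.", "whispering"),
])

print(script)
print(find_tags(script))
```

Listing the tags before generating is a cheap way to catch a typo such as `[wispering]`, which would otherwise only show up in the rendered audio.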

The third flagship capability is the Text-to-Dialogue API. It generates multi-speaker conversations between distinct characters, each with their own voice, as one continuous interaction with emotional consistency. This is useful for animation dubs, game cutscenes, audio drama, and any content where the deliverable is character interaction rather than monologue. Pair it with V3's emotion tags and you have a tool that produces what used to require an entire voice cast plus a director.

The voice library holds hundreds of multilingual voices, including Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill, and many more. Each voice keeps its character across all 74 languages, so a deep narrator voice in English stays deep in Mandarin, French, and Korean. For audiobook publishers, e-learning producers, and podcast networks, this is the difference between "AI voice" and "production voice."

Where it's weaker: ultra-low-latency real-time use (live conversational agents that need a first response in under 200 ms) is better served by lighter, faster models like Speech-2.8-Turbo from MiniMax. Voice cloning from short samples is supported, but specialized models like Chatterbox TTS Multilingual or Index TTS 2 are tuned specifically for it. V3's sweet spot is high-quality narration, multi-speaker dialogue, and multilingual brand work.

A reasonable mental model: V3 is the default for any narration / dialogue work where quality matters more than millisecond latency.
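
That rule of thumb can be written down as a small router. This is only a sketch of the trade-offs described above: the `suggest_model` function and its parameters are hypothetical, the returned strings are the model names mentioned on this page rather than verified API identifiers, and checking latency before cloning is an assumption about priority, not something the page states.

```python
def suggest_model(realtime: bool = False, cloning_focus: bool = False) -> str:
    """Suggest a TTS model name from two coarse requirements."""
    if realtime:
        # Live agents that need a first response in under ~200 ms.
        return "Speech-2.8-Turbo"
    if cloning_focus:
        # Cloning from short samples; Index TTS 2 is the other option named above.
        return "Chatterbox TTS Multilingual"
    # Default: narration and dialogue where quality matters more than latency.
    return "ElevenLabs Multilingual V3"


print(suggest_model())
print(suggest_model(realtime=True))
print(suggest_model(cloning_focus=True))
```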

Frequently asked questions

How is V3 different from V2 / Multilingual V2?

V3 supports 74 languages (up from ~29 in V2), introduces emotion / direction audio tags, ships the Text-to-Dialogue API for multi-speaker scenes, and produces noticeably more natural emotional range. V2 remains a strong baseline; V3 is the upgrade for any new project.

Does V3 work in my language?

V3 covers 74 languages including English, Chinese (Simplified + Traditional), Japanese, Korean, Spanish, French, German, Portuguese, Hindi, Arabic, Russian, Vietnamese, Thai, Indonesian, Turkish, Polish, Dutch, Norwegian, Danish, and many more — most of the world's commonly used languages.

What are audio tags?

Inline directorial marks like [whispering], [laughs], [sad], [angry], [shouting] placed in the text. V3 reads them as performance direction and applies the emotion across whichever language you're generating in. A [sad] tag in Spanish lands the same way it does in English.

Can V3 do multi-speaker dialogue?

Yes — the Text-to-Dialogue API generates natural multi-speaker conversations with emotional consistency across speakers and languages. Useful for audio drama, animation dubs, games, and any content with character interactions.

How does V3 compare to other TTS models?

V3 leads on language coverage (74 languages, more than any competitor) and on direction (audio tags work cross-lingually). For ultra-low latency real-time use, lighter models like Speech-2.8-Turbo are faster. For full audiobook / drama production, V3 is the current quality leader.