
Emotional character voice
Clone a character voice and dial in the exact emotional blend for each line.
Bilibili's Index TTS 2 — voice cloning with fine-grained control over eight emotions.
Pay once for credits and use them for any model on ZOOOP. · Top up when you need to, with no monthly burn.
Powered by Bilibili Index's API on ZOOOP
Provide a reference audio sample and Index TTS 2 speaks your text in that cloned voice.
Dial in happy, angry, sad, afraid, disgusted, melancholic, surprised, and calm individually — blend emotions to shape exactly how a line reads.
Set each emotion's strength independently for nuanced, layered expression.
Built on the Bilibili Index voice model.

Set anger, fear, or melancholy strengths to match a dramatic scene.

Reproduce a consistent voice with controllable emotional range.

Generate the cloned, emotion-controlled voice, then drive an avatar model with it.
Pick the right voice model. Your credits work everywhere on ZOOOP.
Open Index TTS 2 from this page or pick it in the Audio tools.
Upload a reference voice sample and paste your text.
Set the strength of each emotion to shape the delivery.
Generate, then download or send the audio to your canvas.
Index TTS 2 is Bilibili's voice-cloning model with a distinctive strength: fine-grained control over eight emotions. Provide a reference sample to clone a voice, then set the strength of happy, angry, sad, afraid, disgusted, melancholic, surprised, and calm — individually — to shape exactly how each line reads. Because the emotions blend, you can layer subtle combinations rather than choosing one preset feeling, which suits performed narration and dramatic dialogue.
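As a rough illustration of what blending emotions means in practice, the sketch below models the eight strengths as one small Python structure. Everything about it is an assumption made for illustration: the 0.0 to 1.0 range, the field names, and the shape of the request dictionary are hypothetical and are not taken from ZOOOP's or Bilibili's documentation.

```python
from dataclasses import dataclass, asdict

# The eight emotions named on this page.
EMOTIONS = (
    "happy", "angry", "sad", "afraid",
    "disgusted", "melancholic", "surprised", "calm",
)


@dataclass(frozen=True)
class EmotionBlend:
    """One strength per emotion. The 0.0-1.0 range is an assumption."""
    happy: float = 0.0
    angry: float = 0.0
    sad: float = 0.0
    afraid: float = 0.0
    disgusted: float = 0.0
    melancholic: float = 0.0
    surprised: float = 0.0
    calm: float = 0.0

    def __post_init__(self):
        # Reject strengths outside the assumed range.
        for name, value in asdict(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")


def build_request(text: str, reference_audio: str, blend: EmotionBlend) -> dict:
    """Bundle the inputs into a dictionary (hypothetical field names)."""
    return {
        "prompt": text,
        "audio_reference": reference_audio,
        "emotions": asdict(blend),
    }


# A mostly sad line with a trace of melancholy and calm.
request = build_request(
    "I waited all night.",
    "narrator_sample.wav",
    EmotionBlend(sad=0.6, melancholic=0.3, calm=0.2),
)
```

Keeping every emotion as an independent field, rather than a single preset name, mirrors the layering the page describes: any emotion left at zero simply does not contribute.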
The cloning side reproduces a specific voice from your sample, so the same character or brand voice can carry a script with a controllable emotional range. Pricing is per 1,000 characters.
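Per-1,000-character pricing can be estimated ahead of time. The sketch below assumes pro-rata billing (no rounding up to the next 1,000 characters) and counts characters with Python's `len`; the page states neither detail, and the rate used in the example is a made-up placeholder, not ZOOOP's actual price.

```python
from decimal import Decimal


def estimate_cost(text: str, rate_per_1000: Decimal) -> Decimal:
    """Estimate the cost of a script, assuming pro-rata billing per 1,000 characters."""
    return Decimal(len(text)) / Decimal(1000) * rate_per_1000


# Hypothetical rate of 4 credits per 1,000 characters.
script = "a" * 2500
cost = estimate_cost(script, Decimal("4"))
```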
Where it sits among ZOOOP's voice models: Chatterbox TTS is the voice-cloning model built for broad multilingual coverage; LUX TTS is the cheapest cloning option; for preset voices, use Multilingual V3. Index TTS 2's sweet spot is emotionally nuanced voice cloning.
A reasonable mental model: default to Index TTS 2 when a cloned voice needs precise emotional control, and switch to Chatterbox for many languages or LUX TTS for the lowest cost.
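That mental model can be written down as a small decision function. This is only a sketch: the order in which the conditions are checked and the fallback to Index TTS 2 are assumptions, and the returned strings are the display names used on this page, not API identifiers.

```python
def pick_voice_model(
    needs_clone: bool,
    needs_emotion_control: bool = False,
    many_languages: bool = False,
    lowest_cost: bool = False,
) -> str:
    """Return the display name of a ZOOOP voice model for the given needs."""
    if not needs_clone:
        return "Multilingual V3"   # preset voices
    if needs_emotion_control:
        return "Index TTS 2"       # precise emotional control
    if many_languages:
        return "Chatterbox TTS"    # broad multilingual coverage
    if lowest_cost:
        return "LUX TTS"           # cheapest clone
    return "Index TTS 2"           # assumed fallback for cloned voices


choice = pick_voice_model(needs_clone=True, needs_emotion_control=True)
```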
Index TTS 2 controls eight emotions, each set individually: happy, angry, sad, afraid, disgusted, melancholic, surprised, and calm. Blend them to shape exactly how a line reads.
To clone a voice, provide a reference audio sample of it. Index TTS 2 speaks your text in that cloned voice with your chosen emotional blend.
Index TTS 2 and Chatterbox TTS both clone voices. Index TTS 2 offers fine-grained, eight-emotion control; Chatterbox emphasizes broad multilingual coverage. Pick Index TTS 2 when emotional nuance matters most.
Index TTS 2 is priced per 1,000 characters of text.
Audio Reference*
Prompt*
Emotion · Happy*
Emotion · Angry*
Emotion · Sad*
Emotion · Afraid*
Emotion · Disgusted*
Emotion · Melancholic*
Emotion · Surprised*
Emotion · Calm*