Kokoro fp16 Model for ONNX

This is Kokoro 82M exported to ONNX as fp16. This model is from this GitHub repo. The voices file is from this repository.

Usage with txtai

txtai has a built-in Text to Speech (TTS) pipeline that makes using this model easy.

Note: This requires txtai >= 8.3.0. Until that version is released, install txtai from GitHub.

import soundfile as sf

from txtai.pipeline import TextToSpeech

# Build pipeline
tts = TextToSpeech("NeuML/kokoro-fp16-onnx")

# Generate speech
speech, rate = tts("Say something here")

# Write to file
sf.write("out.wav", speech, rate)

Usage with ONNX

This model can also be run directly with ONNX, provided the input text is tokenized first. Tokenization can be done with ttstokenizer, a permissively licensed library with no external dependencies (such as espeak).

Note that the txtai pipeline has additional functionality, such as batching large inputs together, that would need to be duplicated with this method.

import json
import numpy as np
import onnxruntime
import soundfile as sf

from ttstokenizer import IPATokenizer

# This example assumes the files have been downloaded locally
with open("kokoro-fp16-onnx/voices.json", "r", encoding="utf-8") as f:
    voices = json.load(f)

# Create model
model = onnxruntime.InferenceSession(
    "kokoro-fp16-onnx/model.onnx",
    providers=["CPUExecutionProvider"]
)

# Create tokenizer
tokenizer = IPATokenizer()

# Tokenize inputs
inputs = tokenizer("Say something here")

# Get speaker array
speaker = np.array(voices["af"], dtype=np.float32)

# Generate speech
outputs = model.run(None, {
    "tokens": [[0, *inputs, 0]],
    "style": speaker[len(inputs)],
    "speed": np.ones(1, dtype=np.float32) * 1.0
})

# Write to file
sf.write("out.wav", outputs[0], 24000)
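
As noted above, the txtai pipeline handles large inputs, which the direct ONNX method does not. The following is a minimal sketch of one way to approximate that by splitting long text into sentence-aligned chunks before tokenizing. The function name `split_text` and the `max_chars` limit are hypothetical choices for illustration; this is not how txtai implements batching.

```python
import re

def split_text(text, max_chars=200):
    """
    Splits text into chunks made of whole sentences.

    max_chars is an illustrative limit, not a documented model constraint.
    A single sentence longer than max_chars is kept as its own chunk.
    """

    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue

        # Try to append the sentence to the chunk being built
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Chunk is full, start a new one with this sentence
            chunks.append(current)
            current = sentence
        else:
            current = candidate

    # Add the last chunk
    if current:
        chunks.append(current)

    return chunks
```

Each chunk could then be tokenized and passed to model.run as in the example above, with the resulting audio arrays joined using np.concatenate before writing the file.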

Speaker reference

The Kokoro model has a number of built-in speakers.

When using this model, set a speaker id from the reference table below.

SPEAKER	GENDER	NATIONALITY	EXAMPLE
af	F	American	Link
af_bella	F	American	Link
af_nicole	F	American	Link
af_sarah	F	American	Link
af_sky	F	American	Link
am_adam	M	American	Link
am_michael	M	American	Link
bf_emma	F	British	Link
bf_isabella	F	British	Link
bm_george	M	British	Link
bm_lewis	M	British	Link

The following example shows how to set a speaker id when using txtai.

speech, rate = tts("Say something here", speaker="af_sky")
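
The speaker ids in the reference table appear to follow a pattern: the first letter of the prefix indicates nationality (a for American, b for British) and the second indicates gender (f or m). The sketch below decodes an id on that assumption. The pattern is inferred from the table rather than documented, and `describe_speaker` is a hypothetical helper, not part of txtai or the model.

```python
# Prefix letters as inferred from the speaker reference table (an assumption)
NATIONALITY = {"a": "American", "b": "British"}
GENDER = {"f": "F", "m": "M"}

def describe_speaker(speaker):
    """
    Returns a (gender, nationality) tuple for a speaker id such as "af_sky".
    Raises ValueError when the id does not match the assumed pattern.
    """

    # The prefix is the text before the first underscore, e.g. "af" or "bm"
    prefix = speaker.split("_")[0]
    if len(prefix) != 2 or prefix[0] not in NATIONALITY or prefix[1] not in GENDER:
        raise ValueError(f"Unrecognized speaker id: {speaker}")

    return GENDER[prefix[1]], NATIONALITY[prefix[0]]

print(describe_speaker("af_sky"))
print(describe_speaker("bm_george"))
```

A check like this can catch a mistyped speaker id before it is passed to the pipeline.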
