Update README.md
README.md
CHANGED
@@ -3,8 +3,66 @@ license: apache-2.0
 library_name: transformers.js
 ---
 
+# Kokoro TTS
+
+Kokoro is a frontier TTS model for its size of 82 million parameters (text in/audio out).
+
+## Table of contents
+
+- [Samples](#samples)
+- [Usage](#usage)
+  - [JavaScript](#javascript)
+  - [Python](#python)
+
+## Samples
+
+
+> Life is like a box of chocolates. You never know what you're gonna get.
+
+
+| Voice | Nationality | Gender | Sample |
+|--------------------------|-------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------|
+| Default (`af`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/C0_ZUcNSAxvMwpS8QbnKv.wav"></audio> |
+| Bella (`af_bella`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/B_q15Z_FXdgBP9-Hk9oKq.wav"></audio> |
+| Nicole (`af_nicole`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/sS8U5lQHkhgX7rwTmy-5w.wav"></audio> |
+| Sarah (`af_sarah`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/SokkBiqEqwxLLx_pqvf1p.wav"></audio> |
+| Sky (`af_sky`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/IzySGHUtl5mYeFxx1oaRf.wav"></audio> |
+| Adam (`am_adam`) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9n6myE6--ZsEuF5xDv5eC.wav"></audio> |
+| Michael (`am_michael`) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/EPFciGtTU1YUXu8MAw7DX.wav"></audio> |
+| Emma (`bf_emma`) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/AGEsXs-gyJq3dsyo7PjHo.wav"></audio> |
+| Isabella (`bf_isabella`) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/JEzrrXYJSDcmlEzI7tE0c.wav"></audio> |
+| George (`bm_george`) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/nsv4zKB4MX2TvXRxv504k.wav"></audio> |
+| Lewis (`bm_lewis`) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/g_mcBl2xTbQl0sbrpZt48.wav"></audio> |
+
+
 ## Usage
 
+### JavaScript
+
+First, install the `kokoro-tts` library from [NPM](https://npmjs.com/package/kokoro-tts) using:
+```bash
+npm i kokoro-tts
+```
+
+You can then generate speech as follows:
+
+```js
+import { KokoroTTS } from "kokoro-tts";
+
+const model_id = "onnx-community/Kokoro-82M-ONNX";
+const tts = await KokoroTTS.from_pretrained(model_id, {
+  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
+});
+
+const text = "Life is like a box of chocolates. You never know what you're gonna get.";
+const audio = await tts.generate(text, {
+  // Use `tts.list_voices()` to list all available voices
+  voice: "af_bella",
+});
+audio.save("audio.wav");
+```
+
+
 ### Python
 
 ```python
@@ -41,7 +99,12 @@ import scipy.io.wavfile as wavfile
 wavfile.write('audio.wav', 24000, audio[0])
 ```
 
-##
+## Quantizations
+
+The model is resilient to quantization, enabling efficient high-quality speech synthesis at a fraction of the original model size.
+
+> How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born.
+
 
 | Model | Size (MB) | Sample |
 |------------------------------------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------|
@@ -52,4 +115,4 @@ wavfile.write('audio.wav', 24000, audio[0])
 | model_uint8.onnx (8-bit & mixed precision) | 177 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/tpOWRHIWwEb0PJX46dCWQ.wav"></audio> |
 | model_uint8f16.onnx (Mixed precision) | 114 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/vtZhABzjP0pvGD7dRb5Vr.wav"></audio> |
 | model_q4.onnx (4-bit matmul) | 305 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8FVn0IJIUfccEBWq8Fnw_.wav"></audio> |
-| model_q4f16.onnx (4-bit matmul & fp16 weights) | 154 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/7DrgWC_1q00s-wUJuG44X.wav"></audio> |
+| model_q4f16.onnx (4-bit matmul & fp16 weights) | 154 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/7DrgWC_1q00s-wUJuG44X.wav"></audio> |
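
The "fraction of the original model size" claim in the Quantizations section can be sanity-checked with rough arithmetic. The sketch below is not part of the commit. It assumes that a full-precision export stores each of the 82 million parameters as a 32-bit float and ignores graph metadata, so the fp32 figure is an estimate, not a size taken from the table (the full-precision rows are not visible in this diff). The helper names `estimateSizeMB` and `fractionOfFp32` are hypothetical.

```javascript
// Rough size estimate: parameters x bits per parameter, in decimal megabytes.
// Ignores graph metadata and any tensors kept at a different precision.
function estimateSizeMB(numParams, bitsPerParam) {
  return (numParams * bitsPerParam) / 8 / 1e6;
}

// "82 million parameters", taken from the model description in the README.
const NUM_PARAMS = 82e6;

// Sizes (MB) copied from the table rows visible in this diff.
const listedSizesMB = {
  "model_uint8.onnx": 177,
  "model_uint8f16.onnx": 114,
  "model_q4.onnx": 305,
  "model_q4f16.onnx": 154,
};

// Assumed baseline: every parameter stored as a 32-bit float.
const fp32EstimateMB = estimateSizeMB(NUM_PARAMS, 32);

// Share of the estimated fp32 size that a given file occupies.
function fractionOfFp32(sizeMB) {
  return sizeMB / fp32EstimateMB;
}

for (const [file, sizeMB] of Object.entries(listedSizesMB)) {
  const pct = (fractionOfFp32(sizeMB) * 100).toFixed(0);
  console.log(`${file}: ${sizeMB} MB, about ${pct}% of the fp32 estimate`);
}
```

Under these assumptions the smallest listed file, `model_uint8f16.onnx`, comes out at roughly a third of the estimated full-precision size. `model_q4.onnx` stays close to the estimate, which fits its "4-bit matmul" label if only the matmul weights are quantized.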