Xenova (HF staff) committed on
Commit 8c499d8 (verified) · 1 parent: ec27144

Update README.md

Files changed (1)
  1. README.md +65 -2
README.md CHANGED
@@ -3,8 +3,66 @@ license: apache-2.0
  library_name: transformers.js
  ---
 
+ # Kokoro TTS
+
+ Kokoro is a frontier TTS model for its size of 82 million parameters (text in/audio out).
+
+ ## Table of contents
+
+ - [Samples](#samples)
+ - [Usage](#usage)
+   - [JavaScript](#javascript)
+   - [Python](#python)
+
+ ## Samples
+
+
+ > Life is like a box of chocolates. You never know what you're gonna get.
+
+
+ | Voice | Nationality | Gender | Sample |
+ |--------------------------|-------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------|
+ | Default (`af`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/C0_ZUcNSAxvMwpS8QbnKv.wav"></audio> |
+ | Bella (`af_bella`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/B_q15Z_FXdgBP9-Hk9oKq.wav"></audio> |
+ | Nicole (`af_nicole`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/sS8U5lQHkhgX7rwTmy-5w.wav"></audio> |
+ | Sarah (`af_sarah`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/SokkBiqEqwxLLx_pqvf1p.wav"></audio> |
+ | Sky (`af_sky`) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/IzySGHUtl5mYeFxx1oaRf.wav"></audio> |
+ | Adam (`am_adam`) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9n6myE6--ZsEuF5xDv5eC.wav"></audio> |
+ | Michael (`am_michael`) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/EPFciGtTU1YUXu8MAw7DX.wav"></audio> |
+ | Emma (`bf_emma`) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/AGEsXs-gyJq3dsyo7PjHo.wav"></audio> |
+ | Isabella (`bf_isabella`) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/JEzrrXYJSDcmlEzI7tE0c.wav"></audio> |
+ | George (`bm_george`) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/nsv4zKB4MX2TvXRxv504k.wav"></audio> |
+ | Lewis (`bm_lewis`) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/g_mcBl2xTbQl0sbrpZt48.wav"></audio> |
+
+
  ## Usage
 
+ ### JavaScript
+
+ First, install the `kokoro-tts` library from [NPM](https://npmjs.com/package/kokoro-tts) using:
+ ```bash
+ npm i kokoro-tts
+ ```
+
+ You can then generate speech as follows:
+
+ ```js
+ import { KokoroTTS } from "kokoro-tts";
+
+ const model_id = "onnx-community/Kokoro-82M-ONNX";
+ const tts = await KokoroTTS.from_pretrained(model_id, {
+   dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
+ });
+
+ const text = "Life is like a box of chocolates. You never know what you're gonna get.";
+ const audio = await tts.generate(text, {
+   // Use `tts.list_voices()` to list all available voices
+   voice: "af_bella",
+ });
+ audio.save("audio.wav");
+ ```
+
+
  ### Python
 
  ```python
@@ -41,7 +99,12 @@ import scipy.io.wavfile as wavfile
  wavfile.write('audio.wav', 24000, audio[0])
  ```
 
- ## Samples
+ ## Quantizations
+
+ The model is resilient to quantization, enabling efficient high-quality speech synthesis at a fraction of the original model size.
+
+ > How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born.
+
 
  | Model | Size (MB) | Sample |
  |------------------------------------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------|
@@ -52,4 +115,4 @@ wavfile.write('audio.wav', 24000, audio[0])
  | model_uint8.onnx (8-bit & mixed precision) | 177 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/tpOWRHIWwEb0PJX46dCWQ.wav"></audio> |
  | model_uint8f16.onnx (Mixed precision) | 114 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/vtZhABzjP0pvGD7dRb5Vr.wav"></audio> |
  | model_q4.onnx (4-bit matmul) | 305 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8FVn0IJIUfccEBWq8Fnw_.wav"></audio> |
- | model_q4f16.onnx (4-bit matmul & fp16 weights) | 154 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/7DrgWC_1q00s-wUJuG44X.wav"></audio> |
+ | model_q4f16.onnx (4-bit matmul & fp16 weights) | 154 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/7DrgWC_1q00s-wUJuG44X.wav"></audio> |
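The voice IDs in the Samples table added by this commit appear to follow a two-letter prefix: the first letter seems to encode the accent (`a` for American, `b` for British), the second the gender (`f` or `m`), optionally followed by `_name`. The sketch below decodes an ID under that assumption. The convention is inferred only from the eleven voices listed in this diff, not from any documented `kokoro-tts` API, and `parse_voice_id` and `VoiceInfo` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Prefix letters inferred from the voice table in this commit.
# ASSUMPTION: this is not a documented kokoro-tts convention.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


class VoiceInfo(NamedTuple):
    voice_id: str
    nationality: str
    gender: str
    name: Optional[str]  # None for the bare default voice, e.g. "af"


def parse_voice_id(voice_id: str) -> VoiceInfo:
    """Decode a voice ID such as "bf_emma" (hypothetical helper)."""
    prefix, _, name = voice_id.partition("_")
    # Reject anything outside the two accents and two genders seen in the table.
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognised voice ID: {voice_id!r}")
    return VoiceInfo(
        voice_id=voice_id,
        nationality=ACCENTS[prefix[0]],
        gender=GENDERS[prefix[1]],
        name=name.capitalize() if name else None,
    )


if __name__ == "__main__":
    for vid in ["af", "af_bella", "am_michael", "bm_george"]:
        print(parse_voice_id(vid))
```

The helper raises on unknown prefixes instead of guessing, since the voices returned by `tts.list_voices()` may include accents or naming schemes not shown in this table.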
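The visible context of the Python usage section ends with `wavfile.write('audio.wav', 24000, audio[0])`, i.e. the output is saved as a 24 kHz WAV file via `scipy.io.wavfile`. Where SciPy is not available, a comparable file can be written with the standard-library `wave` module. This is a minimal sketch, assuming the model output is a 1-D sequence of float samples in [-1.0, 1.0]; it clips and converts them to 16-bit PCM, which differs from the floating-point samples SciPy would typically write for a float array. `write_wav_24k` is a hypothetical helper and the sine tone is a placeholder for real model output.

```python
import array
import math
import sys
import wave

SAMPLE_RATE = 24000  # Hz, matching wavfile.write('audio.wav', 24000, ...) in the README


def write_wav_24k(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (hypothetical helper)."""
    # Clip each sample to [-1.0, 1.0], then scale to a signed 16-bit integer.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores PCM samples little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


if __name__ == "__main__":
    # Placeholder signal: one second of a 440 Hz tone instead of model output.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    write_wav_24k("audio.wav", tone)
```

Converting to 16-bit PCM trades a small amount of precision for a format that almost every audio player accepts; if exact float output matters, the SciPy call from the README remains the more direct route.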