Initial version

Files changed (4)

.gitattributes +1 -0
README.md +81 -3
model.onnx +3 -0
voices.json +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+voices.json filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,81 @@
----
-license: apache-2.0
----

+---
+tags:
+- audio
+- text-to-speech
+- onnx
+base_model:
+  - hexgrad/Kokoro-82M
+inference: false
+language: en
+license: apache-2.0
+library_name: txtai
+---
+# Kokoro int8 Model for ONNX
+[Kokoro 82M](https://huggingface.co/hexgrad/Kokoro-82M) exported to ONNX as int8. The model is from [this GitHub repo](https://github.com/taylorchu/kokoro-onnx/releases/) and the voices file is from [this repository](https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files).
+## Usage with txtai
+[txtai](https://github.com/neuml/txtai) has a built-in Text to Speech (TTS) pipeline that makes using this model easy.
+_Note: This requires txtai >= 8.3.0. Until that version is released, install txtai from GitHub._
+```python
+import soundfile as sf
+from txtai.pipeline import TextToSpeech
+# Build pipeline
+tts = TextToSpeech("NeuML/kokoro-int8-onnx")
+# Generate speech
+speech, rate = tts("Say something here")
+# Write to file
+sf.write("out.wav", speech, rate)
+```
+## Usage with ONNX
+This model can also be run directly with ONNX, provided the input text is tokenized first. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer), a permissively licensed library with no external dependencies (such as espeak).
+Note that the txtai pipeline provides additional functionality, such as batching large inputs, that would need to be reimplemented when using ONNX directly.
+```python
+import json
+import numpy as np
+import onnxruntime
+import soundfile as sf
+from ttstokenizer import IPATokenizer
+# This example assumes the files have been downloaded locally
+with open("kokoro-int8-onnx/voices.json", "r", encoding="utf-8") as f:
+    voices = json.load(f)
+# Create model
+model = onnxruntime.InferenceSession(
+    "kokoro-int8-onnx/model.onnx",
+    providers=["CPUExecutionProvider"]
+)
+# Create tokenizer
+tokenizer = IPATokenizer()
+# Tokenize inputs
+inputs = tokenizer("Say something here")
+# Get speaker array
+speaker = np.array(voices["af"], dtype=np.float32)
+# Generate speech
+outputs = model.run(None, {
+    "tokens": [[0, *inputs, 0]],
+    "style": speaker[len(inputs)],
+    "speed": np.ones(1, dtype=np.float32) * 1.0
+})
+# Write to file
+sf.write("out.wav", outputs[0], 24000)
+```
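The direct ONNX example above sends the whole input in a single call, so long text has to be split manually. The sketch below is only an illustration under stated assumptions, not txtai's implementation: it splits text into sentences with a simple regular expression, hard-splits any sentence longer than a hypothetical `MAX_TOKENS = 510` limit (assumed from a 512-token context minus the two `0` padding tokens), runs each chunk with the same inputs as the example above, and concatenates the audio. `split_sentences`, `token_chunks` and `synthesize` are hypothetical helper names.

```python
import re

import numpy as np

# Assumption: at most 510 tokens per call (a 512-token context minus the two
# 0 padding tokens added around each chunk). Adjust if the model differs.
MAX_TOKENS = 510


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (simple heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def token_chunks(text, tokenizer, max_tokens=MAX_TOKENS):
    """Yield token lists, one or more per sentence, each at most max_tokens long."""
    for sentence in split_sentences(text):
        tokens = [int(token) for token in tokenizer(sentence)]
        # Hard-split sentences that exceed the limit; this may cut mid-word
        for start in range(0, len(tokens), max_tokens):
            yield tokens[start:start + max_tokens]


def synthesize(model, tokenizer, speaker, text, speed=1.0):
    """Run each chunk through the ONNX model and concatenate the audio."""
    audio = []
    for tokens in token_chunks(text, tokenizer):
        outputs = model.run(None, {
            "tokens": [[0, *tokens, 0]],
            # Style vector selected by chunk length, as in the example above
            "style": speaker[len(tokens)],
            "speed": np.ones(1, dtype=np.float32) * speed,
        })
        # Flatten in case the output includes a batch dimension
        audio.append(np.asarray(outputs[0], dtype=np.float32).reshape(-1))
    return np.concatenate(audio) if audio else np.zeros(0, dtype=np.float32)
```

With `model`, `tokenizer` and `speaker` created as in the example above, `synthesize(model, tokenizer, speaker, text)` returns one float32 array that can be written with `sf.write("out.wav", audio, 24000)`. Splitting on sentences first keeps most chunk boundaries at natural pauses; the hard split is only a fallback.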

model.onnx ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:03e2815f4be9c8289b3b0919f40f5857acd24cfd121ca258cf042d309ee3a0cf
+size 92360686

voices.json ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dc24670e8333cb30990726c5d99e991afc14645139d1a9d2d1858d4fba08df05
+size 54060439
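Both model.onnx and voices.json are committed as Git LFS pointers: three lines giving the spec version, the SHA-256 `oid` of the real file, and its `size` in bytes. As a minimal sketch using only the Python standard library, a locally downloaded copy can be checked against its pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into (hash algorithm, digest, size in bytes)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return algorithm, digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at path has the size and SHA-256 recorded in the pointer."""
    algorithm, digest, size = parse_lfs_pointer(pointer_text)
    if algorithm != "sha256":
        raise ValueError(f"Unsupported hash algorithm: {algorithm}")
    path = Path(path)
    # Cheap size check first, then hash the file in chunks to limit memory use
    if path.stat().st_size != size:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == digest


# Pointer recorded for model.onnx in this commit
MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:03e2815f4be9c8289b3b0919f40f5857acd24cfd121ca258cf042d309ee3a0cf
size 92360686
"""
```

Assuming the files were downloaded to `kokoro-int8-onnx/` as in the README example, `matches_pointer("kokoro-int8-onnx/model.onnx", MODEL_POINTER)` should return True for an intact download.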