nltpt

Enterprise
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

nltpt's activity

Xenova 
posted an update 3 days ago
Xenova 
posted an update 16 days ago
view post
Post
2694
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
·
Narsil 
posted an update 23 days ago
view post
Post
1025
Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config !



3x more tokens.

By reducing our memory footprint, we’re able to ingest many more tokens and more dynamically than before. A single L4 (24GB) can handle 30k tokens on llama 3.1-8B, while vLLM gets barely 10k. A lot of work went into reducing the footprint of the runtime and its effect are best seen on smaller constrained environments.
13x faster

On long prompts (200k+ tokens) conversation replies take 27.5s in vLLM, while it takes only 2s in TGI. How so ? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Dani ël de Kok for the beast data structure.
Zero config

That’s it. Remove all the flags your are using and you’re likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give best performance. In production, we don’t have any flags anymore in our deployments. We kept all existing flags around, they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
Xenova 
posted an update 26 days ago
view post
Post
2961
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)

Create USE_POLICY.md

#8 opened 29 days ago by
reach-vb

Create LICENSE

#7 opened 29 days ago by
reach-vb
ariG23498 
posted an update 29 days ago

Add transformers use instructions!

#4 opened about 1 month ago by
reach-vb

Update README.md

#3 opened about 1 month ago by
reach-vb