Core ML Projects


AI & ML interests

Take the Hub to iOS and macOS

Recent Activity

coreml-projects's activity

Xenova
posted an update 3 days ago
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!
👉 npm i kokoro-js 👈

Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX

You can get started in just a few lines of code!
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text,
  { voice: "af_sky" }, // See `tts.list_voices()`
);
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! 🤗

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🤯
Xenova
posted an update 18 days ago
First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🤯

Try it out yourself! 👇
webml-community/attention-visualization

Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization
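
For the curious, here is a minimal sketch of loading a ViT in the browser with Transformers.js; the checkpoint id is my own illustrative choice, and how the attention maps are actually extracted is defined by the demo source linked above, not by this snippet.

import { AutoProcessor, AutoModel, RawImage } from "@huggingface/transformers";

// Illustrative checkpoint; the demo pins its own model and export settings.
const model_id = "Xenova/vit-base-patch16-224";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id);

const image = await RawImage.fromURL("https://example.com/cat.jpg");
const inputs = await processor(image);
const outputs = await model(inputs);

// The explorer visualizes per-head self-attention maps; that requires an ONNX
// export that returns attention tensors, which is what the linked repo sets up.
console.log(Object.keys(outputs));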
Xenova
posted an update about 1 month ago
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
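
If you'd like to try Moonshine outside the demo, here is a minimal sketch using the Transformers.js ASR pipeline; the ONNX model id is an assumption on my part, so check the demo source above for the checkpoint it actually ships.

import { pipeline } from "@huggingface/transformers";

// Assumed model id; the demo repo above pins the exact ONNX checkpoint it uses.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/moonshine-base-ONNX",
  { device: "webgpu" }, // omit to fall back to WASM
);

const output = await transcriber("https://example.com/audio.wav");
console.log(output.text);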
Xenova
posted an update about 1 month ago
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)
reach-vb
posted an update about 1 month ago
VLMs are going through quite an open revolution, AND in on-device-friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLabs w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co/jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! 🔥
pagezyhf
posted an update about 2 months ago
It's the 2nd of December, here's your Cyber Monday present 🎁!

We're cutting our prices on Hugging Face Inference Endpoints and Spaces!

Our folks at Google Cloud are treating us to a 40% price cut on GCP NVIDIA A100 GPUs for the next 3️⃣ months. We have other reductions on all instances, ranging from 20% to 50%.

Sounds like the time to give Inference Endpoints a try? Get started today and find the full pricing details in our documentation.
https://ui.endpoints.huggingface.co/
https://huggingface.co/pricing
victor
posted an update about 2 months ago
Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention
Xenova
posted an update about 2 months ago
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from DeepSeek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU
pagezyhf
posted an update about 2 months ago
Hello Hugging Face Community,

If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs in less than 10 minutes! Thank you @wietse-venema-demo!

To watch in this order:
1. Learn what Hugging Face Deep Learning Containers are
https://youtu.be/aWMp_hUUa0c?si=t-LPRkRNfD3DDNfr

2. Learn how to deploy an LLM with our Deep Learning Container using Text Generation Inference
https://youtu.be/Q3oyTOU1TMc?si=V6Dv-U1jt1SR97fj

3. Learn how to scale your inference endpoint based on traffic
https://youtu.be/QjLZ5eteDds?si=nDIAirh1r6h2dQMD

If you want more of these small tutorials and have any theme in mind, let me know!
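
As a small companion to video 2: once a Text Generation Inference container is running (on GKE or anywhere else), it exposes an HTTP API you can call from any client. Here is a minimal sketch in JavaScript, assuming a TGI server reachable at the placeholder URL below:

// Query a running Text Generation Inference (TGI) server.
// The URL is a placeholder for wherever your GKE service is exposed.
const TGI_URL = "http://<your-tgi-service>/generate";

const response = await fetch(TGI_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    inputs: "What is a Deep Learning Container?",
    parameters: { max_new_tokens: 128 },
  }),
});

const result = await response.json();
console.log(result.generated_text);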
victor
posted an update about 2 months ago
Want a perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command. This can get quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4; back then (~1.5 years ago) it was almost impossible to make it work with open models, but not anymore. Let's go open weights 🚀.
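
To give a rough idea of the core trick, here is a sketch of asking Qwen2.5-Coder for an FFmpeg command with the huggingface.js inference client; the prompt and parameters are my own illustration, not the Space's actual code (that's linked above).

import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN);

// Illustrative prompt only; the real Space builds a much richer prompt from the
// uploaded assets and validates the command before running FFmpeg.
const chat = await hf.chatCompletion({
  model: "Qwen/Qwen2.5-Coder-32B-Instruct",
  messages: [
    {
      role: "user",
      content:
        "Given input.mp4 and music.mp3, write a single ffmpeg command that " +
        "adds the music as the audio track and trims the result to 30 seconds. " +
        "Reply with the command only.",
    },
  ],
  max_tokens: 256,
});

console.log(chat.choices[0].message.content);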
reach-vb
posted an update about 2 months ago
Massive week for open AI/ML:

Mistral Pixtral & Instruct Large - ~123B, 128K context, multilingual, json + function calling & open weights
mistralai/Pixtral-Large-Instruct-2411
mistralai/Mistral-Large-Instruct-2411

Allen AI Tülu 70B & 8B - competitive with Claude 3.5 Haiku, beats all major open models like Llama 3.1 70B, Qwen 2.5, and Nemotron
allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
allenai/tulu-3-datasets-673b8df14442393f7213f372

LLaVA-o1 - VLM capable of spontaneous, systematic reasoning, similar to GPT-o1; the 11B model outperforms Gemini 1.5 Pro, GPT-4o mini, and Llama-3.2-90B-Vision
Xkev/Llama-3.2V-11B-cot

Black Forest Labs FLUX.1 Tools - four new state-of-the-art model checkpoints & 2 adapters for fill, depth, canny & redux, open weights
reach-vb/black-forest-labs-flux1-6743847bde9997dd26609817

Jina AI Jina CLIP v2 - general-purpose multilingual and multimodal (text & image) embedding model, 900M params, 512 x 512 resolution, Matryoshka representations (1024 to 64)
jinaai/jina-clip-v2

Apple AIM v2 & Core ML MobileCLIP - large-scale vision encoders that outperform CLIP and SigLIP, plus Core ML-optimised MobileCLIP models
apple/aimv2-6720fe1558d94c7805f7688c
apple/coreml-mobileclip

A lot more got released, like OpenScholar ( OpenScholar/openscholar-v1-67376a89f6a80f448da411a6), SmolTalk ( HuggingFaceTB/smoltalk), Hymba ( nvidia/hymba-673c35516c12c4b98b5e845f), the Open ASR Leaderboard ( hf-audio/open_asr_leaderboard), and much more.

Can't wait for the next week! 🤗
victor
posted an update about 2 months ago
Qwen2.5-72B is now the default HuggingChat model.
This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!
Xenova
posted an update 2 months ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🏛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
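
Here is a minimal sketch of the two headline features (WebGPU and the new dtypes) with the pipeline API; the model id is just an illustrative choice, any Transformers.js-compatible embedding model should work:

import { pipeline } from "@huggingface/transformers";

// WebGPU device + a quantized dtype, both new in v3.
// The model id is an illustrative choice, not the only option.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  device: "webgpu",
  dtype: "fp16", // also: "fp32", "q8", "q4", ...
});

const embeddings = await extractor("Hello WebGPU!", { pooling: "mean", normalize: true });
console.log(embeddings.dims);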
reach-vb
posted an update 2 months ago
What a brilliant week for Open Source AI!

Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B / 32B (Base + Instruct) code generation LLMs, with 32B tackling giants like Gemini 1.5 Pro and Claude Sonnet
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f

LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17%
microsoft/llm2clip-672323a266173cfa40b32d4c

Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B that excels at chat + function calling / JSON / agents
Nexusflow/athene-v2-6735b85e505981a794fb02cc

Orca Agent Instruct by Microsoft - 1 million instruct pairs covering text editing, creative writing, coding, reading comprehension, etc - permissively licensed
microsoft/orca-agentinstruct-1M-v1

Ultravox by FixieAI - 70B / 8B models approaching GPT-4o level; pick any LLM and train an adapter with Whisper as the audio encoder
reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71

JanusFlow 1.3B by DeepSeek - next iteration of their unified multimodal LLM Janus, with rectified flow
deepseek-ai/JanusFlow-1.3B

Common Corpus by PleIAs - 2,003,039,184,047 multilingual, commercially permissive, and high-quality tokens!
PleIAs/common_corpus

I'm sure I missed a lot, can't wait for the next week!

Put down in the comments what I missed! 🤗
pagezyhf
posted an update 2 months ago
Hello Hugging Face Community,

I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud to transform the way you build AI with open models on this platform!

With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:

⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!

Find the documentation at https://huggingface.co/docs/google-cloud/en/index
If you need support, open a conversation on the forum: /static-proxy?url=https%3A%2F%2Fdiscuss.huggingface.co%2Fc%2Fgoogle-cloud%2F69
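
As a quick taste of what using one of these containers looks like from the client side: once the Text Embeddings Inference (TEI) DLC is serving a model, you can call it over HTTP. A minimal sketch, assuming a TEI server at the placeholder URL below:

// Call a running Text Embeddings Inference (TEI) server.
// The URL is a placeholder for your own GKE or Vertex AI endpoint.
const TEI_URL = "http://<your-tei-service>/embed";

const response = await fetch(TEI_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ inputs: "Deep Learning Containers make setup easy." }),
});

const [embedding] = await response.json(); // array of embedding vectors
console.log(embedding.length);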
reach-vb
posted an update 3 months ago
Smol TTS models are here! OuteTTS-0.1-350M - zero-shot voice cloning, built on the LLaMa architecture, CC-BY license! 🔥

> Pure language modeling approach to TTS
> Zero-shot voice cloning
> LLaMa architecture w/ Audio tokens (WavTokenizer)
> BONUS: Works on-device w/ llama.cpp ⚡

Three-step approach to TTS:

> Audio tokenization using WavTokenizer (75 tok per second)
> CTC forced alignment for word-to-audio token mapping
> Structured prompt creation w/ transcription, duration, audio tokens

The model is extremely impressive for 350M parameters! Kudos to the OuteAI team on such a brilliant feat - I'd love to see this applied to larger data and smarter backbones like SmolLM 🤗

Check out the models here: OuteAI/outetts-6728aa71a53a076e4ba4817c
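
To make the three-step recipe above a little more concrete, here is a purely hypothetical sketch of what assembling such a structured prompt could look like; the field names and tag format are invented for illustration and are not OuteTTS's actual prompt format (see the model cards above for that).

// Hypothetical illustration of the prompt-assembly step described above.
// Field names and layout are invented for clarity, not the real OuteTTS format.
function buildTtsPrompt(words) {
  // Each word carries its transcription, its duration from CTC forced alignment,
  // and the WavTokenizer codes aligned to it (~75 audio tokens per second).
  return words
    .map(({ text, duration, audioTokens }) =>
      `<word text="${text}" duration="${duration.toFixed(2)}">` +
      audioTokens.map((t) => `<a${t}>`).join("") +
      `</word>`)
    .join("\n");
}

const prompt = buildTtsPrompt([
  { text: "hello", duration: 0.38, audioTokens: [101, 57, 902] },
  { text: "world", duration: 0.45, audioTokens: [311, 64, 1205] },
]);
console.log(prompt);
// A LLaMa-style language model is then trained to continue prompts like this with
// audio tokens, which WavTokenizer decodes back into a waveform.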