HF-Party (Hugging Face Party @ PyTorch Conference)

posted an update 2 days ago

𝗣𝗹𝗮𝗻𝗻𝗶𝗻𝗴 𝗬𝗼𝘂𝗿 𝗡𝗲𝘅𝘁 𝗦𝗸𝗶 𝗔𝗱𝘃𝗲𝗻𝘁𝘂𝗿𝗲 𝗝𝘂𝘀𝘁 𝗚𝗼𝘁 𝗦𝗺𝗮𝗿𝘁𝗲𝗿: 𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗔𝗹𝗽𝗶𝗻𝗲 𝗔𝗴𝗲𝗻𝘁!🏔️⛷️

With the big hype around AI agents these days, I couldn’t stop thinking about how AI agents could truly enhance real-world activities.
What sort of applications could we build with those AI agents: agentic RAG? self-correcting text-to-sql? Nah, boring…

Passionate about the outdoors, I’ve always dreamed of a tool that could simplify planning mountain trips while accounting for all potential risks. That’s why I built 𝗔𝗹𝗽𝗶𝗻𝗲 𝗔𝗴𝗲𝗻𝘁, a smart assistant designed to help you plan safe and enjoyable itineraries in the French Alps and Pyrenees.

Built using Hugging Face's 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 library, Alpine Agent combines the power of AI with trusted resources like 𝘚𝘬𝘪𝘵𝘰𝘶𝘳.𝘧𝘳 (https://skitour.fr/) and METEO FRANCE. Whether it’s suggesting a route with moderate difficulty or analyzing avalanche risks and weather conditions, this agent dynamically integrates data to deliver personalized recommendations.

In my latest blog post, I share how I developed this project—from defining tools and integrating APIs to selecting the best LLMs like 𝘘𝘸𝘦𝘯2.5-𝘊𝘰𝘥𝘦𝘳-32𝘉-𝘐𝘯𝘴𝘵𝘳𝘶𝘤𝘵, 𝘓𝘭𝘢𝘮𝘢-3.3-70𝘉-𝘐𝘯𝘴𝘵𝘳𝘶𝘤𝘵, or 𝘎𝘗𝘛-4.

⛷️ Curious how AI can enhance adventure planning? Try the app and share your thoughts: florentgbelidji/alpine-agent

👉 Want to build your own agents? Whether for cooking, sports training, or other passions, the possibilities are endless. Check out the blog post to learn more: https://huggingface.co/blog/florentgbelidji/alpine-agent

Many thanks to @m-ric for helping build this tool with smolagents!

m-ric

posted an update 3 days ago

𝗠𝗶𝗻𝗶𝗠𝗮𝘅'𝘀 𝗻𝗲𝘄 𝗠𝗼𝗘 𝗟𝗟𝗠 𝗿𝗲𝗮𝗰𝗵𝗲𝘀 𝗖𝗹𝗮𝘂𝗱𝗲-𝗦𝗼𝗻𝗻𝗲𝘁 𝗹𝗲𝘃𝗲𝗹 𝘄𝗶𝘁𝗵 𝟰𝗠 𝘁𝗼𝗸𝗲𝗻𝘀 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵 💥

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:

🏗️ MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers
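
To make the two bullets above concrete, here is a small sketch of what such a layer layout could look like. It assumes the softmax layer is the last one in each group of 8 (the post only says "every 8 layers"), and the 16-layer depth is purely illustrative, not the model's real depth. The parameter counts are the ones quoted above.

```python
def attention_schedule(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Return one attention type per layer.

    Assumption: the softmax layer closes each group of `softmax_every` layers;
    every other layer uses linear-complexity lightning attention.
    """
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


# Illustrative depth only; chosen so that two full groups of 8 are visible.
schedule = attention_schedule(num_layers=16)
softmax_layers = [i for i, kind in enumerate(schedule) if kind == "softmax"]

# Parameter counts quoted in the post, in billions.
TOTAL_PARAMS_B = 456.0
ACTIVE_PARAMS_B = 45.9
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print("softmax layer indices:", softmax_layers)
print(f"share of parameters active per token: {active_fraction:.1%}")
```

So only about a tenth of the weights are used for any given token, and only one layer in eight pays the quadratic attention cost.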

🏆 Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high; utilization is generally around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights, including a great comparison showing how a MoE with 2B activated parameters (24B total) far outperforms a 7B dense model for the same amount of FLOPs.

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here (the license allows commercial use below 100M monthly users) 👉 MiniMaxAI/MiniMax-Text-01

m-ric

posted an update 4 days ago

𝗪𝗲'𝘃𝗲 𝗷𝘂𝘀𝘁 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝗱 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 𝘃𝟭.𝟯.𝟬 🚀, and it comes with a major feature: you can now log agent runs using OpenTelemetry to inspect them afterwards! 📊

IMO, this interactive format makes big multi-step runs much easier to inspect than endless console logs.

Setup is easy and takes only a few lines of code.

Find a tutorial here 👉 https://huggingface.co/docs/smolagents/tutorials/inspect_runs
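
For a rough idea of what those few lines can look like, here is a sketch based on my recollection of the tutorial. It assumes the `opentelemetry-sdk`, `opentelemetry-exporter-otlp` and `openinference-instrumentation-smolagents` packages are installed, and that an OTLP collector (for example a local Arize Phoenix server) is listening on the endpoint below. The endpoint and the `enable_smolagents_tracing` wrapper are assumptions for illustration; check the tutorial for the current instructions.

```python
def enable_smolagents_tracing(
    endpoint: str = "http://0.0.0.0:6006/v1/traces",  # assumed local collector
) -> bool:
    """Route smolagents spans to an OTLP collector.

    Returns False instead of raising when the optional packages are missing.
    """
    try:
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import SimpleSpanProcessor
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
            OTLPSpanExporter,
        )
        from openinference.instrumentation.smolagents import (
            SmolagentsInstrumentor,
        )
    except ImportError:
        return False

    # Export each span as soon as it ends, over OTLP/HTTP.
    provider = TracerProvider()
    provider.add_span_processor(
        SimpleSpanProcessor(OTLPSpanExporter(endpoint=endpoint))
    )
    # Patch smolagents so that agent runs emit spans through this provider.
    SmolagentsInstrumentor().instrument(tracer_provider=provider)
    return True


tracing_enabled = enable_smolagents_tracing()
print("tracing enabled:", tracing_enabled)
```

Call it once before creating your agents; runs started afterwards should then show up in the collector's UI.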

m-ric

posted an update 7 days ago

𝗢𝗦-𝗚𝗲𝗻𝗲𝘀𝗶𝘀: 𝗻𝗲𝘄 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗮𝗽𝗲𝗿 𝗽𝗿𝗼𝗽𝗼𝘀𝗲𝘀 𝗮 𝗻𝗼𝘃𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗺𝗲𝘁𝗵𝗼𝗱 𝗳𝗼𝗿 𝗖𝗹𝗮𝘂𝗱𝗲-𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿-𝗨𝘀𝗲-𝗹𝗶𝗸𝗲 𝗮𝗴𝗲𝗻𝘁𝘀, 𝘄𝗶𝘁𝗵 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀! 🔥

The main bottleneck in building GUI agents is finding training data.
GUI agent trajectories are not easy to come by. Crowdsourcing trajectories and then manually annotating them could be an option, but it's hard to do at scale.

You could use synthetic data generation (ask thousands of small existing GUI agents to solve tasks, and keep only the successful runs). But then it's hard to come up with many high-level tasks.

➡️ Well, a novel technique was just published that creates a new promising paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

🔍 Exploration-driven vs task-driven approach:
‣ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
‣ It then reverse-engineers high-level tasks from successful interaction patterns
‣ This leads to more natural and diverse training data than predefined tasks

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted
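
The flow described above (explore first, name the task afterwards, then grade instead of filter) can be sketched as a toy pipeline. Everything here is hypothetical: the action names, the `derive_task` and `score` stand-ins, and the scoring rule are made up for illustration. In the paper these steps are done by models, not by hand-written rules.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    actions: list[str]  # low-level GUI actions recorded during exploration
    task: str           # high-level task derived after the fact
    reward: float       # quality score in [0, 1]


def derive_task(actions: list[str]) -> str:
    """Hypothetical stand-in for reverse task synthesis (a model call in practice)."""
    return "Task that could explain: " + " -> ".join(actions)


def score(actions: list[str], completed: bool) -> float:
    """Hypothetical stand-in reward: full credit if completed, partial otherwise."""
    if completed:
        return 1.0
    # Partial credit grows with the number of steps taken, capped below 1.
    return min(0.8, 0.2 * len(actions))


# Made-up exploration logs: (actions, whether the interaction completed).
explored = [
    (["open_settings", "tap_wifi", "toggle_on"], True),
    (["open_mail", "tap_compose"], False),
]

dataset = [
    Trajectory(actions, derive_task(actions), score(actions, done))
    for actions, done in explored
]

# A binary filter throws the incomplete run away; graded scoring keeps it
# with a lower weight.
kept_binary = [t for t in dataset if t.reward == 1.0]
kept_graded = [t for t in dataset if t.reward > 0.0]

print(len(kept_binary), "trajectory kept by a binary filter")
print(len(kept_graded), "trajectories kept with graded rewards")
```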

🏆 Superior results across environments:
‣ Nearly doubles performance on AndroidWorld (9.8% → 17.4%)

By the way, this field of GUI agents is still in its infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8xA100!

Read the paper here 👉 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis (2412.19723)

danielhanchen

posted an update 9 days ago

We fixed many bugs in Phi-4 & uploaded fixed GGUF + 4-bit versions! ✨

Our fixed versions are even higher on the Open LLM Leaderboard than Microsoft's!

GGUFs: unsloth/phi-4-GGUF
Dynamic 4-bit: unsloth/phi-4-unsloth-bnb-4bit

You can also now finetune Phi-4 for free on Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb

Read our blogpost for more details on bug fixes etc: https://unsloth.ai/blog/phi4

Skylion007

authored a paper 9 days ago

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published 10 days ago • 77

danielhanchen

posted an update 12 days ago

DeepSeek V3 is now uploaded, including GGUF + bf16 versions!

Includes 2, 3, 4, 5, 6 and 8-bit quantized versions.

GGUFs: unsloth/DeepSeek-V3-GGUF
bf16: unsloth/DeepSeek-V3-bf16

Min. hardware requirements to run: 48GB RAM + 250GB of disk space for 2-bit.

See how to run them with examples and the full collection: unsloth/deepseek-v3-all-versions-677cf5cfd7df8b7815fc723c

m-ric

posted an update 12 days ago

Since I published it on GitHub a few days ago,
Hugging Face's new agentic library 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 has gathered nearly 4k stars 🤯

➡️ But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/

Johannes

authored a paper 12 days ago

METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring

Paper • 2501.02045 • Published 16 days ago • 21

jeffboudier

posted an update 12 days ago

NVIDIA just announced the Cosmos World Foundation Models, available on the Hub: nvidia/cosmos-6751e884dc10e013a0a0d8e6

Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development.
The release includes Tokenizers nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6

Learn more in this great community article by @mingyuliutw and @PranjaliJoshi https://huggingface.co/blog/mingyuliutw/nvidia-cosmos

clem

posted an update 16 days ago

Cool to see @ylecun joining the top 10 of most followed on HF!

(and leaderboard by @mvaloatto is here: mvaloatto/TCTF)

1aurent

posted an update 19 days ago

Hey everyone 🤗!
Check out this new Virtual Try Off model (based on SD1.5): 1aurent/TryOffAnyone
This model isn't as accurate as others (e.g. xiaozaa/cat-try-off-flux based on FLUX.1) but it sure is fast!

mbrack

authored a paper 27 days ago

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

Paper • 2412.15035 • Published about 1 month ago • 4

m-ric

posted an update about 1 month ago

After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗

We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformers architecture", but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models - the top models in this category have accumulated millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

𝗧𝗟;𝗗𝗥:
🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

🥇 As a result, the model tops the game of encoder models:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!
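
To see why sequence packing matters, here is a generic sketch that compares the token budget of padding every sequence to the maximum length against packing several sequences into each fixed-size slot. The greedy first-fit-decreasing strategy, the `pack` helper and the example lengths are assumptions for illustration; this is not ModernBERT's actual implementation.

```python
def padded_tokens(lengths: list[int], max_len: int) -> int:
    """Tokens processed when every sequence is padded to `max_len`."""
    return len(lengths) * max_len


def pack(lengths: list[int], capacity: int) -> list[list[int]]:
    """Greedy first-fit-decreasing packing of sequences into `capacity`-token slots."""
    bins: list[list[int]] = []
    for length in sorted(lengths, reverse=True):
        if length > capacity:
            raise ValueError(f"sequence of {length} tokens exceeds {capacity}")
        for current in bins:
            if sum(current) + length <= capacity:
                current.append(length)
                break
        else:
            # No existing slot has room: open a new one.
            bins.append([length])
    return bins


# Made-up sequence lengths, in tokens.
lengths = [512, 60, 200, 30, 450, 120, 90, 300]
capacity = 512

bins = pack(lengths, capacity)
real = sum(lengths)
padded = padded_tokens(lengths, capacity)
packed = len(bins) * capacity

print(f"real tokens:   {real}")
print(f"padded budget: {padded} ({1 - real / padded:.0%} wasted)")
print(f"packed budget: {packed} ({1 - real / packed:.0%} wasted)")
```

In this toy batch, packing halves the number of slots the model has to process for the same real tokens.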

Read the blog post 👉 https://huggingface.co/blog/modernbert

m-ric

posted an update about 1 month ago

𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐏𝐢𝐜𝐨𝐭𝐫𝐨𝐧, 𝐚 𝐦𝐢𝐜𝐫𝐨𝐬𝐜𝐨𝐩𝐢𝐜 𝐥𝐢𝐛 𝐭𝐡𝐚𝐭 𝐬𝐨𝐥𝐯𝐞𝐬 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝟒𝐃 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons"

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
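
The numbers above are easy to check by hand. The sketch below redoes the arithmetic (assuming perfect utilization, which is optimistic) and shows one possible way to map a flat GPU rank onto the four parallelism dimensions. The axis ordering and the `rank_to_coords` helper are assumptions made for illustration, not how any particular library lays out its process grid.

```python
GPU_HOURS = 39_000_000      # figure quoted in the post
HOURS_PER_YEAR = 24 * 365
NUM_GPUS = 24_000           # "24k H100s"

years_on_one_gpu = GPU_HOURS / HOURS_PER_YEAR
days_on_cluster = GPU_HOURS / NUM_GPUS / 24

print(f"one GPU: about {years_on_one_gpu:,.0f} years")
print(f"{NUM_GPUS:,} GPUs: about {days_on_cluster:.0f} days")


def rank_to_coords(rank: int, dp: int, pp: int, cp: int, tp: int) -> dict[str, int]:
    """Map a flat rank to 4D grid coordinates.

    Assumed ordering: tensor-parallel index varies fastest, then context,
    then pipeline, with data-parallel slowest.
    """
    world_size = dp * pp * cp * tp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world of size {world_size}")
    return {
        "tp": rank % tp,
        "cp": (rank // tp) % cp,
        "pp": (rank // (tp * cp)) % pp,
        "dp": rank // (tp * cp * pp),
    }


# Toy 2 x 2 x 2 x 2 grid, i.e. 16 GPUs.
coords = rank_to_coords(13, dp=2, pp=2, cp=2, tp=2)
print(coords)
```

The world size is the product of the four degrees, which is why all four have to be tuned together.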

🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team has released Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code. It's a real engineering feat, and it makes it much easier to understand what's actually happening!

⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Measured in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that training actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this)
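
As a back-of-the-envelope illustration of what that MFU figure means, the sketch below uses the common approximation of 6 FLOPs per parameter per trained token (it ignores attention FLOPs) and an assumed dense BF16 peak of 989 TFLOPS per H100. Neither of these comes from the post, and the resulting throughput is what ~50% MFU would imply under those assumptions, not a measured number.

```python
def mfu(
    params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Model FLOPs Utilization under the ~6 * params FLOPs-per-token approximation."""
    achieved_flops = 6 * params * tokens_per_second
    return achieved_flops / (num_gpus * peak_flops_per_gpu)


PARAMS = 1.7e9            # SmolLM-1.7B
NUM_GPUS = 8
H100_PEAK_BF16 = 989e12   # assumed dense BF16 peak, in FLOPs per second

# Invert the formula: what cluster throughput would ~50% MFU correspond to?
target_mfu = 0.5
implied_tokens_per_second = (
    target_mfu * NUM_GPUS * H100_PEAK_BF16 / (6 * PARAMS)
)

print(f"~{implied_tokens_per_second:,.0f} tokens/s across {NUM_GPUS} GPUs")
```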

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron

clem

posted an update about 1 month ago

Coming back to Paris Friday to open our new Hugging Face office!

We're at capacity for the party, but add your name to the waiting list, as we're trying to book out the passage du Caire for extra space for robots 🤖🦾🦿

https://t.co/enkFXjWndJ

m-ric

posted an update about 1 month ago

𝗣𝗼𝘁𝗲𝗻𝘁𝗶𝗮𝗹 𝗽𝗮𝗿𝗮𝗱𝗶𝗴𝗺 𝘀𝗵𝗶𝗳𝘁 𝗶𝗻 𝗟𝗟𝗠𝘀: 𝗻𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗯𝘆 𝗠𝗲𝘁𝗮 𝗰𝗹𝗮𝗶𝗺𝘀 𝘁𝗵𝗮𝘁 𝘄𝗲 𝗰𝗮𝗻 𝗴𝗲𝘁 𝗿𝗶𝗱 𝗼𝗳 𝘁𝗼𝗸𝗲𝗻𝗶𝘇𝗲𝗿𝘀! 🥳

Current LLMs process text by first splitting it into tokens. They use a module named "tokenizer" that -spl-it-s- th-e- te-xt- in-to- arbitrary tokens depending on a fixed dictionary.
On the Hub you can find this dictionary in a model's files under tokenizer.json.

➡️ This process is called BPE tokenization. It is suboptimal, everyone says it. It breaks text into predefined chunks that often fail to capture the nuance of language. But it has been a necessary evil in language models since their inception.

💥 In Byte Latent Transformer (BLT), Meta researchers propose an elegant solution by eliminating tokenization entirely, working directly with raw bytes while maintaining efficiency through dynamic "patches."

This had been tried before with different byte-level tokenizations, but it's the first time that an architecture of this type scales as well as BPE tokenization. And it could mean a real paradigm shift! 👏👏

🏗️ 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲:
Instead of a lightweight tokenizer, BLT has a lightweight encoder that processes raw bytes into patches. The patches are then processed by the main heavy-duty transformer as usual (but on patches of bytes instead of tokens), before being converted back to bytes.

🧩 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗣𝗮𝘁𝗰𝗵𝗶𝗻𝗴:
Instead of fixed tokens, BLT groups bytes based on their predictability (measured by entropy) - using more compute for complex sequences and efficiently handling simple ones. This allows efficient processing while maintaining byte-level understanding.
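
Here is a toy sketch of the entropy-based patching idea. It is only an illustration: a bigram count table built from the input itself stands in for the learned byte-level model, and the `threshold` value and helper names are hypothetical. A new patch starts whenever the next byte was hard to predict.

```python
import math
from collections import Counter, defaultdict


def next_byte_entropy(data: bytes) -> dict[int, float]:
    """Entropy (in bits) of the next-byte distribution after each byte value.

    Toy stand-in for a learned byte-level model: bigram counts from `data`.
    """
    following: dict[int, Counter] = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        following[prev][nxt] += 1

    entropy: dict[int, float] = {}
    for prev, counts in following.items():
        total = sum(counts.values())
        entropy[prev] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def patch(data: bytes, threshold: float) -> list[bytes]:
    """Split `data` into patches, cutting where the next byte is unpredictable."""
    entropy = next_byte_entropy(data)
    patches: list[bytes] = []
    start = 0
    for i in range(1, len(data)):
        # High entropy after data[i - 1] means data[i] is hard to predict,
        # so a new patch begins at position i.
        if entropy.get(data[i - 1], 0.0) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches


text = "the cat sat on the mat, the cat sat on the hat".encode("utf-8")
patches = patch(text, threshold=1.5)  # hypothetical threshold

for p in patches:
    print(p)
```

Predictable runs end up in longer patches, while uncertain positions get their own boundaries, which is the intuition behind spending more compute on the hard parts.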

I hope this breakthrough is confirmed and we can get rid of all the tokenizer stuff: it would make model handling easier!

Read their paper here 👉 https://dl.fbaipublicfiles.com/blt/BLT__Patches_Scale_Better_Than_Tokens.pdf

m-ric

posted an update about 1 month ago

💥 𝗚𝗼𝗼𝗴𝗹𝗲 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝘀 𝗚𝗲𝗺𝗶𝗻𝗶 𝟮.𝟬, 𝘀𝘁𝗮𝗿𝘁𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗮 𝗙𝗹𝗮𝘀𝗵 𝗺𝗼𝗱𝗲𝗹 𝘁𝗵𝗮𝘁 𝘀𝘁𝗲𝗮𝗺𝗿𝗼𝗹𝗹𝘀 𝗚𝗣𝗧-𝟰𝗼 𝗮𝗻𝗱 𝗖𝗹𝗮𝘂𝗱𝗲-𝟯.𝟲 𝗦𝗼𝗻𝗻𝗲𝘁! And they start a huge effort on agentic capabilities.

🚀 The performance improvements are crazy for such a fast model:
‣ Gemini 2.0 Flash outperforms the previous 1.5 Pro model at twice the speed
‣ Now supports both input AND output of images, video, audio and text
‣ Can natively use tools like Google Search and execute code

➡️ If the price is on par with the previous Flash iteration ($0.30 / M tokens, compared with GPT-4o's $1.25), the competition will have a big problem with this 4x cheaper model that gets better benchmarks 🤯

🤖 What about the agentic capabilities?

‣ Project Astra: A universal AI assistant that can use Google Search, Lens and Maps
‣ Project Mariner: A Chrome extension that can complete complex web tasks (83.5% success rate on WebVoyager benchmark, this is really impressive!)
‣ Jules: An AI coding agent that integrates with GitHub workflows

I'll be eagerly awaiting further news from Google!

Read their blogpost here 👉 https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/

m-ric

posted an update about 1 month ago

𝐒𝐜𝐚𝐥𝐢𝐧𝐠 𝐥𝐚𝐰𝐬 𝐚𝐫𝐞 𝐧𝐨𝐭 𝐝𝐞𝐚𝐝 𝐲𝐞𝐭! New blog post suggests Anthropic might have an extremely strong Opus-3.5 already available, but is not releasing it to keep their edge over the competition. 🧐

❓Since the release of Opus-3.5 has been delayed indefinitely, there have been lots of rumors and articles about LLMs plateauing. According to these rumors, scaling laws, the main driver of LLM capability gains, may have stopped holding, which would explain the stalled progress.

These rumors were quickly denied by many people at the leading LLM labs, including OpenAI and Anthropic. But these people would be expected to hype the future of LLMs even if scaling laws really plateaued, so the jury is still out.

🗞️ This new article by Semianalysis (generally a good source, specifically on hardware) provides a counter-rumor that I find more convincing:

➡️ Maybe scaling laws still work and Opus-3.5 is ready and as good as planned, but they just don't release it because the synthetic data it helps produce can bring the cheaper/smaller models, Sonnet and Haiku, up in performance, without the risk of leaking this precious high-quality synthetic data to competitors.

Time will tell! I feel like we'll know more soon.

Read the article: https://semianalysis.com/2024/12/11/scaling-laws-o1-pro-architecture-reasoning-infrastructure-orion-and-claude-3-5-opus-failures/

julien-c

posted an update about 1 month ago

After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team