
Aymeric Roucher

m-ric

http://aymeric-roucher.github.io

AI & ML interests

Leading Agents at Hugging Face 🤗

Recent Activity

updated a Space 1 day ago

m-ric/get-travel-duration-tool

liked a Space 3 days ago

reach-vb/2024-ai-timeline

updated a dataset 4 days ago

huggingface/documentation-images


Articles

Organizations

m-ric's activity

updated a Space 1 day ago

Running

📈

huggingface/documentation-images

Viewer • Updated 3 days ago • 50 • 2.2M • 45

New activity in hf-doc-build/doc-build 4 days ago

Upload _versions.yml

#32 opened 5 days ago by

m-ric

updated a Space 4 days ago

Running

🏢

A New Approach for Explainable Multiple Organ Annotation with Few Data

Paper • 1912.12932 • Published Dec 30, 2019 • 1

liked a dataset 5 days ago

hf-doc-build/doc-build

Updated about 11 hours ago • 239k • 7

updated a dataset 5 days ago

hf-doc-build/doc-build

Updated about 11 hours ago • 239k • 7

liked a Space 7 days ago

Running

418

🦀

huggingface/documentation-images

Viewer • Updated 3 days ago • 50 • 2.2M • 45

updated a dataset 8 days ago

m-ric/agents_medium_benchmark_2

Viewer • Updated 8 days ago • 142 • 37 • 3

liked a dataset 8 days ago

basicv8vc/SimpleQA

Viewer • Updated Nov 5, 2024 • 4.33k • 145 • 5

liked a model 9 days ago

BAAI/bge-small-en-v1.5

Feature Extraction • Updated Feb 22, 2024 • 4.96M • 274

updated a dataset 11 days ago

m-ric/agents_medium_benchmark

Viewer • Updated 11 days ago • 172 • 94 • 3

liked a Space 15 days ago

Running

158

🏃

After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗

We talk a lot about ✨Generative AI✨, meaning the decoder version of the Transformer architecture, but this is only one way to build LLMs: encoder models, which turn a sentence into a vector, are perhaps even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models: the top models in this category have accumulated millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn has just introduced BERT's replacement: ModernBERT.

𝗧𝗟;𝗗𝗥:
🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- GeGLU activations instead of GeLU
- Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
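
To make the padding overhead concrete, here is a minimal sketch of sequence packing in plain Python. It is not ModernBERT's implementation: the helper names `pad_batch` and `pack_batch` and the toy token ids are made up for illustration. The idea is that a padded batch spends slots on filler tokens, while a packed batch concatenates the sequences and keeps only their boundaries (often called cumulative sequence lengths) so that attention can stay within each sequence.

```python
from itertools import accumulate


def pad_batch(seqs, pad_id=0):
    """Pad every sequence to the length of the longest one (the classic approach)."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


def pack_batch(seqs):
    """Concatenate sequences into one flat list and record their boundaries."""
    flat = [tok for s in seqs for tok in s]
    # cu_seqlens[i] is the start offset of sequence i; the last entry is the total length.
    cu_seqlens = [0] + list(accumulate(len(s) for s in seqs))
    return flat, cu_seqlens


# Toy token ids (hypothetical), with lengths 4, 3 and 7.
seqs = [[101, 7, 8, 102], [101, 9, 102], [101, 5, 6, 7, 8, 9, 102]]

padded = pad_batch(seqs)
flat, cu_seqlens = pack_batch(seqs)

padded_slots = sum(len(row) for row in padded)  # slots processed with padding
real_tokens = len(flat)                         # slots processed when packed
wasted_fraction = (padded_slots - real_tokens) / padded_slots

print(f"padded slots: {padded_slots}, real tokens: {real_tokens}")
print(f"wasted on padding: {wasted_fraction:.0%}")
print(f"sequence boundaries: {cu_seqlens}")
```

In this toy batch a third of the padded slots are filler. The boundary list is what a variable-length attention kernel would need to keep packed sequences from attending to each other.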

🥇 As a result, the model tops the game of encoder models:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert


updated a Space 16 days ago

Running

🗺️🏕️

AI Travel Planner

Plan your next vacation with the help of an AI!

posted an update 16 days ago


𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐏𝐢𝐜𝐨𝐭𝐫𝐨𝐧, 𝐚 𝐦𝐢𝐜𝐫𝐨𝐬𝐜𝐨𝐩𝐢𝐜 𝐥𝐢𝐛 𝐭𝐡𝐚𝐭 𝐬𝐨𝐥𝐯𝐞𝐬 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝟒𝐃 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons"

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
This is notoriously hard to do, and it tends to produce bloated code repos that hold together only by magic.
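
The numbers above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes a 365-day year, perfect scaling across GPUs, and that all 24k GPUs are busy for the whole run, none of which holds exactly in practice.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

gpu_hours = 39e6   # total training cost quoted in the post
n_gpus = 24_000    # "24k H100s"

# Time needed if a single GPU did all the work.
single_gpu_years = gpu_hours / HOURS_PER_YEAR

# Idealized wall-clock time when the work is spread evenly over all GPUs.
wall_clock_days = gpu_hours / n_gpus / 24

print(f"single GPU: about {single_gpu_years:,.0f} years")
print(f"{n_gpus:,} GPUs: about {wall_clock_days:.0f} days")
```

The idealized figure comes out at a bit over two months, consistent with "just a few months" once real-world inefficiencies are added.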

🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. One team built Nanotron, which is already widely used in industry.
And now another team releases Picotron, a radical approach that codes 4D parallelism in just a few hundred lines, a real engineering feat that makes it much easier to understand what's actually happening!

⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Measured in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that the training run actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this)
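
As a rough illustration of what an MFU figure means, here is a minimal sketch. It is not Picotron's benchmarking code. It uses the common approximation of about 6 FLOPs per parameter per training token (which ignores attention FLOPs), and it assumes a peak of roughly 989 TFLOP/s per H100 for dense BF16. The function names are hypothetical.

```python
def mfu(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Model FLOPs Utilization: achieved training FLOP/s over peak hardware FLOP/s."""
    achieved_flops = 6 * n_params * tokens_per_second  # ~6 FLOPs per param per token
    return achieved_flops / (n_gpus * peak_flops_per_gpu)


def tokens_per_second_for(target_mfu, n_params, n_gpus, peak_flops_per_gpu):
    """Invert the formula: throughput needed to hit a given MFU."""
    return target_mfu * n_gpus * peak_flops_per_gpu / (6 * n_params)


# Assumed values, not measurements: 1.7B parameters, 8 GPUs, ~989 TFLOP/s each.
N_PARAMS = 1.7e9
N_GPUS = 8
H100_PEAK_BF16 = 989e12

needed = tokens_per_second_for(0.5, N_PARAMS, N_GPUS, H100_PEAK_BF16)
print(f"throughput needed for 50% MFU: about {needed:,.0f} tokens/s")
print(f"check: MFU = {mfu(N_PARAMS, needed, N_GPUS, H100_PEAK_BF16):.2f}")
```

Under these assumptions, 50% MFU corresponds to a throughput of a few hundred thousand tokens per second across the 8 GPUs.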

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron


updated a Space 19 days ago

Build error

📚

Test

upvoted an article 19 days ago

Article

🇪🇺✍️ EU AI Act: Systemic Risks in the First CoP Draft Comments ✍️🇪🇺

•

22 days ago

• 12


Articles

Introducing smolagents: simple agents that write actions in code.

Expert Support case study: Bolstering a RAG app with LLM-as-a-Judge

Our Transformers Code Agent beats the GAIA benchmark!

Extracting Concepts from LLMs: Anthropic’s recent discoveries 📖

License to Call: Introducing Transformers Agents 2.0

Open-source LLMs as LangChain Agents
