Stefano Fiorucci PRO
anakin87's activity
Details:
Based on ModernBERT-base with 149M parameters.
Outperforms both nomic-embed-text-v1 and nomic-embed-text-v1.5 on MTEB!
Immediate FA2 and unpacking support for super efficient inference.
Trained with Matryoshka support, i.e. 2 valid output dimensionalities: 768 and 256.
Maximum sequence length of 8192 tokens!
Trained in 2 stages: unsupervised contrastive data -> high quality labeled datasets.
Integrated in Sentence Transformers, Transformers, LangChain, LlamaIndex, Haystack, etc.
Apache 2.0 licensed: fully commercially permissible.
Try it out here: nomic-ai/modernbert-embed-base
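Matryoshka support means the first 256 values of a 768-dimensional embedding can be used on their own. Here is a minimal sketch of that truncation step in plain Python. The input vector is synthetic rather than produced by the model, `truncate_embedding` is a hypothetical helper name, and re-normalizing after truncation is an assumption made so that cosine similarity reduces to a dot product.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

# Synthetic stand-in for a 768-dimensional model output.
full = [math.sin(i + 1) for i in range(768)]
small = truncate_embedding(full, 256)
```

In practice the library can do the slicing for you: Sentence Transformers exposes a `truncate_dim` argument when loading a model.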
Very nice work by Zach Nussbaum and colleagues at Nomic AI.
HuggingFaceTB/finemath
Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.
We build the dataset by:
carefully extracting math data from Common Crawl;
iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observed notable gains compared to the baseline model and other public math datasets.
We hope this helps advance the performance of LLMs on math and reasoning!
We're also releasing all the ablation models as well as the evaluation code.
HuggingFaceTB/finemath-6763fb8f71b6439b653482c2
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.
Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM
Here are the links:
- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute
- Code: https://github.com/huggingface/search-and-learn
Enjoy!
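To make the "reward model plus tree search" idea concrete, here is a toy sketch in the spirit of DVTS: several subtrees are searched independently, and inside each one the verifier picks the best of a few candidate steps at every depth. It is not the code from the blog post or the search-and-learn repository. `propose` and `score` are hypothetical stand-ins for an LLM sampling a reasoning step and a step-wise reward model.

```python
import random

def dvts(propose, score, n_subtrees, width, depth, seed=0):
    """Toy diverse verifier tree search.

    Each subtree is searched independently: at every depth it samples
    `width` candidate steps, keeps the one the verifier scores highest,
    and continues from there. The best final path across subtrees wins.
    """
    rng = random.Random(seed)
    finished = []
    for _ in range(n_subtrees):
        path = []
        for _ in range(depth):
            candidates = [path + [propose(path, rng)] for _ in range(width)]
            path = max(candidates, key=score)
        finished.append(path)
    return max(finished, key=score)

# Hypothetical stand-ins: a "step" is a digit, and the verifier prefers
# partial sums that stay close to a running target of 7 per step.
def propose(path, rng):
    return rng.randint(0, 9)

def score(path):
    return -abs(sum(path) - 7 * len(path))

best = dvts(propose, score, n_subtrees=4, width=4, depth=3)
```

Keeping the subtrees separate is what gives the diversity: one strong early step cannot crowd out every other line of reasoning.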
I've created this fun and interactive project to help you recognize dog breeds, find the perfect pup for your lifestyle, and even compare different breeds! Recently upgraded with smarter AI detection - it can now better distinguish between dogs and non-dogs (no more confusing cats for huskies!).
What's cool about it?
Smart breed recognition powered by AI
Lifestyle-based breed recommendations
Detailed breed comparisons
And now with enhanced non-dog filtering!
Why try it?
Whether you're a dog lover, considering a new furry friend, or just curious, PawMatchAI makes discovering breeds fun and informative! As someone passionate about both AI and pets, I'm combining my two loves while working toward my goal of contributing to the AI industry.
Got feedback?
While it's not perfect, your input helps make it better! I'd love to hear your thoughts as I continue improving this project on my journey into AI development.
Try it now: DawnC/PawMatchAI
Your support matters!
Every like or comment helps fuel my passion for AI development and keeps me motivated to create more helpful tools. Let's make the AI journey fun and impactful together!
#AI #MachineLearning #DeepLearning #Pytorch #ComputerVision
3x more tokens.
By reducing our memory footprint, we're able to ingest many more tokens, and more dynamically than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM gets barely 10k. A lot of work went into reducing the footprint of the runtime, and its effects are best seen in smaller, constrained environments.
13x faster
On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Daniël de Kok for the beast data structure.
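The general idea of keeping the earlier conversation around can be sketched with a small trie over token ids: look up the longest prefix that was already processed, then compute only the remaining tokens. This is a toy illustration, not TGI's data structure. `PrefixCache` is a hypothetical name and the token ids are made up.

```python
class PrefixCache:
    """Toy trie keyed on token ids; each node marks a cached prefix."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def longest_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node = self.root
        matched = 0
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            matched += 1
        return matched

cache = PrefixCache()
conversation = [1, 15, 27, 42, 8]        # tokens already processed
cache.insert(conversation)

follow_up = conversation + [99, 100]     # same conversation plus a new turn
reused = cache.longest_prefix(follow_up)
to_compute = len(follow_up) - reused     # only the new turn needs work
```

The lookup cost grows with the prompt length, not with the number of cached conversations, which is why it can stay tiny compared to recomputing the prompt.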
Zero config
That's it. Remove all the flags you are using and you're likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give the best performance. In production, we don't have any flags anymore in our deployments. We kept all existing flags around; they may come in handy in niche scenarios.
Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
@Mollel created another dataset using Glot for language detection instead of fastText.
https://huggingface.co/datasets/sartifyllc/tulu-3-sft-mixture-language-glot
Good work!
Unfortunately, it was missing the "language" column.
I added it using the good old fastText.
Check out the dataset here: anakin87/tulu-3-sft-mixture-with-language
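Adding a column like this boils down to mapping a detector over every row. Below is a rough sketch with a pluggable detector. `add_language_column`, the `text` key, and `toy_detect` are all hypothetical, and the toy detector only stands in for a real language-identification model such as fastText.

```python
def add_language_column(rows, detect, text_key="text"):
    """Return new rows with a "language" field filled in by `detect`."""
    return [{**row, "language": detect(row[text_key])} for row in rows]

# Hypothetical stand-in detector. With fastText, `detect` would wrap a
# language-identification model's prediction instead.
def toy_detect(text):
    italian_markers = {"ciao", "grazie", "perché"}
    words = set(text.lower().split())
    return "it" if words & italian_markers else "en"

rows = [{"text": "ciao come stai"}, {"text": "hello there"}]
labeled = add_language_column(rows, toy_detect)
```

The original rows are left untouched, so the unlabeled and labeled versions can be compared side by side.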
Global-MMLU is the result of months of work with the goal of advancing Multilingual LLM evaluation. It's been an amazing open science effort with collaborators from Cohere For AI, Mila - Quebec Artificial Intelligence Institute, EPFL, Massachusetts Institute of Technology, AI Singapore, National University of Singapore, KAIST, Instituto Superior Técnico, Carnegie Mellon University, CONICET, and University of Buenos Aires.
More than 200 contributors used Argilla to flag MMLU questions where regional, dialect, or cultural knowledge was required to answer correctly. 85% of the questions required Western-centric knowledge!
Thanks to this annotation process, the open dataset contains two subsets:
1. Culturally Agnostic: no specific regional or cultural knowledge is required.
2. Culturally Sensitive: requires dialect, cultural, or geographic knowledge to answer correctly.
Moreover, we provide high quality translations for 25 out of 42 languages, thanks again to the community and professional annotators leveraging Argilla on the Hub.
I hope this will ensure a better understanding of the limitations and challenges for making open AI useful for many languages.
Dataset: CohereForAI/Global-MMLU
TL;DR: I reimplemented the Swarm concept using Haystack, but made it work with both open and proprietary models.
blog article: https://haystack.deepset.ai/blog/swarm-of-agents
notebook: https://haystack.deepset.ai/cookbook/swarm
Some time ago OpenAI published Swarm: an educational framework for building multi-agent systems.
Their approach focuses on two main concepts:
- Routines: Each agent follows specific instructions and uses tools to execute them.
- Handoffs: Agents can transfer control to one another using tool/function calling.
When I first read these ideas, I thought: simple but powerful! And they pair well with the recent unified tool support in Haystack.
So, I decided to re-implement these concepts using Haystack, and in just a few lines of code, I had a working prototype.
Bonus feature: this implementation isn't tied to a single model provider - different agents can be powered by different models!
I replicated the ACME customer service example from the original article, with 3 Agents:
Triage Agent - Llama 3.2 running on Ollama
Sales Agent - Anthropic Claude 3.5 Sonnet
Issues and Repairs Agent - OpenAI GPT-4o mini
Want to see the full implementation and give it a try? Check out the blog post and notebook!
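To give a feel for routines and handoffs without any model behind them, here is a toy sketch in plain Python. It is not the Haystack implementation from the blog post. The keyword matching is a hypothetical stand-in for the LLM deciding on a handoff through tool/function calling, and the customer messages are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str                              # the routine the agent follows
    handoffs: dict = field(default_factory=dict)   # keyword -> target agent name

    def run(self, message):
        """Return (reply, next_agent_name). A real agent would call an LLM
        and trigger the handoff through a tool call."""
        for keyword, target in self.handoffs.items():
            if keyword in message.lower():
                return f"{self.name}: transferring to {target}", target
        return f"{self.name}: handling '{message}'", self.name

agents = {
    "Triage Agent": Agent(
        "Triage Agent",
        "Route the customer to the right agent.",
        {"buy": "Sales Agent", "broken": "Issues and Repairs Agent"},
    ),
    "Sales Agent": Agent("Sales Agent", "Sell ACME products."),
    "Issues and Repairs Agent": Agent("Issues and Repairs Agent", "Fix problems."),
}

def converse(messages, start="Triage Agent"):
    """Feed messages to the active agent, following handoffs as they occur."""
    active, transcript = start, []
    for message in messages:
        reply, active = agents[active].run(message)
        transcript.append(reply)
    return active, transcript

final_agent, log = converse(["My rocket skates are broken", "They stopped mid-air"])
```

Because each agent is just data plus a `run` method, nothing stops every agent from being backed by a different model provider.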
huggingface.co/DIBT is dead! Long live https://huggingface.co/data-is-better-together!
We're working on some very cool projects so we're doing a bit of tidying of the Data is Better Together Hub org.
Magpie with system message
I had another idea: use the system message to steer generation towards a specific language.
The system message should be in the target language, like:
"You are an artificial intelligence that answers users' questions in TARGET_LANGUAGE in a useful and detailed way. The user asks complex questions in TARGET_LANGUAGE."
It is a simple approach, but it might work...
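Here is how such a prompt could be assembled for a Llama 3 style chat format: the system turn carries the target-language message, and the prompt ends on an empty user turn for the model to complete. This is a sketch under assumptions. The special tokens and blank lines follow the published Llama 3 chat format, `build_pre_query` is a hypothetical helper, and the Italian text is my own illustrative translation of the message above.

```python
def build_pre_query(system_message):
    """Build a prompt that ends right where the user's text would begin."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
    )

# Illustrative Italian translation of the system message (assumption).
ITALIAN_SYSTEM = (
    "Sei un'intelligenza artificiale che risponde in italiano alle domande "
    "degli utenti in modo utile e dettagliato. "
    "L'utente pone domande complesse in italiano."
)

pre_query = build_pre_query(ITALIAN_SYSTEM)
```

Whatever the model generates after `pre_query` is then treated as the user instruction, ideally already in the target language.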
It turns out the authors had a similar idea, which they included in the latest revision of their paper.
Resources
Magpie paper and repository: https://huggingface.co/papers/2406.08464 https://github.com/magpie-align/magpie
Magpie demo by @davanstrien : https://huggingface.co/spaces/davanstrien/magpie
Magpie Ollama Datagen by @mrm8488 : https://github.com/mrm8488/magpie-ollama-datagen
magpie-ultra dataset - massive dataset built with Magpie by Argilla: https://huggingface.co/datasets/argilla/magpie-ultra-v0.1
distilabel framework - framework for synthetic data generation and AI feedback at scale: https://distilabel.argilla.io/latest/
Now you want to generate an instruction dataset for fine-tuning in a language other than English.
But how do you get started?
I explore how to do this with Magpie in my new article
https://huggingface.co/blog/anakin87/multilingual-magpie
---
What is Magpie?
It's a recent technique for creating synthetic instruction datasets.
Magpie is based on a simple but ingenious idea:
if you prompt an instruction-tuned model with a pre-query template, you can make it generate a plausible user query/instruction.
Here's an example:
model: Llama-3-8B-Instruct
pre-query template: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
generated user instruction: "What are some of the responsibilities of a commercial pilot?"
You can then feed this instruction back into the same model to get the assistant response.
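The two steps above can be sketched as a small loop. This is only an outline, not the Magpie authors' code. `generate` is a hypothetical text-completion callable, `fake_generate` replaces a real model call so the snippet runs offline, and the blank lines and assistant header are assumptions based on the Llama 3 chat format.

```python
PRE_QUERY = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
ASSISTANT_HEADER = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

def magpie_pairs(generate, n, pre_query=PRE_QUERY):
    """Collect n instruction/response pairs from a text-completion function."""
    pairs = []
    for _ in range(n):
        # Step 1: the model completes the empty user turn with an instruction.
        # A real call would stop generation at the end-of-turn token.
        instruction = generate(pre_query + "\n\n").strip()
        # Step 2: the same model answers that instruction as the assistant.
        prompt = pre_query + "\n\n" + instruction + ASSISTANT_HEADER
        response = generate(prompt).strip()
        pairs.append({"instruction": instruction, "response": response})
    return pairs

# Hypothetical stand-in for a real completion call to an instruct model.
def fake_generate(prompt):
    if prompt.endswith(ASSISTANT_HEADER):
        return "Canned assistant answer."
    return "What are some of the responsibilities of a commercial pilot?"

pairs = magpie_pairs(fake_generate, n=2)
```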
By repeating this process, it's possible to generate large synthetic datasets with relatively little effort.
The authors demonstrate that using these datasets for Supervised Fine-Tuning (SFT) can yield strong performance, even competitive with the original instruct model.
Generating non-English data
Most Language Models are primarily trained on English texts, so they tend to produce data in English.
How can we overcome this?
Earlier approaches were complex or costly.
Then @mrm8488 found a simple solution: add the target language to the pre-query template.
For Spanish, the template becomes "<|begin_of_text|><|start_header_id|>user<|end_header_id|>spanish:".
This method works for Spanish and German!
Unfortunately, it does not work well for other languages (Italian, Dutch, ...).
I was excited to explore Llama 3.2, but as a simple EU guy, I don't have access to Meta's multimodal models.
So I thought: why not challenge the small 3B text model with Agentic RAG?
The plan:
- Build a system that tries to answer questions using a knowledge base.
- If the documents don't contain the answer, use Web search for additional context.
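The plan above is essentially a conditional route, which can be sketched without any framework. This is not the notebook's Haystack pipeline. `retrieve`, `web_search`, and `toy_llm` are hypothetical stubs, and the `no_answer` sentinel is an assumed convention for the model to signal that the documents were not enough.

```python
NO_ANSWER = "no_answer"

def answer(question, retrieve, web_search, llm):
    """Try the knowledge base first and fall back to web search if needed."""
    reply = llm(question, retrieve(question))
    if reply.strip().lower() != NO_ANSWER:
        return reply, "knowledge_base"
    return llm(question, web_search(question)), "web"

# Hypothetical stubs standing in for a retriever, a search API and an LLM.
KB_DOCS = ["Haystack is an open-source LLM orchestration framework."]
WEB_DOCS = ["Llama 3.2 includes a small 3B text model."]

def retrieve(question):
    return KB_DOCS

def web_search(question):
    return WEB_DOCS

def toy_llm(question, docs):
    # "Answers" only when a document mentions the last word of the question.
    key = question.split()[-1].rstrip("?").lower()
    for doc in docs:
        if key in doc.lower():
            return doc
    return NO_ANSWER

from_kb = answer("What is Haystack?", retrieve, web_search, toy_llm)
from_web = answer("What is new in Llama?", retrieve, web_search, toy_llm)
```

The whole approach depends on the model reliably emitting the sentinel when the context is insufficient, which is where instruction-following ability matters.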
Check out my experimental notebook here: https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/llama32_agentic_rag.ipynb
My stack:
haystack (https://haystack.deepset.ai/): open-source LLM orchestration framework
meta-llama/Llama-3.2-3B-Instruct
free DuckDuckGo API, integrated with Haystack
The results? Encouraging - a few months ago, this level of performance from a small model would've been unthinkable!
This probably reflects the impressive IFEval score of the model (comparable to Llama 3.1 8B).
Full walkthrough on how to get started with Spectrum and TRL for efficient fine-tuning.
https://huggingface.co/blog/anakin87/spectrum
---
Looking to fine-tune Language Models efficiently and save on computational resources?
One popular method is QLoRA, which quantizes the original model and trains low-rank adapters on top.
It's quite effective and uses less GPU memory than full fine-tuning.
However, QLoRA applies Low-Rank Adaptation uniformly across the entire model.
What if we could identify the most informative layers and only fine-tune those?
This is exactly what Spectrum does!
Spectrum analyzes the weight matrices for all layers in a Language Model and calculates a Signal to Noise Ratio (SNR) for each one.
(It uses Random Matrix Theory and the Marchenko-Pastur distribution to distinguish signal from noise.)
Based on a chosen percentage (say, 25%), Spectrum selects the most informative layers of each type (mlp.down_proj, self_attn.o_proj, etc.).
You can then freeze the rest of the model and focus your training on the chosen layers.
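The selection step can be illustrated in a few lines, assuming the SNR values are already computed. This is a sketch, not Spectrum's code. `select_layers` is a hypothetical helper, the SNR numbers are made up, and the actual SNR computation via the Marchenko-Pastur distribution is left out entirely.

```python
from collections import defaultdict

def select_layers(snr_by_layer, fraction):
    """Pick the top `fraction` of layers by SNR within each layer type."""
    by_type = defaultdict(list)
    for name, snr in snr_by_layer.items():
        # "model.layers.3.mlp.down_proj" -> layer type "mlp.down_proj"
        layer_type = ".".join(name.split(".")[-2:])
        by_type[layer_type].append((snr, name))
    selected = set()
    for layers in by_type.values():
        keep = max(1, round(len(layers) * fraction))
        layers.sort(reverse=True)
        selected.update(name for _, name in layers[:keep])
    return selected

# Made-up SNR values for a 4-layer model with two layer types.
snr = {
    "model.layers.0.mlp.down_proj": 0.8,
    "model.layers.1.mlp.down_proj": 1.1,
    "model.layers.2.mlp.down_proj": 2.4,
    "model.layers.3.mlp.down_proj": 0.3,
    "model.layers.0.self_attn.o_proj": 3.0,
    "model.layers.1.self_attn.o_proj": 0.9,
    "model.layers.2.self_attn.o_proj": 1.7,
    "model.layers.3.self_attn.o_proj": 0.2,
}
trainable = select_layers(snr, 0.25)
```

With PyTorch, freezing then amounts to setting `requires_grad = False` on every parameter whose name is not in the selection.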
Results/Evaluation
- Spectrum is competitive with full fine-tuning and beats QLoRA on benchmarks.
- While QLoRA is more memory-efficient on a single GPU, Spectrum shines in distributed training setups.
- Great models trained with Spectrum: Dolphin models, Llama 3.1 Storm, numerous models by VAGO Solutions...
---
For a practical guide, check out the article above.
https://arxiv.org/abs/2408.16737
The direct implication is that smaller models could be used to create cost-effective synthetic datasets. And on that note, in the Gemma terms of use, Google explicitly claims no rights on outputs generated from those models, which means one is free to synthgen from the Gemma line. Meta's Llama 3 licence forbids synthetic generation of outputs if used to improve other models. Relevant Mistral, Qwen, and Yi models under the Apache 2.0 license are unrestricted for this purpose.
Lately, I've spent some time fine-tuning language models.
Now I am happy to release Phi 3.5 mini ITA: a fine-tuned version of Phi-3.5-mini-instruct to improve performance on the Italian language.
Small (3.82 B parameters) but capable model
128k context length
Chat with it on Hugging Face Spaces: anakin87/Phi-3.5-mini-ITA
Model card: anakin87/Phi-3.5-mini-ITA
Data
Supervised fine-tuning using a good mix of English and Italian data:
- mlabonne/FineTome-100k by @mlabonne
- efederici/capybara-claude-15k-ita by @efederici
Thanks to the authors for the datasets.
Targeted training with Spectrum
I used Spectrum, a relatively new technique for parameter-efficient learning.
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and freeze the rest.
I trained the top 30% of model layers.
Spectrum paper: https://arxiv.org/abs/2406.06623
Vibe check and performance on Italian benchmarks seem encouraging.
I created a Capybara-inspired Italian dataset by translating the initial instruction and running it through a pipeline to generate conversations. I used Claude Sonnet for translation and instruction generation, and Opus for generating the answers.
I hope this dataset proves useful for people working on Italian language models.
Open sourcing the dataset here: efederici/capybara-claude-15k-ita
Distributed pipeline execution with Ray, new Magpie tasks, reward models, components for dataset diversity based on sentence embeddings, Argilla 2.0 compatibility and many more features!
Check the new release in GitHub: https://github.com/argilla-io/distilabel
This small revolution includes:
You can now integrate with the Hugging Face Hub and get started in under five minutes.
A single Dataset class is now designed to handle multiple tasks.
It's 100 times simpler to configure your dataset now with the new SDK!
The documentation has been revamped to be cleaner and more user-friendly.
A new feature automates splitting annotation tasks among a team.
The layout has been made more flexible to accommodate many use cases.
Check out the release highlights for more details: https://github.com/argilla-io/argilla/releases/tag/v2.0.0