BigLAM: BigScience Libraries, Archives and Museums

non-profit

AI & ML interests

πŸ€— Hugging Face x 🌸 BigScience initiative to create open source community resources for LAMs.

Recent Activity

biglam's activity

alielfilali01Β 
posted an update 5 days ago
view post
Post
1673
~75% on the challenging GPQA with only 40M parameters πŸ”₯πŸ₯³

GREAT ACHIEVEMENT ! Or is it ?

This new Work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", take out the mystery about many models i personally suspected their results. Speacially on leaderboards other than the english one, Like the Open Arabic LLM Leaderbaord OALL/Open-Arabic-LLM-Leaderboard.

The authors of this work, first started by training a model on the GPQA data, which, unsurprisingly, led to the model achieving 100% performance.

Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA). However, they introduced a distillation loss from the earlier, 'cheated' model.

What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.

This raises important questions about the careful use of distillation in model training, especially when the training data is opaque. As they demonstrated, it’s apparently possible to (intentionally or unintentionally) leak test data through this method.

Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)
  • 1 reply
Β·
davanstrienΒ 
posted an update 8 days ago
view post
Post
2944
πŸ‡ΈπŸ‡° Hovorte po slovensky? Help build better AI for Slovak!

We only need 90 more annotations to include Slovak in the next Hugging Face FineWeb2-C dataset ( data-is-better-together/fineweb-c) release!

Your contribution will help create better language models for 5+ million Slovak speakers.

Annotate here: data-is-better-together/fineweb-c.

Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community
  • 3 replies
Β·
davanstrienΒ 
posted an update 14 days ago
view post
Post
1671
Introducing FineWeb-C πŸŒπŸŽ“, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu the community is labelling the educational quality of texts for many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c
alielfilali01Β 
posted an update 22 days ago
view post
Post
3383
Unpopular opinion: Open Source takes courage to do !

Not everyone is brave enough to release what they have done (the way they've done it) to the wild to be judged !
It really requires a high level of "knowing wth are you doing" ! It's kind of a super power !

Cheers to the heroes here who see this!
Β·
alielfilali01Β 
posted an update 26 days ago
view post
Post
1505
Apparently i forgot to put this here !

Well, this is a bit late but consider given our recent blog a read if you are interested in Evaluation.

You don't have to be into Arabic NLP in order to read it, the main contribution we are introducing is a new evaluation measure for NLG. We made the fisrt application of this measure on Arabic for now and we will be working with colleagues from the community to expand it to other languages.

Blog:
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard
https://huggingface.co/blog/leaderboard-3c3h-aragen

Space:
inceptionai/AraGen-Leaderboard

Give it a read and let me know your thoughts πŸ€—
stefan-itΒ 
posted an update 26 days ago
view post
Post
1183
My latest project is the outcome of the last 2+ years working with TPUs from the amazing TPU Research Cloud (TRC) program and training Encoder-only LMs with the TensorFlow Model Garden library.

πŸ‘‰ Link: https://github.com/stefan-it/model-garden-lms

An overview of some features:

- Cheatsheet for setting-up a TPU VM Pod (with all necessary dependencies) to pretrain LMs with TF Model Garden
- Conversion scripts that convert TF Model Garden weights to Hugging Face Transformers-compatible models
- Supported architectures include BERT, BERT with Token Dropping and TEAMS

I also released BERT-based models pretrained on the great Hugging Face FineWeb and FineWeb-Edu datasets (10BT subset). With more to come!

πŸ‘‰ Model Hub Link: https://huggingface.co/model-garden-lms

If you find these resources useful, please give them a like!

Made from Bavarian Oberland with ❀️ and πŸ₯¨.
christopherΒ 
posted an update 27 days ago
view post
Post
1582
The folks at Foursquare released a dataset of 104.5 million places of interest ( foursquare/fsq-os-places) and here's all of them on a plot
Β·
christopherΒ 
posted an update 29 days ago
davanstrienΒ 
posted an update about 1 month ago
view post
Post
505
Increasingly, LLMs are becoming very useful for helping scale annotation tasks, i.e. labelling and filtering. When combined with the structured generation, this can be a very scalable way of doing some pre-annotation without requiring a large team of human annotators.

However, there are quite a few cases where it still doesn't work well. This is a nice paper looking at the limitations of LLM as an annotator for Low Resource Languages: On Limitations of LLM as Annotator for Low Resource Languages (2411.17637).

Humans will still have an important role in the loop to help improve models for all languages (and domains).
davanstrienΒ 
posted an update about 1 month ago
view post
Post
2481
First dataset for the new Hugging Face Bluesky community organisation: bluesky-community/one-million-bluesky-posts πŸ¦‹

πŸ“Š 1M public posts from Bluesky's firehose API
πŸ” Includes text, metadata, and language predictions
πŸ”¬ Perfect to experiment with using ML for Bluesky πŸ€—

Excited to see people build more open tools for a more open social media platform!
davanstrienΒ 
posted an update about 1 month ago
view post
Post
1351
The Bluesky AT Protocol unlocks exciting possibilities:
- Building custom feeds using ML
- Creating dashboards for data exploration
- Developing custom models for Bluesky
To gather Bluesky resources on the Hub, I've created a community org: https://huggingface.co/bluesky-community

My first rather modest contribution is a dashboard that shows the number of posts every second. Drinking straight from the firehose API 🚰

bluesky-community/bluesky-posts-over-time
  • 1 reply
Β·
davanstrienΒ 
posted an update about 2 months ago
albertvillanovaΒ 
posted an update about 2 months ago
view post
Post
1443
🚨 How green is your model? 🌱 Introducing a new feature in the Comparator tool: Environmental Impact for responsible #LLM research!
πŸ‘‰ open-llm-leaderboard/comparator
Now, you can not only compare models by performance, but also by their environmental footprint!

🌍 The Comparator calculates COβ‚‚ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type... πŸ› οΈ
Make informed decisions about your model's impact on the planet and join the movement towards greener AI!
alielfilali01Β 
posted an update about 2 months ago
view post
Post
2185
Unpopular opinion : o1-preview is more stupid than 4o and Qwen2.5-72B-Instruct in extremely underrated !
  • 2 replies
Β·
albertvillanovaΒ 
posted an update about 2 months ago
view post
Post
1530
πŸš€ New feature of the Comparator of the πŸ€— Open LLM Leaderboard: now compare models with their base versions & derivatives (finetunes, adapters, etc.). Perfect for tracking how adjustments affect performance & seeing innovations in action. Dive deeper into the leaderboard!

πŸ› οΈ Here's how to use it:
1. Select your model from the leaderboard.
2. Load its model tree.
3. Choose any base & derived models (adapters, finetunes, merges, quantizations) for comparison.
4. Press Load.
See side-by-side performance metrics instantly!

Ready to dive in? πŸ† Try the πŸ€— Open LLM Leaderboard Comparator now! See how models stack up against their base versions and derivatives to understand fine-tuning and other adjustments. Easier model analysis for better insights! Check it out here: open-llm-leaderboard/comparator 🌐
davanstrienΒ 
posted an update 2 months ago
albertvillanovaΒ 
posted an update 2 months ago
view post
Post
3134
πŸš€ Exciting update! You can now compare multiple models side-by-side with the Hugging Face Open LLM Comparator! πŸ“Š

open-llm-leaderboard/comparator

Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs all in one place. Ready to level up your model comparison game?