social-post-explorers (Social Post Explorers)

tomaarsen

posted an update 4 days ago

Post

2348

That didn't take long! Nomic AI has finetuned the new ModernBERT-base encoder model into a strong embedding model for search, classification, clustering and more!

Details:
🤖 Based on ModernBERT-base with 149M parameters.
📊 Outperforms both nomic-embed-text-v1 and nomic-embed-text-v1.5 on MTEB!
🏎️ Immediate FA2 and unpacking support for super efficient inference.
🪆 Trained with Matryoshka support, i.e. 2 valid output dimensionalities: 768 and 256.
➡️ Maximum sequence length of 8192 tokens!
2️⃣ Trained in 2 stages: unsupervised contrastive data -> high quality labeled datasets.
➕ Integrated in Sentence Transformers, Transformers, LangChain, LlamaIndex, Haystack, etc.
🏛️ Apache 2.0 licensed: fully commercially permissible

Try it out here: nomic-ai/modernbert-embed-base

Very nice work by Zach Nussbaum and colleagues at Nomic AI.
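
To make the Matryoshka point above concrete, here is a minimal sketch of what using the smaller 256-dimensional output amounts to: keep the leading values of the 768-dimensional vector, rescale to unit length, then compare as usual. The vectors below are made-up stand-ins, not real model outputs, and the helper names are hypothetical; the model's own tooling may handle truncation differently.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if dim > len(embedding):
        raise ValueError("dim exceeds embedding length")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 768-dimensional vectors standing in for real embeddings.
query = [math.sin(i + 1.0) for i in range(768)]
document = [math.sin(i + 1.1) for i in range(768)]

# Similarity at full size and at the smaller Matryoshka size.
full_score = cosine_similarity(query, document)
small_score = cosine_similarity(
    truncate_and_normalize(query, 256),
    truncate_and_normalize(document, 256),
)
print(f"768 dims: {full_score:.3f}, 256 dims: {small_score:.3f}")
```

The smaller vectors take a third of the storage, at the cost of whatever quality the truncation loses on a given task.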

merve

posted an update 4 days ago

Post

3736

supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by Hugging Face to make the LLM write code, do analysis and automate boring stuff!

Here's our blog for you to get started https://huggingface.co/blog/smolagents

YannisTevissen

posted an update 6 days ago

Post

2132

Starting this collection to gather models, Spaces, datasets or even papers related to disability. Feel free to ping me if you see something relevant to add.

YannisTevissen/ai-for-disability-67684a1a9966a2e699f6b114

merve

posted an update 10 days ago

Post

4153

QwQ can see 🔥
Qwen team released QvQ, a large vision LM with reasoning 😱

it outperforms proprietary VLMs on several benchmarks, comes with open weights and a demo!
Check them out ⬇️
Demo Qwen/QVQ-72B-preview
Model Qwen/QVQ-72B-Preview
Read more https://qwenlm.github.io/blog/qvq-72b-preview/
Congratulations @JustinLin610 and team!

2 replies

·

JustinLin610

authored a paper 15 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 15 days ago • 334

KnutJaegersberg

posted an update 15 days ago

Post

1301

Intelligence Potentiation: An Evolutionary Perspective on AI Agent Designs

I found it useful to think of AI agent design as progressing up a ladder, through evolutionary selection.

https://huggingface.co/blog/KnutJaegersberg/intelligence-potentiation

fdaudens

posted an update 15 days ago

Post

1244

🔍 From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.

Check it out: huggingface/open-source-ai-year-in-review-2024

Lewdiculous

posted an update 15 days ago

Post

2612

Hello fellow LLMers, just a quick notice that some of my activity will be moved into the AetherArchitectural Community and split with @Aetherarchio.

[here] https://huggingface.co/AetherArchitectural

All activity should be visible in the left side of my profile.

1 reply

·

tomaarsen

authored a paper 15 days ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 17 days ago • 116

fdaudens

posted an update 17 days ago

Post

1207

🤝 Want to share your AI models while protecting your work? Licenses are key!

Fascinating to see that nearly 60% of models on the Hub use Apache & MIT licenses.

Explore the viz here: huggingface/open-source-ai-year-in-review-2024

merve

posted an update 17 days ago

Post

2731

Aya by Cohere For AI can now see! 👀

C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works on 8 languages! 🗣️

The authors extend the LLaVA dataset using Aya's translation capabilities with 558k examples!
Try it here kkr5155/maya_demo

Dataset maya-multimodal/pretrain

Model maya-multimodal/maya 👏
kudos @nahidalam and team

1 reply

·

fdaudens

posted an update 17 days ago

Post

1301

Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?

I used natural language processing to cluster and map them — really helps spot patterns that weren't obvious when reading predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also this horizontal axis that spans from industry strategies to deep reflections on how to talk to the public.

Click any dot to explore the original prediction. What themes surprise/interest you the most?

👉 fdaudens/nieman_lab_2025_predictions_visualization

P.S.: I discovered that Nieman Lab's content is under a Creative Commons license!

merve

posted an update 17 days ago

Post

3159

Apollo is a new family of open-source video language models by Meta, where the 3B model outperforms most 7B models and the 7B model outperforms most 30B models 🧶

✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with Apache 2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench

The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️

Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
they evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find that design decisions that work for small models also carry over when the model and dataset are scaled up 📈 scaling the dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies and find that FPS sampling is better than uniform sampling, with 8-32 tokens per frame being optimal
> They also compare image encoders, trying a range of models from shape-optimized SigLIP to DINOv2
they find google/siglip-so400m-patch14-384 to be most powerful 🔥
> They also compare freezing different parts of the models; training all stages with some parts frozen gives the best results
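
For readers unfamiliar with the two frame sampling strategies compared above, here is a rough sketch of the difference. It is not taken from the Apollo code; the function and parameter names are hypothetical. Uniform sampling picks a fixed number of frames however long the clip is, while FPS sampling picks frames at a fixed rate in time, so longer clips yield more frames.

```python
def uniform_sample(total_frames, num_samples):
    """Pick `num_samples` frame indices spread evenly over the whole clip."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

def fps_sample(total_frames, native_fps, target_fps):
    """Pick frame indices at a fixed rate of `target_fps` frames per second."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    # Never step by less than one frame, to avoid repeating an index.
    stride = max(1.0, native_fps / target_fps)
    indices = []
    position = 0.0
    while int(position) < total_frames:
        indices.append(int(position))
        position += stride
    return indices

# A 3-second clip and a 10-second clip, both recorded at 30 frames per second.
for total in (90, 300):
    print(total, len(uniform_sample(total, 4)), len(fps_sample(total, 30, 2)))
```

With uniform sampling the gap in time between sampled frames grows with clip length, whereas FPS sampling keeps that gap constant.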

They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo-7B outperforms most 30B models 🔥

6 replies

·

fdaudens

posted an update 20 days ago

Post

665

The #NeurIPS2024 Class: Explore the leading research institutions 🎓🔬

huggingface/open-source-ai-year-in-review-2024

merve

posted an update 23 days ago

Post

1739

A complete RAG pipeline includes a reranker, which re-scores the retrieved documents to find the best one 📓
The same goes for multimodal RAG: there are multimodal rerankers that we can integrate into multimodal RAG pipelines!
Learn how to build a complete multimodal RAG pipeline with vidore/colqwen2-v1.0 as retriever, lightonai/MonoQwen2-VL-v0.1 as reranker, Qwen/Qwen2-VL-7B-Instruct as VLM in this notebook that runs on a GPU as small as L4 🔥 https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_reranker_and_vlms
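
The retrieve-then-rerank flow described above can be sketched in a few lines. This is a toy, text-only illustration with word-overlap scores standing in for the retriever and reranker models; it does not use the models named in the post, and all function names and documents here are made up.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())

def retrieve(query, documents, top_k):
    """Stage 1: cheap scoring (shared word count), keep the top_k candidates."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def rerank(query, candidates):
    """Stage 2: finer scoring (Jaccard overlap) applied only to the candidates."""
    query_words = tokenize(query)

    def score(doc):
        doc_words = tokenize(doc)
        return len(query_words & doc_words) / len(query_words | doc_words)

    return sorted(candidates, key=score, reverse=True)

documents = [
    "invoice total for march including tax and shipping fees and late charges",
    "invoice total for march",
    "holiday schedule for the office",
]
query = "invoice total march"

candidates = retrieve(query, documents, top_k=2)
best = rerank(query, candidates)[0]
print(best)
```

The first stage cannot tell the two invoice documents apart, so the second stage is what moves the more focused one to the top; in a real pipeline the best document would then be passed to the VLM.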

fdaudens

posted an update 23 days ago

Post

1538

Are you at #NeurIPS2024? Check out our cool data visualizations about research papers in the Year in Review!

huggingface/open-source-ai-year-in-review-2024


eienmojiki

posted an update 23 days ago

Post

1438

👀 Introducing 2048 Game API: A RESTful API for the Classic Puzzle Game 🧩

I'm excited to share my latest project, 2048 Game API, a RESTful API that allows you to create, manage, and play games of 2048, a popular puzzle game where players slide numbered tiles to combine them and reach the goal of getting a tile with the value of 2048.

⭐ Features
Create new games with customizable board sizes (3-8)
Make moves (up, down, left, right) and get the updated game state
Get the current game state, including the board, score, and game over status
Delete games
Generate images of the game board with customizable themes (light and dark)

🔗 API Endpoints
POST /api/games - Create a new game
GET /api/games/:gameId - Get the current game state
POST /api/games/:gameId/move - Make a move (up, down, left, right)
DELETE /api/games/:gameId - Delete a game
GET /api/games/:gameId/image - Generate an image of the game board

🧩 Example Use Cases
- Create a new game with a 4x4 board:

curl -X POST -H "Content-Type: application/json" -d '{"size": 4}' http://localhost:3000/api/games

- Make a move up:

curl -X POST -H "Content-Type: application/json" -d '{"direction": "up"}' http://localhost:3000/api/games/:gameId/move

- Get the current game state:

curl -X GET http://localhost:3000/api/games/:gameId
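
The curl calls above can also be prepared from Python with the standard library. This is a minimal sketch: the base URL and request bodies come from the examples in the post, the helper names are hypothetical, and the assumption that responses are JSON is not confirmed by the post.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # same host and port as the curl examples

def build_request(method, path, payload=None):
    """Prepare (but do not send) a request, with a JSON body if one is given."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )

def create_game(size=4):
    """POST /api/games with the requested board size."""
    return build_request("POST", "/api/games", {"size": size})

def make_move(game_id, direction):
    """POST /api/games/:gameId/move with one of the four directions."""
    if direction not in ("up", "down", "left", "right"):
        raise ValueError(f"unknown direction: {direction}")
    return build_request("POST", f"/api/games/{game_id}/move", {"direction": direction})

def get_state(game_id):
    """GET /api/games/:gameId."""
    return build_request("GET", f"/api/games/{game_id}")

def send(request):
    """Send a prepared request; assumes the server replies with a JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `send(create_game(4))` would issue the first curl call; the fields in the reply depend on the API and are not assumed here.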

💕 Try it out!
- Demo: eienmojiki/2048
- Source: https://github.com/kogakisaki/koga-2048
- You can try out the API by running the server locally or using a tool like Postman to send requests to the API. I hope you enjoy playing 2048 with this API!

Let me know if you have any questions or feedback!

🐧 Mouse1 is our friend🐧

JustinLin610

authored a paper 24 days ago

Evaluating and Aligning CodeLLMs on Human Preference

Paper • 2412.05210 • Published 28 days ago • 47

JustinLin610

authored a paper 25 days ago

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published 26 days ago • 72

stefan-it

posted an update 26 days ago

Post

1183

My latest project is the outcome of the last 2+ years working with TPUs from the amazing TPU Research Cloud (TRC) program and training Encoder-only LMs with the TensorFlow Model Garden library.

👉 Link: https://github.com/stefan-it/model-garden-lms

An overview of some features:

- Cheatsheet for setting up a TPU VM Pod (with all necessary dependencies) to pretrain LMs with TF Model Garden
- Conversion scripts that convert TF Model Garden weights to Hugging Face Transformers-compatible models
- Supported architectures include BERT, BERT with Token Dropping and TEAMS

I also released BERT-based models pretrained on the great Hugging Face FineWeb and FineWeb-Edu datasets (10BT subset). With more to come!

👉 Model Hub Link: https://huggingface.co/model-garden-lms

If you find these resources useful, please give them a like!

Made from Bavarian Oberland with ❤️ and 🥨.

Social Post Explorers

AI & ML interests

Recent Activity

social-post-explorers's activity


Team members 866
