Since I published it on GitHub a few days ago, Hugging Face's new agentic library smolagents has gathered nearly 4k stars 🤯
⚡️ But we are just getting started on agents, so we are hiring an ML Engineer to join me and double down on this effort!
The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!
The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:
Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
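To make the dual-encoder setup concrete, here is a minimal usage sketch. It assumes the checkpoint is published as jinaai/jina-clip-v2 and ships custom code exposing encode_text / encode_image helpers via trust_remote_code, as earlier Jina CLIP releases did; the exact method names and output shapes may differ in the actual release.

```python
# Minimal usage sketch for a dual-encoder text-image model such as jina-clip-v2.
# Assumption: encode_text / encode_image helpers exist as in earlier Jina CLIP models.
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

texts = ["a photo of a red bicycle", "Ein Foto eines roten Fahrrads"]  # multilingual queries
images = ["bicycle.jpg"]                                               # local paths or URLs

text_emb = model.encode_text(texts)     # expected shape: (2, 1024)
image_emb = model.encode_image(images)  # expected shape: (1, 1024)

# Cosine similarity between each query and the image (normalize first if the
# outputs are not already unit-length).
scores = text_emb @ image_emb.T
print(scores)
```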
⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on a massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
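For readers unfamiliar with the objective, the sketch below shows what "InfoNCE in both directions" typically looks like in plain PyTorch. The temperature and normalization details are illustrative assumptions, not Jina's exact training configuration.

```python
import torch
import torch.nn.functional as F

def bidirectional_info_nce(text_emb: torch.Tensor, image_emb: torch.Tensor,
                           temperature: float = 0.05) -> torch.Tensor:
    """text_emb, image_emb: (batch, dim) embeddings of matched text-image pairs."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.T / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_t2i = F.cross_entropy(logits, targets)              # text -> image direction
    loss_i2t = F.cross_entropy(logits.T, targets)            # image -> text direction
    return 0.5 * (loss_t2i + loss_i2t)
```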
Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on the CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
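The dimension-reduction point follows directly from Matryoshka Representation Learning: keep only the leading components of each vector and re-normalize. A small sketch, assuming the standard truncate-and-renormalize recipe:

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` components of a Matryoshka embedding and re-normalize."""
    truncated = emb[..., :dim]
    return truncated / np.linalg.norm(truncated, axis=-1, keepdims=True)

full = np.random.randn(2, 1024)        # stand-in for 1024-D model outputs
small = truncate_embedding(full, 256)  # 4x smaller vectors, still comparable via cosine similarity
```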
Key Innovation: The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!
Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!
The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:
>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
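A toy sketch of the idea, assuming a simple thresholding rule on per-byte entropy (the actual BLT boundary criterion may differ in detail):

```python
def patch_boundaries(byte_entropies, threshold=2.0):
    """byte_entropies: per-byte next-byte entropy (in bits) from a small byte-level LM."""
    boundaries = [0]
    for i, h in enumerate(byte_entropies):
        if h > threshold and i > boundaries[-1]:
            boundaries.append(i)  # surprising byte => start a new, shorter patch here
    return boundaries

# Predictable stretches end up in long patches; high-entropy bytes trigger short
# patches, so the heavy global transformer runs more often where data is complex.
entropies = [0.2, 0.1, 0.3, 3.1, 0.4, 0.2, 2.8, 0.5]
print(patch_boundaries(entropies))  # [0, 3, 6]
```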
Three-Component Architecture:
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes
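As a structural sketch only (not Meta's implementation), the three components could be laid out as PyTorch modules like this; dimensions and layer counts are illustrative placeholders, and the forward passes (byte pooling, cross-attention) are omitted.

```python
import torch.nn as nn

class LocalEncoder(nn.Module):
    """Lightweight: turns raw bytes into patch representations."""
    def __init__(self, d_byte=256, d_patch=1024):
        super().__init__()
        self.byte_emb = nn.Embedding(256, d_byte)      # one embedding per byte value
        self.to_patch = nn.Linear(d_byte, d_patch)     # stand-in for pooling over a patch

class GlobalLatentTransformer(nn.Module):
    """Heavy: a standard transformer stack operating on patch representations."""
    def __init__(self, d_patch=1024, n_layers=24, n_heads=16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_patch, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)

class LocalDecoder(nn.Module):
    """Lightweight: maps patch representations back to next-byte logits."""
    def __init__(self, d_patch=1024, d_byte=256):
        super().__init__()
        self.to_byte = nn.Linear(d_patch, d_byte)
        self.byte_head = nn.Linear(d_byte, 256)        # logits over the 256 byte values
```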
>> Technical Advantages
• Matches performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models
>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.
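Hash n-gram embeddings are the least familiar piece, so here is a hedged sketch of the general technique: each byte position also receives embeddings for the n-grams that end at it, looked up through a hashed bucket table. The bucket count, n-gram sizes, and summation rule are assumptions for illustration, not BLT's exact setup.

```python
import torch
import torch.nn as nn

class HashNGramEmbedding(nn.Module):
    """Adds, to each byte position, embeddings of the n-grams ending at that position."""
    def __init__(self, n_buckets=100_000, dim=256, ngram_sizes=(3, 4, 5)):
        super().__init__()
        self.ngram_sizes = ngram_sizes
        self.n_buckets = n_buckets
        self.table = nn.Embedding(n_buckets, dim)      # shared hashed bucket table

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        """byte_ids: (seq_len,) raw byte values; returns (seq_len, dim) n-gram features."""
        seq_len = byte_ids.size(0)
        out = torch.zeros(seq_len, self.table.embedding_dim)
        for n in self.ngram_sizes:
            for i in range(n - 1, seq_len):
                ngram = tuple(byte_ids[i - n + 1 : i + 1].tolist())
                bucket = hash(ngram) % self.n_buckets  # hash the n-gram into a bucket id
                out[i] += self.table(torch.tensor(bucket))
        return out
```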
This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!
For anyone looking to boost their LLM fine-tuning and alignment skills this December: we're running a free and open course called smol course. It's not big like the courses from Li Yin and @mlabonne, it's just smol.
📷 It focuses on practical use cases, so if you're working on something, bring it along.
👯‍♀️ It's peer reviewed and open so you can discuss and get feedback.
🤗 If you're already a smol pro, feel free to drop a star or issue.
>> Part 1 starts now, and it's on instruction tuning!
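If you want a feel for what that first part involves, below is a minimal instruction-tuning sketch using TRL's SFTTrainer. The model id, dataset, and hyperparameters are placeholders (and SFTTrainer's arguments have shifted between TRL releases), so treat it as a starting point rather than the course's exact recipe.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: any instruction / chat-formatted dataset works here.
dataset = load_dataset("your-org/your-instruct-dataset", split="train")

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",  # a smol model that fine-tunes on a single GPU
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="smollm2-instruct",
        max_steps=500,
        per_device_train_batch_size=4,
    ),
)
trainer.train()
```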
INTELLECT-1 is the first 10-billion-parameter language model trained collaboratively from scratch on 1 trillion tokens of English text and code.
Let's make a generation of amazing image-generation models
The best image generation models are trained on human preference datasets, where annotators have selected the best image from a choice of two. Unfortunately, many of these datasets are closed source so the community cannot train open models on them. Let's change that!
The community can contribute image preferences to an open-source dataset that could be used to build text-to-image models, like the Flux or Stable Diffusion families. The dataset will be open source so everyone can use it to train models that we can all use.
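For illustration, a record in such a dataset might pair one prompt with a chosen and a rejected image; the dataset id and field names below are hypothetical placeholders, not the project's final schema.

```python
from datasets import load_dataset

# What one annotation might look like (field names are hypothetical):
# {
#     "prompt": "a watercolor painting of a lighthouse at dusk",
#     "image_chosen": <the image the annotator preferred>,
#     "image_rejected": <the other candidate image>,
# }

ds = load_dataset("your-org/open-image-preferences", split="train")  # placeholder dataset id
print(ds[0].keys())
```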