Will Brooks

TornButter

AI & ML interests

None yet

Recent Activity

liked a model 4 days ago
deepseek-ai/DeepSeek-V3
liked a model 11 days ago
Qwen/QVQ-72B-Preview
liked a model 11 days ago
answerdotai/ModernBERT-base

Organizations

None yet

TornButter's activity

reacted to singhsidhukuldeep's post with 🔥 13 days ago
Exciting News in AI: Jina AI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
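
The "efficient vector storage" point above relies on a property of Matryoshka-style embeddings: the leading components of a vector are meant to remain useful on their own. A minimal sketch of how a consumer might use that, assuming truncation followed by re-normalization is the intended usage (the vectors are made-up toy numbers, and `truncate_and_normalize` is a hypothetical helper, not part of any Jina AI API):

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional embeddings (illustrative values, not model output).
text_vec = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.03]
image_vec = [0.8, 0.2, 0.4, 0.1, 0.01, 0.04, 0.02, 0.01]

# Similarity using all 8 dimensions versus only the first 2.
full_sim = cosine(truncate_and_normalize(text_vec, 8),
                  truncate_and_normalize(image_vec, 8))
short_sim = cosine(truncate_and_normalize(text_vec, 2),
                   truncate_and_normalize(image_vec, 2))
print(f"full: {full_sim:.4f}  truncated: {short_sim:.4f}")
```

Storing only the truncated prefix shrinks the index proportionally. How much retrieval quality survives depends on the model's training and is not something this toy example can show.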

āš”ļø Under The Hood:
- Multi-stage training process with progressive resolution scaling (224ā†’384ā†’512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on a massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
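
The "InfoNCE loss in both directions" item above can be sketched in a few lines. This is an illustration under stated assumptions, not the Jina AI training code: the temperature value of 0.07, the equal weighting of the two directions, and the function names are all assumptions, and hard negatives are not modeled. Entry `sim[i][j]` is the similarity between text `i` and image `j`, with matched pairs on the diagonal.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_info_nce(sim, temperature=0.07):
    """Average of text-to-image and image-to-text InfoNCE losses.

    `temperature` and the 0.5/0.5 weighting are assumed values.
    """
    n = len(sim)
    scaled = [[s / temperature for s in row] for row in sim]
    # Text -> image: each row should put its mass on the diagonal entry.
    t2i = -sum(log_softmax(scaled[i])[i] for i in range(n)) / n
    # Image -> text: same idea over the columns.
    columns = [[scaled[i][j] for i in range(n)] for j in range(n)]
    i2t = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (t2i + i2t)


aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uninformative = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(symmetric_info_nce(aligned), symmetric_info_nce(uninformative))
```

When every score is identical the loss equals the log of the batch size, and it falls toward zero as matched pairs separate from mismatched ones.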

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on the CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
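
For readers unfamiliar with the nDCG@5 metric quoted in the list above, here is a minimal sketch of the standard definition with linear gain. The post does not say which variant the benchmark uses, so treat this as an assumption. The ideal ranking here is also computed only from the relevance labels passed in, whereas a full evaluation would use every relevant document in the corpus.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of the returned results, in ranked order (toy data).
perfect = [1, 0, 0, 0, 0]       # the relevant item is ranked first
second_place = [0, 1, 0, 0, 0]  # the relevant item is ranked second
print(ndcg_at_k(perfect, 5), ndcg_at_k(second_place, 5))
```

A score such as 52.65% is this quantity averaged over all queries in the benchmark, expressed as a percentage.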

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
liked a Space about 1 month ago