Should I create an organization tackling the AI-human alignment problem? The idea: find the humans who care most about other humans and basically pretrain on their material. I already ran some experiments, and it seems to work well.
Metadata for 39,280 video clips from the GoodGame.ru streaming platform, featuring:
- Complete clip information including direct video URLs and thumbnails
- Streamer details like usernames and avatars
- Engagement metrics such as view counts
- Game categories and content classifications
- Released under Creative Commons Zero (CC0) license
This extensive clips collection provides a valuable resource for developing and evaluating video-based AI applications, especially in Russian gaming and streaming contexts.
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!
The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:
Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on a massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
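To make "InfoNCE loss in both directions" concrete, here is a minimal pure-Python sketch of a symmetric contrastive loss over a batch of matched text-image pairs. It is an illustration of the general technique, not Jina AI's training code: the function names and the temperature value of 0.07 are assumptions, and hard negatives are left out.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy where queries[i] is expected to match keys[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def symmetric_info_nce(text_embs, image_embs, temperature=0.07):
    """Average of the text-to-image and image-to-text InfoNCE losses."""
    t = [l2_normalize(v) for v in text_embs]
    im = [l2_normalize(v) for v in image_embs]
    return 0.5 * (info_nce(t, im, temperature) + info_nce(im, t, temperature))


texts = [[1.0, 0.0], [0.0, 1.0]]
images_aligned = [[1.0, 0.1], [0.1, 1.0]]
images_swapped = [[0.1, 1.0], [1.0, 0.1]]
loss_aligned = symmetric_info_nce(texts, images_aligned)
loss_swapped = symmetric_info_nce(texts, images_swapped)
```

With correctly paired embeddings the loss is small, and it grows when the pairs are swapped, which is the signal the encoders are trained on.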
Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
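The 75% dimension reduction (256D vs 1024D) relies on Matryoshka-style embeddings, where the leading dimensions are trained to be usable on their own. A rough sketch of how a client might shrink stored vectors, assuming plain Python lists and a hypothetical helper name `truncate_embedding`:

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` values and re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


# Stand-in for a 1024-dimensional model output (values are made up).
full = [float(i % 7 - 3) for i in range(1024)]
small = truncate_embedding(full, 256)
reduction = 1 - len(small) / len(full)
```

Re-normalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors.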
Key Innovation: The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!
Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
reacted to nicolay-r's post 8 days ago
So far I have noticed that reasoning with LLMs in English tends to be more accurate than in other languages. However, besides GoogleTrans and other open, transparent translators, I could not find an easy-to-use solution that avoids:
1. Third-party framework installation
2. Text chunking
3. Missing support for meta-annotations such as spans, objects, etc.
To cope with the problem of IR from non-English texts, I am happy to share bulk-translate 0.25.0.
bulk-translate is a tiny, no-strings Python framework for translating a series of texts with pre-annotated fixed spans that stay invariant for the translator.
It provides a Python API for quick data translation with (optionally) annotated objects in texts (see figure below). I have made it as accessible as possible for RAG and/or LLM-powered downstream apps: https://github.com/nicolay-r/bulk-translate/wiki
All you have to do is provide an iterator of texts, where each text is either:
1. A string object
2. A list of strings and nested lists that represent spans (value + any ID data).
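The two input shapes described above might look like the following. This sketch does not call bulk-translate itself: the exact nested-list layout, the ID strings, and the helpers `iter_parts`, `plain_text`, and `spans` are all assumptions made for illustration, so check the wiki for the real format.

```python
# Hypothetical inputs: one plain string, one text with annotated spans.
texts = [
    "Plain sentence with no annotations.",
    ["Meeting with ", ["Anna", "PERSON-1"], " in ", ["Paris", "LOC-1"], "."],
]


def iter_parts(text):
    """Yield (value, meta) pairs; meta is None for unannotated text."""
    if isinstance(text, str):
        yield text, None
        return
    for part in text:
        if isinstance(part, str):
            yield part, None
        else:
            # A nested list is a span: its value followed by any ID data.
            value, *meta = part
            yield value, tuple(meta)


def plain_text(text):
    return "".join(value for value, _ in iter_parts(text))


def spans(text):
    return [(value, meta) for value, meta in iter_parts(text) if meta is not None]


for text in texts:
    print(plain_text(text), spans(text))
```

Keeping span values separate from the free text is what lets a translator leave them untouched while translating everything around them.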
* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release!
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
GiniGen Canvas-o3: Intelligent AI-Powered Image Editing Platform
Transform your images with precision using our next-generation tool that lets you extract anything from text to objects with simple natural language commands!

Key Differentiators:

Intelligent Object Recognition & Extraction
• Freedom to select any target (text, logos, objects)
• Simple extraction via natural language commands ("dog", "signboard", "text")
• Ultra-precise segmentation powered by GroundingDINO + SAM

Advanced Background Processing
• AI-generated custom backgrounds for extracted objects
• Intuitive object size/position adjustment
• Multiple aspect ratio support (1:1, 16:9, 9:16, 4:3)

Progressive Text Integration
• Dual text placement: over or behind images
• Multi-language font support
• Real-time font style/size/color/opacity adjustment
Use Cases:
• Extract logos from product images
• Isolate text from signboards
• Select specific objects from scenes
• Combine extracted objects with new backgrounds
• Layer text in front of or behind images
• User Simplicity: Natural language commands for object extraction
• High Precision: AI-powered accurate object recognition
• Versatility: From basic editing to advanced content creation
• Real-Time Processing: Instant result visualization
Experience the new paradigm of image editing with GiniGen Canvas-o3:
• Seamless integration of multiple editing functions
• Professional-grade results with consumer-grade ease
• Perfect for social media, e-commerce, and design professionals
Whether you're extracting text from complex backgrounds or creating sophisticated visual content, GiniGen Canvas-o3 provides the precision and flexibility you need for modern image editing!
Fascinating insights from @Pinterest's latest research on improving feature interactions in recommendation systems!
Pinterest's engineering team has tackled a critical challenge in their Homefeed ranking system that serves 500M+ monthly active users. Here's what makes their approach remarkable:
>> Technical Deep Dive
Architecture Overview
• The ranking model combines dense features, sparse features, and embedding features to represent users, Pins, and context
• Sparse features are processed using learnable embeddings with size based on feature cardinality
• User sequence embeddings are generated using a transformer architecture processing past engagements
Feature Processing Pipeline
• Dense features undergo normalization for numerical stability
• Sparse and embedding features receive L2 normalization
• All features are concatenated into a single feature embedding
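The pipeline above can be sketched in a few lines of plain Python. The post does not say which normalization the dense features receive, so standardization with precomputed mean and standard deviation is an assumption here, as are the function names and the toy values.

```python
import math


def standardize(values, mean, std, eps=1e-6):
    """Assumed dense-feature normalization: (x - mean) / std per feature."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


def l2_normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def build_feature_vector(dense, dense_mean, dense_std, embeddings):
    """Normalize each feature group, then concatenate into one flat vector."""
    out = standardize(dense, dense_mean, dense_std)
    for emb in embeddings:
        out.extend(l2_normalize(emb))
    return out


vec = build_feature_vector(
    dense=[10.0, 0.5],
    dense_mean=[8.0, 0.5],
    dense_std=[2.0, 1.0],
    embeddings=[[3.0, 4.0], [0.0, 2.0]],  # e.g. one sparse and one sequence embedding
)
```

Normalizing each group before concatenation keeps any single feature family from dominating the interaction layers purely because of its scale.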
Key Innovations
• Implemented parallel MaskNet layers with 3 blocks
• Used projection ratio of 2.0 and output dimension of 512
• Stacked 4 DCNv2 layers on top for higher-order interactions
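For readers unfamiliar with DCNv2, each cross layer computes x_{l+1} = x_0 * (W x_l + b) + x_l, where * is element-wise multiplication and x_0 is the layer stack's input. The sketch below stacks 4 such layers in plain Python with toy weights. How the MaskNet output is wired into these layers in Pinterest's model is not described in the post, so this only shows the generic cross-layer computation.

```python
def cross_layer(x0, xl, weight, bias):
    """One DCNv2 cross layer: x0 * (W @ xl + b) + xl, element-wise."""
    out = []
    for i in range(len(x0)):
        wx = sum(weight[i][j] * xl[j] for j in range(len(xl))) + bias[i]
        out.append(x0[i] * wx + xl[i])
    return out


def cross_network(x0, layers):
    """Apply a stack of cross layers; every layer re-uses the original x0."""
    x = x0
    for weight, bias in layers:
        x = cross_layer(x0, x, weight, bias)
    return x


# Toy parameters (identity weights, zero bias) so the result is easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
x0 = [1.0, 2.0]
result = cross_network(x0, [(identity, zero_bias)] * 4)
```

Because every layer multiplies by x_0 again, stacking layers raises the polynomial degree of the feature interactions, which is what "higher-order interactions" refers to.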
Performance Improvements
• Achieved +1.42% increase in Homefeed Save Volume
• Boosted Overall Time Spent by +0.39%
• Limited the memory consumption increase to just 5%
>> Industry Constraints Addressed
Memory Management
• Optimized for 60% GPU memory utilization
• Prevented OOM errors while maintaining batch size efficiency
Latency Optimization
• Removed input-output concatenation before MLP
• Reduced hidden layer sizes in MLP
• Achieved zero latency increase while improving performance
System Stability
• Ensured reproducible results across retraining
• Maintained model stability across different data distributions
• Successfully deployed in production environment
This work brilliantly demonstrates how to balance academic innovations with real-world industrial constraints. Kudos to the Pinterest team!
reacted to Kseniase's post 8 days ago
This year, we started our "AI Agents and Agentic Workflows" series (https://www.turingpost.com/t/AI-Agents) to explore everything about AI agents step by step: all the vocabulary, how they work, and how to build them. The huge interest in this series and the large number of studies conducted on agents showed that it was one of the most popular and important themes of the year. In 2025, agents will most likely reach new highs, and we will be covering that for you. Now, let's review the agentic systems that have emerged this year.
Here is a list of 15 agentic systems and frameworks of 2024: