First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. The maps show what the model is focusing on when making predictions, and provide insights into its inner workings! 🤯

Try it out yourself! 👇
webml-community/attention-visualization

Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization
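
For readers who want a feel for what a self-attention map is, here is a minimal sketch in plain Python. It is not the app's code (the app is built with Transformers.js); it assumes a single attention head, one query vector such as a [CLS] token, and one key vector per image patch. The function names and the toy numbers are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(query, keys, grid_size):
    """Scaled dot-product attention of one query over the patch keys,
    reshaped into a grid_size x grid_size map (row-major patch order)."""
    if len(keys) != grid_size * grid_size:
        raise ValueError("expected exactly one key per patch")
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [weights[r * grid_size:(r + 1) * grid_size] for r in range(grid_size)]


# Toy input (hypothetical values): a 2x2 patch grid with 2-dimensional keys.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0], [-2.0, 0.0]]
attn = attention_map(query, keys, grid_size=2)
for row in attn:
    print([round(w, 3) for w in row])
```

The patch whose key is most aligned with the query gets the largest weight, and the weights over all patches sum to 1. A visualizer would typically upscale such a grid and overlay it on the input image.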

liked a Space 7 days ago

Running

🔥

Attention Visualization

Vision Transformer Attention Visualization

reacted to DawnC's post with ❤️ 7 days ago

Post

1400

🌟 PawMatchAI: Making Breed Selection More Intuitive! 🐕
Excited to share the latest update to this AI-powered companion for finding your perfect furry friend! The breed recommendation system just got a visual upgrade to help you make better decisions.

✨ What's New?
Enhanced breed recognition accuracy through strategic model improvements:
- Upgraded to a fine-tuned ConvNeXt architecture for superior feature extraction
- Implemented progressive layer unfreezing during training
- Optimized data augmentation pipeline for better generalization
- Achieved 8% improvement in breed classification accuracy
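
As a rough illustration of what progressive layer unfreezing means, the sketch below computes which layers would be trainable at each epoch. It is an assumption-laden toy, not PawMatchAI's actual training code: the layer names, the output-to-input unfreezing order, and the fixed number of epochs per stage are all hypothetical.

```python
def trainable_layers(layer_names, epoch, epochs_per_stage):
    """Return the layers that are trainable at a 0-indexed epoch.

    layer_names is ordered from input to output. Training starts with only
    the last layer unfrozen and unfreezes one more layer (moving toward the
    input) every epochs_per_stage epochs.
    """
    if epochs_per_stage < 1:
        raise ValueError("epochs_per_stage must be at least 1")
    if not layer_names:
        return []
    stage = epoch // epochs_per_stage
    n_unfrozen = min(len(layer_names), stage + 1)
    return layer_names[-n_unfrozen:]


# Hypothetical layer names for a ConvNeXt-style backbone plus classifier head.
layers = ["stem", "stage1", "stage2", "stage3", "head"]
for epoch in range(0, 10, 2):
    print(epoch, trainable_layers(layers, epoch, epochs_per_stage=2))
```

In a real framework, the returned names would be used to toggle gradient computation for the matching parameters before each stage.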

🎯 Key Features:
- Smart breed recognition powered by AI
- Visual matching scores with intuitive color indicators
- Detailed breed comparisons with interactive tooltips
- Lifestyle-based recommendations tailored to your needs

💭 Project Vision
Combining my passion for AI and pets, this project represents another step toward my goal of creating meaningful AI applications. Each update aims to make the breed selection process more accessible while improving the underlying technology.

👉 Try it now: DawnC/PawMatchAI

Your likes ❤️ on this space fuel this project's growth!

#AI #MachineLearning #DeepLearning #Pytorch #ComputerVision
upvoted a paper 20 days ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 21 days ago • 120

updated a model 23 days ago

joseph-bou/EXAONE-3.5-32B-Instruct-Q6_K-GGUF

Text Generation • Updated 23 days ago • 28

liked a Space 23 days ago

Running on A10G

1.08k

🦙

GGUF My Repo

reacted to rwightman's post with 👍 23 days ago

Post

1323

I'm currently on a push to expand the scope of image based datasets on the Hub. There's certainly a lot already, but for anyone who's looked closely, there's not a whole lot of standardization. I aim to fix that: datasets under the https://huggingface.co/timm and https://huggingface.co/pixparse orgs will serve as canonical examples for various task / modality combinations and be usable without fuss in libraries like timm, OpenCLIP, and hopefully more.

I just uploaded the first multi-label dataset that I'll support with timm scripts soon: timm/plant-pathology-2021

Next up object detection & segmentation! I've got an annotation spec sorted out, a lot of datasets ready to rip, and yeah that means timm support for object detection, eventually segmentation, is finally under development :O

upvoted a paper 24 days ago

EMOv2: Pushing 5M Vision Model Frontier

Paper • 2412.06674 • Published 30 days ago • 13

liked a dataset 28 days ago

shawshankvkt/Walking_Tours

Viewer • Updated Jan 28, 2024 • 3 • 166 • 8

reacted to merve's post with ❤️ about 1 month ago

Post

5583

This week in open-source AI was insane 🤠 A small recap🕺🏻 merve/dec-6-releases-67545caebe9fc4776faac0a3

Multimodal 🖼️
> Google shipped PaliGemma 2, a new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants 👏
> OpenGVLab released InternVL2.5, seven new vision LMs in different sizes, with a sota checkpoint under the MIT license ✨
> The Qwen team at Alibaba released the Qwen2VL base models, with 2B, 7B and 72B ckpts

LLMs 💬
> Meta released a new iteration of Llama 70B: Llama3.3-70B, trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, a multilingual version of MMLU covering 42 languages, under the Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!

Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by the community

upvoted a paper about 1 month ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 121

liked a model about 1 month ago

rwightman/timm-optim-caution

Updated Dec 6, 2024 • 8

reacted to rwightman's post with 🔥🚀 about 1 month ago

Post

1365

There's a new timm release, v1.0.12, with a focus on optimizers. The optimizer factory has been refactored, there's now a timm.optim.list_optimizers() and a new way to register optimizers and their attributes. As always, you can use a timm optimizer like a torch one: just replace torch.optim with timm.optim

New optimizers include:
* AdafactorBigVision - adafactorbv
* ADOPT - adopt / adoptw (decoupled decay)
* MARS - mars
* LaProp - laprop
* Cautious Optimizers - a modification available for all of the above (prefix the name with c), as well as cadamw, cnadamw, csgdw, clamb, crmsproptf
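
To give a sense of what the "cautious" modification does, here is a minimal pure-Python sketch of the masking idea as I understand it from the Cautious Optimizers paper: components of the proposed update whose sign disagrees with the current gradient are zeroed, and the surviving components are rescaled by the fraction kept. This is not timm's implementation; the function names, the momentum-SGD base rule, and the `eps` clamp value are assumptions chosen for illustration.

```python
def cautious_update(update, grad, eps=1e-3):
    """Mask update components that point against the gradient's sign,
    then rescale the rest by 1 / (fraction of components kept)."""
    if not update:
        return []
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Clamp the kept fraction so an all-zero mask does not divide by zero
    # (eps is an assumed value for this sketch).
    kept_fraction = max(sum(mask) / len(mask), eps)
    return [u * m / kept_fraction for u, m in zip(update, mask)]


def momentum_sgd_step(params, grads, momentum, lr=0.1, beta=0.9, cautious=True):
    """One momentum-SGD step on plain lists, optionally with the cautious mask.
    Returns the new parameters and the new momentum buffer."""
    new_momentum = [beta * m + g for m, g in zip(momentum, grads)]
    step = cautious_update(new_momentum, grads) if cautious else new_momentum
    new_params = [p - lr * s for p, s in zip(params, step)]
    return new_params, new_momentum


# Toy values: the second and fourth components disagree in sign with the gradient.
masked = cautious_update([1.0, -1.0, 2.0, 0.5], [1.0, 1.0, 1.0, -1.0])
print(masked)
```

With real timm optimizers none of this needs to be written by hand; per the post, the cautious variants are selected by name.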

I shared some caution comparisons in this model repo: rwightman/timm-optim-caution

For details and references, see the code: https://github.com/huggingface/pytorch-image-models/tree/main/timm/optim
