Bhimraj Yadav PRO

bhimrazy

https://bhimraj.com.np

AI & ML interests

Computer Vision, Healthcare, Generative AI and NLP

bhimrazy's activity

upvoted 2 papers about 12 hours ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published 4 days ago • 14

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 2 days ago • 47

upvoted 2 papers 1 day ago

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published 11 days ago • 59

MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes

Paper • 2412.19260 • Published 9 days ago • 1

upvoted a paper 2 days ago

1.58-bit FLUX

Paper • 2412.18653 • Published 10 days ago • 63

upvoted a paper 5 days ago

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published 10 days ago • 82

upvoted 2 papers 6 days ago

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Paper • 2412.18609 • Published 10 days ago • 13

Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation

Paper • 2412.18176 • Published 11 days ago • 15

upvoted 3 papers 10 days ago

upvoted 3 papers 15 days ago

No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published 19 days ago • 41

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 17 days ago • 116

Alignment faking in large language models

Paper • 2412.14093 • Published 17 days ago • 7

upvoted a paper 17 days ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 22 days ago • 80

upvoted 5 papers 18 days ago

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published 23 days ago • 43

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

Paper • 2412.08737 • Published 23 days ago • 52

Phi-4 Technical Report

Paper • 2412.08905 • Published 23 days ago • 95

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published 22 days ago • 92

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Paper • 2412.07769 • Published 24 days ago • 26