Inui

Norm

https://normxu.github.io/

AI & ML interests

Video Diffusion; Large Language Model; Object Detection; OCR

Recent Activity

updated a collection 2 days ago

Fundamental Research

Norm's activity

upvoted a paper 2 days ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published 3 days ago • 52

upvoted a paper 3 days ago

RepVideo: Rethinking Cross-Layer Representation for Video Generation

Paper • 2501.08994 • Published 4 days ago • 13

upvoted 2 papers 4 days ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published 6 days ago • 80

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published 5 days ago • 259

upvoted a paper 5 days ago

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 9 days ago • 56

upvoted a paper 11 days ago

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published 12 days ago • 40

upvoted a collection 11 days ago

Cosmos

Collection

The collection of Cosmos models • 31 items • Updated 2 days ago • 235

upvoted a paper 25 days ago

Large Motion Video Autoencoding with Cross-modal Video VAE

Paper • 2412.17805 • Published 27 days ago • 24

upvoted a paper 28 days ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 128

upvoted 3 papers about 1 month ago

upvoted 7 papers about 2 months ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 124

Open-Sora Plan: Open-Source Large Video Generation Model

Paper • 2412.00131 • Published Nov 28, 2024 • 33

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published Oct 14, 2024 • 29

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Paper • 2411.11922 • Published Nov 18, 2024 • 18

OminiControl: Minimal and Universal Control for Diffusion Transformer

Paper • 2411.15098 • Published Nov 22, 2024 • 55

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 58

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43

upvoted a paper 2 months ago

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Paper • 2411.07461 • Published Nov 12, 2024 • 22