UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity • Paper • 2409.04081 • Published Sep 6, 2024
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale • Paper • 2409.08264 • Published Sep 12, 2024
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model • Paper • 2409.01704 • Published Sep 3, 2024
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations • Paper • 2408.08459 • Published Aug 15, 2024
XGen-MM-1 models and datasets • Collection • A collection of all XGen-MM (Foundation LMM) models • 18 items