hitchhiker3010's Collections
to_read (updated)
- DocGraphLM: Documental Graph Language Model for Information Extraction • arXiv:2401.02823 • 35 upvotes
- Understanding LLMs: A Comprehensive Overview from Training to Inference • arXiv:2401.02038 • 62 upvotes
- DocLLM: A layout-aware generative language model for multimodal document understanding • arXiv:2401.00908 • 181 upvotes
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration • arXiv:2309.01131 • 1 upvote
- LMDX: Language Model-based Document Information Extraction and Localization • arXiv:2309.10952 • 65 upvotes
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training • arXiv:2403.09611 • 126 upvotes
- Improved Baselines with Visual Instruction Tuning • arXiv:2310.03744 • 37 upvotes
- HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models • arXiv:2403.13447 • 18 upvotes
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • arXiv:2405.00732 • 120 upvotes
- Block Transformer: Global-to-Local Language Modeling for Fast Inference • arXiv:2406.02657 • 38 upvotes
- Unifying Vision, Text, and Layout for Universal Document Processing • arXiv:2212.02623 • 10 upvotes
- Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning • arXiv:2406.15334 • 9 upvotes
- Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps • arXiv:2407.07071 • 12 upvotes
- Transformer Layers as Painters • arXiv:2407.09298 • 14 upvotes
- BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation • arXiv:2402.03216 • 5 upvotes
- Visual Text Generation in the Wild • arXiv:2407.14138 • 9 upvotes
- VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding • arXiv:2407.12594 • 19 upvotes
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery • arXiv:2408.06292 • 118 upvotes
- Building and better understanding vision-language models: insights and future directions • arXiv:2408.12637 • 124 upvotes
- Writing in the Margins: Better Inference Pattern for Long Context Retrieval • arXiv:2408.14906 • 139 upvotes
- Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning • arXiv:2307.03692 • 25 upvotes
- Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution • arXiv:2307.06304 • 29 upvotes
- Contrastive Localized Language-Image Pre-Training • arXiv:2410.02746 • 34 upvotes
- Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations • arXiv:2410.02762 • 9 upvotes
- SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration • arXiv:2410.02367 • 47 upvotes
- ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation • arXiv:2410.01731 • 16 upvotes
- Contextual Document Embeddings • arXiv:2410.02525 • 19 upvotes
- Compact Language Models via Pruning and Knowledge Distillation • arXiv:2407.14679 • 39 upvotes
- LLM Pruning and Distillation in Practice: The Minitron Approach • arXiv:2408.11796 • 58 upvotes
- Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering • arXiv:2401.08500 • 5 upvotes
- Automatic Prompt Optimization with "Gradient Descent" and Beam Search • arXiv:2305.03495 • 1 upvote
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent • arXiv:2312.10003 • 37 upvotes
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss • arXiv:2410.17243 • 89 upvotes
- Personalization of Large Language Models: A Survey • arXiv:2411.00027 • 31 upvotes
- Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation • arXiv:2411.00412 • 9 upvotes
- Human-inspired Perspectives: A Survey on AI Long-term Memory • arXiv:2411.00489 • 1 upvote
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models • arXiv:2411.04905 • 113 upvotes
- Training language models to follow instructions with human feedback • arXiv:2203.02155 • 16 upvotes
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • arXiv:2409.17146 • 106 upvotes
- Scaling Synthetic Data Creation with 1,000,000,000 Personas • arXiv:2406.20094 • 98 upvotes
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion • arXiv:2411.04928 • 49 upvotes
- Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models • arXiv:2411.07232 • 63 upvotes
- MagicQuill: An Intelligent Interactive Image Editing System • arXiv:2411.09703 • 64 upvotes
- Distilling System 2 into System 1 • arXiv:2407.06023 • 3 upvotes
- Altogether: Image Captioning via Re-aligning Alt-text • arXiv:2410.17251
- Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents • arXiv:2409.15594
- Multimodal Autoregressive Pre-training of Large Vision Encoders • arXiv:2411.14402 • 43 upvotes
- PaliGemma 2: A Family of Versatile VLMs for Transfer • arXiv:2412.03555 • 124 upvotes
- Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion • arXiv:2412.04424 • 59 upvotes
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent • arXiv:2411.17465 • 78 upvotes
- OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations • arXiv:2412.07626 • 22 upvotes
- arXiv:2412.08905 • 103 upvotes
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs • arXiv:2410.18779 • 1 upvote
- Asynchronous LLM Function Calling • arXiv:2412.07017
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • arXiv:2412.13663 • 124 upvotes
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks • arXiv:2412.14161 • 50 upvotes
- arXiv:2412.13501 • 24 upvotes
- FastVLM: Efficient Vision Encoding for Vision Language Models • arXiv:2412.13303 • 13 upvotes
- Alignment faking in large language models • arXiv:2412.14093 • 7 upvotes
- Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models • arXiv:2402.14207 • 2 upvotes
- Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations • arXiv:2408.15232
- Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty • arXiv:2412.06771
- Apollo: An Exploration of Video Understanding in Large Multimodal Models • arXiv:2412.10360 • 139 upvotes
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking • arXiv:2501.04519 • 232 upvotes
- Agent Laboratory: Using LLM Agents as Research Assistants • arXiv:2501.04227 • 77 upvotes
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought • arXiv:2501.04682 • 83 upvotes
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains • arXiv:2501.05707 • 18 upvotes
- Titans: Learning to Memorize at Test Time • arXiv:2501.00663 • 12 upvotes
- Training Large Language Models to Reason in a Continuous Latent Space • arXiv:2412.06769 • 74 upvotes
- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos • arXiv:2501.04001 • 40 upvotes
- FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors • arXiv:2501.08225 • 17 upvotes
- Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences • arXiv:2404.12272 • 1 upvote
- Do generative video models learn physical principles from watching videos? • arXiv:2501.09038 • 18 upvotes
- AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation • arXiv:2501.09503 • 8 upvotes