Collections including paper arxiv:2410.05258 (Differential Transformer)

- THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
  Paper • 2406.10996 • Published • 33
- Simulating Classroom Education with LLM-Empowered Agents
  Paper • 2406.19226 • Published • 31
- OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
  Paper • 2406.19389 • Published • 53
- LAMBDA: A Large Model Based Data Agent
  Paper • 2407.17535 • Published • 35

- LLoCO: Learning Long Contexts Offline
  Paper • 2404.07979 • Published • 21
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 115
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 17
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 20

- Depth Anything V2
  Paper • 2406.09414 • Published • 96
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 51
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
  Paper • 2406.04338 • Published • 35
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 112

- MotionLLM: Understanding Human Behaviors from Human Motions and Videos
  Paper • 2405.20340 • Published • 20
- Spectrally Pruned Gaussian Fields with Neural Compensation
  Paper • 2405.00676 • Published • 10
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First
  Paper • 2404.18212 • Published • 29
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
  Paper • 2405.00732 • Published • 120

- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 145
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 607
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
  Paper • 2404.16710 • Published • 77
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
  Paper • 2405.08707 • Published • 30

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 27
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 13
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 47
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 29

- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 65
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 44
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 33
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 139

- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 89
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
  Paper • 2404.05961 • Published • 65
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 60

- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
  Paper • 2401.18059 • Published • 36
- Personalized Visual Instruction Tuning
  Paper • 2410.07113 • Published • 70
- Differential Transformer
  Paper • 2410.05258 • Published • 169
- What Matters in Transformers? Not All Attention is Needed
  Paper • 2406.15786 • Published • 30

- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 79
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 92
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 61