Car9n's Collections
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 66

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper • 2406.06469 • Published • 24

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Paper • 2406.04271 • Published • 29

Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper • 2406.02657 • Published • 37

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Paper • 2406.01014 • Published • 31

PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM
Paper • 2406.02884 • Published • 15

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Paper • 2406.02900 • Published • 11

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Paper • 2406.02523 • Published • 10

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper • 2405.21060 • Published • 64

Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper • 2405.20204 • Published • 35

Xwin-LM: Strong and Scalable Alignment Practice for LLMs
Paper • 2405.20335 • Published • 18

Mixture-of-Agents Enhances Large Language Model Capabilities
Paper • 2406.04692 • Published • 55

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Paper • 2405.19327 • Published • 46

2BP: 2-Stage Backpropagation
Paper • 2405.18047 • Published • 23

Paper • 2405.18407 • Published • 46

Yuan 2.0-M32: Mixture of Experts with Attention Router
Paper • 2405.17976 • Published • 18

An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 87

Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 52

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
Paper • 2405.17405 • Published • 14

Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
Paper • 2405.17258 • Published • 14

LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
Paper • 2405.16287 • Published • 10

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Paper • 2405.15738 • Published • 43

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Paper • 2405.15319 • Published • 25

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
Paper • 2405.14906 • Published • 24

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 13

HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting
Paper • 2405.15125 • Published • 5

Not All Language Model Features Are Linear
Paper • 2405.14860 • Published • 39

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 37

Dense Connector for MLLMs
Paper • 2405.13800 • Published • 22

Your Transformer is Secretly Linear
Paper • 2405.12250 • Published • 149

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 28

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Paper • 2405.12970 • Published • 22

Diffusion for World Modeling: Visual Details Matter in Atari
Paper • 2405.12399 • Published • 28

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 46

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 34

Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper • 2405.12107 • Published • 25

LoRA Learns Less and Forgets Less
Paper • 2405.09673 • Published • 87

Many-Shot In-Context Learning in Multimodal Foundation Models
Paper • 2405.09798 • Published • 26

What matters when building vision-language models?
Paper • 2405.02246 • Published • 101

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Paper • 2408.03615 • Published • 31