Collections including paper arxiv:2410.05258

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
  Paper • 2403.19655 • Published • 19
- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 11
- Enabling Memory Safety of C Programs using LLMs
  Paper • 2404.01096 • Published • 1

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 126
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 51
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 14
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 184
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 69
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 63
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 30

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62

- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 80
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5

- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 36
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 12
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 16
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 27

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 49
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 53
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 23
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 24

- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
  Paper • 2312.05605 • Published • 2
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 38
- Rethinking Patch Dependence for Masked Autoencoders
  Paper • 2401.14391 • Published • 23
- Deconstructing Denoising Diffusion Models for Self-Supervised Learning
  Paper • 2401.14404 • Published • 17

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 84
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83