-
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 22 -
Scaling Laws of RoPE-based Extrapolation
Paper • 2310.05209 • Published • 7 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 38 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126
Collections
Discover the best community collections!
Collections including paper arxiv:2404.14619
-
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 26 -
Rethinking Interpretability in the Era of Large Language Models
Paper • 2402.01761 • Published • 22 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 114 -
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper • 2402.07827 • Published • 45
-
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 82 -
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 26 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 65 -
Poro 34B and the Blessing of Multilinguality
Paper • 2404.01856 • Published • 13
-
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 66 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 145 -
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 125 -
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 183
-
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 90 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45 -
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 69 -
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 48
-
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
Paper • 2401.01854 • Published • 10 -
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper • 2401.01055 • Published • 54 -
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 27 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 79
-
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper • 2401.01055 • Published • 54 -
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 64 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181 -
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
Paper • 2401.01854 • Published • 10
-
Attention Is All You Need
Paper • 1706.03762 • Published • 50 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 16 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14
-
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Paper • 2312.13964 • Published • 18 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 257 -
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper • 2312.12491 • Published • 69 -
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper • 2401.02330 • Published • 14
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47