Molmo • Collection • Artifacts for open multimodal language models • 5 items
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? • Paper • 2407.11963 • Published Jul 16, 2024
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM • Paper • 2401.02994 • Published Jan 4, 2024
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • Paper • 2401.04081 • Published Jan 8, 2024
LLM Augmented LLMs: Expanding Capabilities through Composition • Paper • 2401.02412 • Published Jan 4, 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference • Paper • 2401.02038 • Published Jan 4, 2024
LLaMA Beyond English: An Empirical Study on Language Capability Transfer • Paper • 2401.01055 • Published Jan 2, 2024
DocLLM: A layout-aware generative language model for multimodal document understanding • Paper • 2401.00908 • Published Dec 31, 2023
Unicron: Economizing Self-Healing LLM Training at Scale • Paper • 2401.00134 • Published Dec 30, 2023
GeoGalactica: A Scientific Large Language Model in Geoscience • Paper • 2401.00434 • Published Dec 31, 2023
Boosting Large Language Model for Speech Synthesis: An Empirical Study • Paper • 2401.00246 • Published Dec 30, 2023
Improving Text Embeddings with Large Language Models • Paper • 2401.00368 • Published Dec 31, 2023
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models • Paper • 2401.00788 • Published Jan 1, 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws • Paper • 2401.00448 • Published Dec 31, 2023
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 • Paper • 2312.16171 • Published Dec 26, 2023
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • Paper • 2312.15166 • Published Dec 23, 2023
Time is Encoded in the Weights of Finetuned Language Models • Paper • 2312.13401 • Published Dec 20, 2023