Scaling Retrieval-Based Language Models with a Trillion-Token Datastore Paper • 2407.12854 • Published Jul 9, 2024 • 29
SmartPlay: A Benchmark for LLMs as Intelligent Agents Paper • 2310.01557 • Published Oct 2, 2023 • 12
Language models scale reliably with over-training and on downstream tasks Paper • 2403.08540 • Published Mar 13, 2024 • 14
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7, 2024 • 38
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning Paper • 2402.11690 • Published Feb 18, 2024 • 8
S-LoRA: Serving Thousands of Concurrent LoRA Adapters Paper • 2311.03285 • Published Nov 6, 2023 • 28
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use Paper • 2308.06595 • Published Aug 12, 2023 • 5
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 32
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents Paper • 2305.02412 • Published May 3, 2023 • 1
SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning Paper • 2305.15486 • Published May 24, 2023 • 1