-
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper • 2310.03714 • Published • 33 -
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Paper • 2312.10003 • Published • 37 -
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 5 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 187
Collections
Discover the best community collections!
Collections including paper arxiv:2311.12983
-
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published -
Mapping Natural Language Commands to Web Elements
Paper • 1808.09132 • Published
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 232 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though
Paper • 2501.04682 • Published • 83 -
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 53 -
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
Paper • 2311.07445 • Published
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 607 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 255 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 243 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 257
-
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Paper • 2406.08587 • Published • 15 -
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Paper • 2406.09170 • Published • 26 -
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 33 -
Benchmarking Agentic Workflow Generation
Paper • 2410.07869 • Published • 25