- Let's Verify Step by Step (arXiv:2305.20050)
- LLM Critics Help Catch LLM Bugs (arXiv:2407.00215)
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (arXiv:2407.21787)
- Generative Verifiers: Reward Modeling as Next-Token Prediction (arXiv:2408.15240)
Collections including paper arxiv:2408.15240

- STaR: Bootstrapping Reasoning With Reasoning (arXiv:2203.14465)
- Let's Verify Step by Step (arXiv:2305.20050)
- Training Large Language Models to Reason in a Continuous Latent Space (arXiv:2412.06769)
- Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions (arXiv:2411.14405)

- Proximal Policy Optimization Algorithms (arXiv:1707.06347)
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training (arXiv:2306.01693)
- Generative Verifiers: Reward Modeling as Next-Token Prediction (arXiv:2408.15240)
- Diffusion Policy Policy Optimization (arXiv:2409.00588)

- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing (arXiv:2404.12253)
- FlowMind: Automatic Workflow Generation with LLMs (arXiv:2404.13050)
- How Far Can We Go with Practical Function-Level Program Repair? (arXiv:2404.12833)
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models (arXiv:2404.18796)

- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (arXiv:2401.15391)
- Long-form factuality in large language models (arXiv:2403.18802)
- JudgeLM: Fine-tuned Large Language Models are Scalable Judges (arXiv:2310.17631)
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models (arXiv:2310.08491)

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions (arXiv:2312.08578)
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks (arXiv:2312.08583)
- Vision-Language Models as a Source of Rewards (arXiv:2312.09187)
- StemGen: A music generation model that listens (arXiv:2312.08723)