- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 136
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
  Paper • 2409.18943 • Published • 28
- From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
  Paper • 2411.16594 • Published • 37
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
Yuan
MinakamiYuki
AI & ML interests
None yet
Recent Activity
liked a model 1 day ago: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
liked a model 12 days ago: sail/Sailor2-8B-Chat
updated a collection 24 days ago: LLM paper
Organizations
None yet
Collections
1