Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models • Paper • 2501.09686 • Published 6 days ago • 35
Taming Overconfidence in LLMs: Reward Calibration in RLHF • Paper • 2410.09724 • Published Oct 13, 2024 • 2