Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published 6 days ago • 35
Taming Overconfidence in LLMs: Reward Calibration in RLHF Paper • 2410.09724 • Published Oct 13, 2024 • 2
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning Paper • 2211.03044 • Published Nov 6, 2022 • 1
Optimizing Language Model's Reasoning Abilities with Weak Supervision Paper • 2405.04086 • Published May 7, 2024 • 1
Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning Paper • 2410.10074 • Published Oct 14, 2024