Reviewer2: Optimizing Review Generation Through Prompt Generation Paper • 2402.10886 • Published Feb 16, 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards Paper • 2404.16767 • Published Apr 25, 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF Paper • 2410.04612 • Published Oct 6, 2024
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2 • Updated Oct 8, 2024 • 116k