REBEL: Reinforcement Learning via Regressing Relative Reward - a Cornell-AGI Collection

Updated Sep 2, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Paper • 2404.16767 • Published Apr 25, 2024 • 2
Cornell-AGI/REBEL-Llama-3-Armo-iter_1

Updated Sep 2, 2024 • 6 • 1
Cornell-AGI/REBEL-Llama-3-Armo-iter_2

Updated Sep 2, 2024 • 8 • 2
Cornell-AGI/REBEL-Llama-3-Armo-iter_3

Updated Sep 2, 2024 • 4 • 2
Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_1

Viewer • Updated Sep 2, 2024 • 56.1k • 41
Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_2

Viewer • Updated Sep 2, 2024 • 55.1k • 29
Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_3

Viewer • Updated Sep 2, 2024 • 44.6k • 29 • 1
Cornell-AGI/REBEL-Llama-3

Text Generation • Updated Sep 1, 2024 • 25 • 1
Cornell-AGI/REBEL-Llama-3-epoch_2

Text Generation • Updated Sep 1, 2024 • 22 • 3
Cornell-AGI/REBEL-OpenChat-3.5

Text Generation • Updated Sep 1, 2024 • 20 • 1