KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023
Training Diffusion Models with Reinforcement Learning Paper • 2305.13301 • Published May 22, 2023