KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023
Training Diffusion Models with Reinforcement Learning Paper • 2305.13301 • Published May 22, 2023
DeepNLP/Human-Preferences-Alignment-KTO-Dataset-AI-Services-Genuine-User-Reviews Dataset • Updated Oct 29, 2024