Collections
Collections including paper arxiv:2402.01306
- KTO: Model Alignment as Prospect Theoretic Optimization
  Paper • 2402.01306 • Published • 16
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 50
- SimPO: Simple Preference Optimization with a Reference-Free Reward
  Paper • 2405.14734 • Published • 11
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
  Paper • 2408.06266 • Published • 10