Villekom's Collections · rlhf/finetune
A Critical Evaluation of AI Feedback for Aligning Large Language Models
Paper • 2402.12366 • Published • 3
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 58
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 184
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Paper • 2401.08417 • Published • 34
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Paper • 2404.14723 • Published • 10
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 27
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper • 2406.00888 • Published • 31
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Paper • 2406.11817 • Published • 13
Following Length Constraints in Instructions
Paper • 2406.17744 • Published • 1
Understanding the performance gap between online and offline alignment algorithms
Paper • 2405.08448 • Published • 17
Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 30
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 24
Paper • 2408.02666 • Published • 28
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 136