xxyyy123's Collections
Align
Internal Consistency and Self-Feedback in Large Language Models: A Survey (arXiv:2407.14507) • 46 upvotes
New Desiderata for Direct Preference Optimization (arXiv:2407.09072) • 10 upvotes
Self-Recognition in Language Models (arXiv:2407.06946) • 24 upvotes
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? (arXiv:2407.04842) • 53 upvotes
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning (arXiv:2407.00782) • 23 upvotes
Direct Preference Knowledge Distillation for Large Language Models (arXiv:2406.19774) • 22 upvotes
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning (arXiv:2407.00617) • 7 upvotes
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs (arXiv:2406.18629) • 41 upvotes
Aligning Teacher with Student Preferences for Tailored Training Data Generation (arXiv:2406.19227) • 24 upvotes
Can LLMs Learn by Teaching? A Preliminary Study (arXiv:2406.14629) • 19 upvotes
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data (arXiv:2406.18790) • 33 upvotes
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt (arXiv:2406.16377) • 11 upvotes
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation (arXiv:2406.16855) • 54 upvotes
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? (arXiv:2406.16772) • 2 upvotes
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges (arXiv:2406.12624) • 36 upvotes
Bootstrapping Language Models with DPO Implicit Rewards (arXiv:2406.09760) • 38 upvotes
Understanding Alignment in Multimodal LLMs: A Comprehensive Study (arXiv:2407.02477) • 21 upvotes
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning (arXiv:2407.18248) • 32 upvotes