xxyyy123's Collections
Align
Internal Consistency and Self-Feedback in Large Language Models: A Survey (arXiv:2407.14507) • 46 upvotes
New Desiderata for Direct Preference Optimization (arXiv:2407.09072) • 10 upvotes
Self-Recognition in Language Models (arXiv:2407.06946) • 24 upvotes
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? (arXiv:2407.04842) • 53 upvotes
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning (arXiv:2407.00782) • 23 upvotes
Direct Preference Knowledge Distillation for Large Language Models (arXiv:2406.19774) • 22 upvotes
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning (arXiv:2407.00617) • 7 upvotes
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs (arXiv:2406.18629) • 41 upvotes
Aligning Teacher with Student Preferences for Tailored Training Data Generation (arXiv:2406.19227) • 24 upvotes
Can LLMs Learn by Teaching? A Preliminary Study (arXiv:2406.14629) • 19 upvotes
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data (arXiv:2406.18790) • 33 upvotes
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt (arXiv:2406.16377) • 11 upvotes
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation (arXiv:2406.16855) • 54 upvotes
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? (arXiv:2406.16772) • 2 upvotes
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges (arXiv:2406.12624) • 36 upvotes
Bootstrapping Language Models with DPO Implicit Rewards (arXiv:2406.09760) • 38 upvotes
Understanding Alignment in Multimodal LLMs: A Comprehensive Study (arXiv:2407.02477) • 21 upvotes
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning (arXiv:2407.18248) • 32 upvotes