Collections
Collections including paper arxiv:2307.04964 (Secrets of RLHF in Large Language Models Part I: PPO). Each group below is a separate community collection; entries show the paper title, its arXiv ID, and its upvote count where available.

- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity (Paper • 2401.01967 • Published)
- Secrets of RLHF in Large Language Models Part I: PPO (Paper • 2307.04964 • Published • 28)
- Zephyr: Direct Distillation of LM Alignment (Paper • 2310.16944 • Published • 123)
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (Paper • 2404.05961 • Published • 65)

- Trusted Source Alignment in Large Language Models (Paper • 2311.06697 • Published • 11)
- Diffusion Model Alignment Using Direct Preference Optimization (Paper • 2311.12908 • Published • 48)
- SuperHF: Supervised Iterative Learning from Human Feedback (Paper • 2310.16763 • Published • 1)
- Enhancing Diffusion Models with Text-Encoder Reinforcement Learning (Paper • 2311.15657 • Published • 2)

- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (Paper • 2311.03285 • Published • 30)
- Tailoring Self-Rationalizers with Multi-Reward Distillation (Paper • 2311.02805 • Published • 4)
- Ultra-Long Sequence Distributed Transformer (Paper • 2311.02382 • Published • 3)
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data (Paper • 2309.11235 • Published • 16)

- Deep reinforcement learning from human preferences (Paper • 1706.03741 • Published • 3)
- Training language models to follow instructions with human feedback (Paper • 2203.02155 • Published • 16)
- Direct Preference-based Policy Optimization without Reward Modeling (Paper • 2301.12842 • Published)
- Woodpecker: Hallucination Correction for Multimodal Large Language Models (Paper • 2310.16045 • Published • 16)

- Moral Foundations of Large Language Models (Paper • 2310.15337 • Published • 1)
- Specific versus General Principles for Constitutional AI (Paper • 2310.13798 • Published • 3)
- Contrastive Prefence Learning: Learning from Human Feedback without RL (Paper • 2310.13639 • Published • 25)
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Paper • 2309.00267 • Published • 48)

- Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment (Paper • 2310.00212 • Published • 2)
- Stabilizing RLHF through Advantage Model and Selective Rehearsal (Paper • 2309.10202 • Published • 9)
- Aligning Language Models with Offline Reinforcement Learning from Human Feedback (Paper • 2308.12050 • Published • 1)
- Secrets of RLHF in Large Language Models Part I: PPO (Paper • 2307.04964 • Published • 28)