Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7, 2024
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay Paper • 2402.04858 • Published Feb 7, 2024