Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published Jun 30, 2024 • 7 • 1