Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 13