Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators Paper • 2403.16950 • Published Mar 25, 2024 • 4
HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants Paper • 2405.09186 • Published May 15, 2024 • 22
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning Paper • 2406.19741 • Published Jun 28, 2024 • 59