Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 12
Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models By yuchenlin • Jul 27, 2024 • 27
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published May 23, 2024 • 37
What matters when building vision-language models? Paper • 2405.02246 • Published May 3, 2024 • 101
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 47
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15, 2024 • 170
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 67
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7, 2024 • 27
Aligning to Thousands of Preferences via System Message Generalization Paper • 2405.17977 • Published May 28, 2024 • 7
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models Paper • 2402.01118 • Published Feb 2, 2024 • 29
LangBridge: Multilingual Reasoning Without Multilingual Supervision Paper • 2401.10695 • Published Jan 19, 2024 • 5
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation Paper • 2401.12208 • Published Jan 22, 2024 • 22
Improving fine-grained understanding in image-text pre-training Paper • 2401.09865 • Published Jan 18, 2024 • 16
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation Paper • 2401.06591 • Published Jan 12, 2024 • 3