EXAONE-3.5 Collection EXAONE 3.5 language model series, including instruction-tuned models at 2.4B, 7.8B, and 32B parameters. • 10 items • Updated 25 days ago • 84
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published about 1 month ago • 45
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling Paper • 2411.18664 • Published Nov 27, 2024 • 23
Critique-out-Loud Reward Models Collection Paper: https://arxiv.org/abs/2408.11791 | Code: https://github.com/zankner/CLoud • 7 items • Updated Sep 5, 2024 • 3
Tulu V2.5 Suite Collection A suite of models trained using DPO and PPO across a wide variety of preference datasets (up to 14). See https://arxiv.org/abs/2406.09279 for more! • 44 items • Updated Nov 27, 2024 • 15
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9, 2024 • 2
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 67
Prometheus 2 Collection Quantized versions of Prometheus 2 - an alternative to GPT-4 evaluation for fine-grained evaluation of an underlying LLM. • 2 items • Updated May 8, 2024 • 1
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 119
The Feedback Collection Collection Dataset and Model for "Prometheus: Inducing Fine-grained Evaluation Capability in Language Models" • 6 items • Updated Nov 12, 2023 • 3
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean Paper • 2403.06412 • Published Mar 11, 2024 • 3
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets Paper • 2307.10928 • Published Jul 20, 2023 • 12