BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages Paper • 2406.09948 • Published Jun 14, 2024
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Paper • 2412.10424 • Published 27 days ago • 2
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 119
Survey of Cultural Awareness in Language Models: Text and Beyond Paper • 2411.00860 • Published Oct 30, 2024 • 23