FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
Paper | GitHub | X | Discussion | Version: V1 | # Models: {model_num} | Updated: 10/26/2024