k-mktr committed
Commit 5282662 · verified
1 Parent(s): 11c59c2

Update README.md

Files changed (1)
  1. README.md +61 -23
README.md CHANGED
@@ -4,7 +4,7 @@ emoji: πŸ†
4
  colorFrom: blue
5
  colorTo: purple
6
  sdk: gradio
7
- sdk_version: 5.1.0
8
  app_file: app.py
9
  pinned: false
10
  license: mit
@@ -87,36 +87,74 @@ You can customize the arena by modifying the `arena_config.py` file:

  The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.

- ### Scoring System

- We use a sophisticated scoring system to rank the models fairly:

- 1. We calculate a score for each model using the formula:
- ```
- score = win_rate * (1 - 1 / (total_battles + 1))
- ```
- This formula balances win rate with the number of battles, giving more weight to models that have participated in more battles.

- 2. We sort the results primarily by this new score, and secondarily by the total number of battles. This ensures that models with similar scores are ranked by their experience (number of battles).

- 3. The leaderboard displays this calculated score alongside wins, losses, and other statistics.

- 4. The ranking is based on this sophisticated score instead of just the number of wins.

- This approach provides a fairer ranking system that considers both performance (win rate) and experience (total battles). Models that maintain a high win rate over many battles will be ranked higher than those with fewer battles or lower win rates.

- ## 🤖 Models

- The arena currently supports various compact models, including:

- - LLaMA 3.2 (1B and 3B versions)
- - LLaMA 3.1 (8B version)
- - Gemma 2 (2B and 9B versions)
- - Qwen 2.5 (0.5B, 1.5B, 3B, and 7B versions)
- - Mistral 0.3 (7B version)
- - Phi 3.5 (3.8B version)
- - Hermes 3 (8B version)
- - Aya 23 (8B version)

  ## 🤝 Contributing

@@ -131,4 +169,4 @@ This project is open-source and available under the MIT License
  - Thanks to the Ollama team for providing that amazing tool.
  - Shoutout to all the AI researchers and compact language model teams for making this frugal AI arena possible!

- Enjoy the battles in the GPU-Poor LLM Gladiator Arena! May the best compact model win! 🏆

  colorFrom: blue
  colorTo: purple
  sdk: gradio
+ sdk_version: 5.3.0
  app_file: app.py
  pinned: false
  license: mit

  The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.
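The README does not document the layout of `leaderboard.json`, so the following is only a minimal Python sketch of a per-battle update, assuming a hypothetical schema that maps each model name to its win and loss counts. `record_battle` is an illustrative name, not a function from the arena's code.

```python
import json
from pathlib import Path


def record_battle(path: Path, winner: str, loser: str) -> dict:
    """Record one battle in a JSON leaderboard file and return the updated data.

    Assumes a hypothetical schema: {"model-name": {"wins": int, "losses": int}, ...}.
    The real leaderboard.json layout is not described in the README.
    """
    data = json.loads(path.read_text()) if path.exists() else {}
    for model in (winner, loser):
        data.setdefault(model, {"wins": 0, "losses": 0})
    data[winner]["wins"] += 1
    data[loser]["losses"] += 1
    # Rewrite the whole file; fine for a sketch, not safe under concurrent writers.
    path.write_text(json.dumps(data, indent=2))
    return data
```

Rewriting the whole file after every battle keeps the sketch simple; a deployment handling simultaneous battles would likely need file locking or an atomic write.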

+ ### Main Leaderboard Scoring System

+ We use a scoring system to rank the models fairly. The score for each model is calculated using the following formula:

+ ```
+ Score = Win Rate * (1 - 1 / (Total Battles + 1))
+ ```

+ Let's break down this formula:

+ 1. **Win Rate**: This is the number of wins divided by the total number of battles. It ranges from 0 (no wins) to 1 (all wins).

+ 2. **1 - 1 / (Total Battles + 1)**: This factor adjusts the win rate based on the number of battles:
+ - We add 1 to the total battles to avoid division by zero and so that a single battle isn't discounted to nothing: without the +1, one battle would give a factor of 1 - 1/1 = 0.
+ - As the number of battles increases, this factor approaches 1.
+ - For example:
+ - With 1 battle: 1 - 1/2 = 0.5
+ - With 10 battles: 1 - 1/11 ≈ 0.91
+ - With 100 battles: 1 - 1/101 ≈ 0.99

+ 3. **Purpose of this adjustment**:
+ - It gives more weight to models that have participated in more battles.
+ - A model with a high win rate but few battles will have a lower score than a model with the same win rate but more battles.
+ - As a result, a model's score rises as it sustains its win rate over more battles.

+ 4. **How it works in practice**:
+ - For a new model with just one battle, its score is exactly 50% of its win rate.
+ - As the model participates in more battles, its score approaches its actual win rate.
+ - This prevents models with very few battles from dominating the leaderboard based on lucky wins.
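To make the arithmetic above concrete, here is a minimal Python sketch of the main leaderboard score. `leaderboard_score` is a hypothetical name, and returning 0.0 for a model with no battles is an assumption, since the win rate is undefined there.

```python
def leaderboard_score(wins: int, total_battles: int) -> float:
    """Score = Win Rate * (1 - 1 / (Total Battles + 1)).

    Returning 0.0 for zero battles is an assumption (win rate is undefined there).
    """
    if total_battles == 0:
        return 0.0
    win_rate = wins / total_battles
    # Discount factor that approaches 1 as the number of battles grows.
    experience_factor = 1 - 1 / (total_battles + 1)
    return win_rate * experience_factor


print(leaderboard_score(1, 1))  # 0.5
```

For instance, a model that won its only battle scores 0.5, while one that won 8 of 10 scores 0.8 × 10/11 ≈ 0.73, so the single lucky win ranks lower.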
+
+ In essence, this formula balances two factors:
+ 1. How well a model performs (win rate)
+ 2. How much experience it has (total battles)
+
+ It ensures that the leaderboard favors models that consistently perform well over a larger number of battles, rather than those that might have a high win rate from just a few lucky encounters.
+
+ We sort the results primarily by this calculated score, and secondarily by the total number of battles. This ensures that models with equal scores are ranked by their experience (number of battles).
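The two-key sort can be sketched in Python as follows. The `Entry` record and the `score`/`rank` helpers are hypothetical, not the arena's code. For any model with at least one battle, Win Rate * (1 - 1 / (Total Battles + 1)) simplifies algebraically to Wins / (Total Battles + 1); the sketch uses that form with `fractions.Fraction` so that tied scores compare exactly instead of being split by floating-point rounding.

```python
from fractions import Fraction
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical leaderboard row; the arena's real data layout may differ."""
    model: str
    wins: int
    total_battles: int


def score(entry: Entry) -> Fraction:
    # (W / N) * (N / (N + 1)) == W / (N + 1) for N >= 1; also gives 0 when N == 0.
    return Fraction(entry.wins, entry.total_battles + 1)


def rank(entries: list[Entry]) -> list[Entry]:
    # Primary key: score, highest first; tie-breaker: total battles, most first.
    return sorted(entries, key=lambda e: (score(e), e.total_battles), reverse=True)


ranked = rank([Entry("a", 1, 1), Entry("b", 2, 3), Entry("c", 0, 5)])
print([e.model for e in ranked])  # ['b', 'a', 'c']
```

Here `a` and `b` tie at a score of 1/2, and `b` ranks first because it has fought more battles.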

+ The leaderboard displays this calculated score alongside wins, losses, and other statistics.
+
+ ### Elo Leaderboard
+
+ In addition to the main leaderboard, we also maintain an Elo-based leaderboard:
+
+ - Models start with an initial Elo rating based on their size.
+ - Elo ratings are updated after each battle, with adjustments based on the size difference between the two models.
+ - The Elo leaderboard provides an alternative perspective on model performance, taking into account the relative strengths of opponents.
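The README does not spell out the arena's Elo formula or how model size enters it, so the following is only the textbook Elo update in Python, assuming the conventional 400-point scale and a K-factor of 32. The size-based starting ratings and size-difference adjustments described above are deliberately left out.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (between 0 and 1)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one decisive battle.

    k=32 is a common default, assumed here; the arena's actual K-factor is not documented.
    """
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


print(update_elo(1000, 1000, a_won=True))  # (1016.0, 984.0)
```

An upset win by the lower-rated model moves both ratings further than a win between equals, which is how Elo accounts for the strength of the opponent.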
+
+ ## 🤖 Models

+ The arena currently supports the following compact models:
+
+ - LLaMA 3.2 (1B, 3B, 8-bit)
+ - LLaMA 3.1 (8B, 4-bit)
+ - Gemma 2 (2B, 4-bit; 2B, 8-bit; 9B, 4-bit)
+ - Qwen 2.5 (0.5B, 8-bit; 1.5B, 8-bit; 3B, 4-bit; 7B, 4-bit)
+ - Mistral 0.3 (7B, 4-bit)
+ - Phi 3.5 (3.8B, 4-bit)
+ - Mistral Nemo (12B, 4-bit)
+ - GLM4 (9B, 4-bit)
+ - InternLM2 v2.5 (7B, 4-bit)
+ - Falcon2 (11B, 4-bit)
+ - StableLM2 (1.6B, 8-bit; 12B, 4-bit)
+ - Yi v1.5 (6B, 4-bit; 9B, 4-bit)
+ - Ministral (8B, 4-bit)
+ - Dolphin 2.9.4 (8B, 4-bit)
+ - Granite 3 Dense (2B, 8-bit; 8B, 4-bit)
+ - Granite 3 MoE (1B, 8-bit; 3B, 4-bit)

  ## 🤝 Contributing


  - Thanks to the Ollama team for providing that amazing tool.
  - Shoutout to all the AI researchers and compact language model teams for making this frugal AI arena possible!

+ Enjoy the battles in the GPU-Poor LLM Gladiator Arena! May the best compact model win! 🏆