weiqipedia
committed on
Update README.md
README.md
CHANGED
@@ -102,12 +102,10 @@ IFEval evaluates a model's ability to adhere to constraints provided in the prom
 | Meta-Llama-3-8B-Instruct | 0.27 | 0.21 | 0.80 |
 | Sailor-7B-Chat | 0.26 | 0.25 | 0.42 |
 
-Note: Scores are the language normalized accuracies ie. models are penalized when they respond in the incorrect language even if they may follow the instructions correctly.
-
 
 **MT-Bench**
 
-MT-Bench evaluates a model's ability to engage in multi-turn (2 turns) conversations and respond in ways that align with human needs. We use `gpt-4-1106-preview` as the judge model and compare against `gpt-3.5-turbo-0125` as the baseline model. The metric used is the win rate against the baseline model. A tie is given a score of 0.5.
+MT-Bench evaluates a model's ability to engage in multi-turn (2 turns) conversations and respond in ways that align with human needs. We use `gpt-4-1106-preview` as the judge model and compare against `gpt-3.5-turbo-0125` as the baseline model. The metric used is the weighted win rate against the baseline model (i.e. average win rate across each category (Math, Reasoning, STEM, Humanities, Roleplay, Writing, Extraction)). A tie is given a score of 0.5.
 
 | **Model** | **Indonesian** | **Vietnamese** | **English** |
 |---------------------------------|:---------------------:|:---------------------:|:----------------------:|
@@ -121,7 +119,6 @@ MT-Bench evaluates a model's ability to engage in multi-turn (2 turns) conversat
 | Mistral-7B-Instruct-v0.3 | 0.347 | 0.202 | 0.524 |
 | Sailor-7B-Chat | 0.290 | 0.314 | 0.190 |
 
-Note: Scores are the Weighted Win Rate across reasoning, stem, math, humanities, extraction, writing, roleplay.
 
 ### Usage
 SEA-LION can be run using the 🤗 Transformers library
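
The weighted win rate described in the updated MT-Bench paragraph can be illustrated with a small sketch. It assumes the metric is the plain mean of per-category win rates, with a tie counted as 0.5, as the README text states. The function name `weighted_win_rate`, the `(category, outcome)` input layout, and the example judgments are hypothetical and are not taken from the SEA-LION evaluation code.

```python
from collections import defaultdict

# Score per judged comparison against the baseline; a tie counts as 0.5 (per the README).
OUTCOME_SCORE = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def weighted_win_rate(judgments):
    """Average the per-category win rates (hypothetical helper, not the official harness).

    `judgments` is an iterable of (category, outcome) pairs, where outcome is
    one of "win", "tie", or "loss" from the judge model's verdict.
    """
    scores_by_category = defaultdict(list)
    for category, outcome in judgments:
        scores_by_category[category].append(OUTCOME_SCORE[outcome])
    if not scores_by_category:
        raise ValueError("no judgments supplied")

    # Win rate inside each category first ...
    category_rates = {
        category: sum(scores) / len(scores)
        for category, scores in scores_by_category.items()
    }
    # ... then an unweighted mean over categories, so each category counts
    # equally regardless of how many prompts it contains.
    return sum(category_rates.values()) / len(category_rates)


# Made-up judgments purely for illustration.
example = [
    ("math", "win"), ("math", "loss"),
    ("writing", "tie"),
    ("roleplay", "win"), ("roleplay", "win"), ("roleplay", "tie"), ("roleplay", "loss"),
]
print(round(weighted_win_rate(example), 3))
```

Averaging per category first means a category with many prompts does not dominate the score. Pooling all seven example judgments into a single win rate would give a different number than the per-category mean.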