weiqipedia committed
Commit 19d6eec · verified · 1 Parent(s): 0b7422c

Update README.md

Files changed (1): README.md (+1, -4)

README.md CHANGED
@@ -102,12 +102,10 @@ IFEval evaluates a model's ability to adhere to constraints provided in the prompt.
 | Meta-Llama-3-8B-Instruct | 0.27 | 0.21 | 0.80 |
 | Sailor-7B-Chat | 0.26 | 0.25 | 0.42 |
 
-Note: Scores are the language normalized accuracies ie. models are penalized when they respond in the incorrect language even if they may follow the instructions correctly.
-
 
 **MT-Bench**
 
-MT-Bench evaluates a model's ability to engage in multi-turn (2 turns) conversations and respond in ways that align with human needs. We use `gpt-4-1106-preview` as the judge model and compare against `gpt-3.5-turbo-0125` as the baseline model. The metric used is the win rate against the baseline model. A tie is given a score of 0.5.
+MT-Bench evaluates a model's ability to engage in multi-turn (2 turns) conversations and respond in ways that align with human needs. We use `gpt-4-1106-preview` as the judge model and compare against `gpt-3.5-turbo-0125` as the baseline model. The metric used is the weighted win rate against the baseline model, i.e. the average of the win rates across the categories (Math, Reasoning, STEM, Humanities, Roleplay, Writing, Extraction). A tie is given a score of 0.5.
 
 | **Model** | **Indonesian** | **Vietnamese** | **English** |
 |---------------------------------|:---------------------:|:---------------------:|:----------------------:|
@@ -121,7 +119,6 @@ MT-Bench evaluates a model's ability to engage in multi-turn (2 turns) conversations and respond in ways that align with human needs.
 | Mistral-7B-Instruct-v0.3 | 0.347 | 0.202 | 0.524 |
 | Sailor-7B-Chat | 0.290 | 0.314 | 0.190 |
 
-Note: Scores are the Weighted Win Rate across reasoning, stem, math, humanities, extraction, writing, roleplay.
 
 ### Usage
 SEA-LION can be run using the 🤗 Transformers library
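
The weighted win rate that the updated README describes can be sketched as follows. This is a minimal illustration, not the benchmark's actual scoring code: it assumes per-category judge verdicts in which a win counts 1, a tie counts 0.5 (as stated above), and a loss counts 0, and that "weighted" means each category contributes equally regardless of how many questions it holds.

```python
# Illustrative sketch of the MT-Bench weighted win rate described above.
# Category names come from the README; verdict encoding is an assumption.

CATEGORIES = ["math", "reasoning", "stem", "humanities",
              "roleplay", "writing", "extraction"]

def win_rate(verdicts):
    """verdicts: list of 'win' | 'tie' | 'loss' against the baseline model.
    A win scores 1, a tie scores 0.5, a loss scores 0."""
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[v] for v in verdicts) / len(verdicts)

def weighted_win_rate(per_category):
    """per_category: dict mapping each category to its list of verdicts.
    Averages the per-category win rates so every category has equal
    weight, regardless of how many questions it contains."""
    rates = [win_rate(per_category[c]) for c in CATEGORIES]
    return sum(rates) / len(rates)
```

Averaging per-category rates (rather than pooling all verdicts) keeps a category with many questions from dominating the overall score.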