Committed by yinsong1986
Commit c2d54e2 · 1 parent: ba4d664
Update README.md
README.md
CHANGED
@@ -21,14 +21,14 @@ Then We evaluated `Mistral-7B-Instruct-v0.1` against benchmarks that are specifi
 Although the performance of the models on long context was fairly competitive on long context less than 4096 tokens,
 there were some limitations on its performance on longer context. Motivated by improving its performance on longer context, we finetuned the Mistral 7B model, and produced `Mistrallite`. The model managed to `signifantly boost the performance of long context handling` over Mistral-7B-Instruct-v0.1. The detailed `long context evalutaion results` are as below:

-
+1. [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/)
 |Model Name|Input length| Input length | Input length| Input length| Input length|
 |----------|-------------:|-------------:|------------:|-----------:|-----------:|
 | | 2851| 5568 |8313 | 11044 | 13780
-| Mistral-7B-Instruct-v0.1 |
+| Mistral-7B-Instruct-v0.1 | 100% | 50% | 2% | 0% | 0% |
 | MistralLite | **100%** | **100%** | **100%** | **100%** | **98%** |

-
+2. [Line Retrieval](https://lmsys.org/blog/2023-06-29-longchat/#longeval-results)

 |Model Name|Input length| Input length | Input length| Input length| Input length|Input length|
 |----------|-------------:|-------------:|------------:|-----------:|-----------:|-----------:|
@@ -36,7 +36,7 @@ there were some limitations on its performance on longer context. Motivated by i
 | Mistral-7B-Instruct-v0.1 | **98%** | 62% | 42% | 42% | 32% | 30% |
 | MistralLite | **98%** | **92%** | **88%** | **76%** | **70%** | **60%** |

-
+3. [Pass key Retrieval](https://github.com/epfml/landmark-attention/blob/main/llama/run_test.py#L101)

 |Model Name|Input length| Input length | Input length| Input length|
 |----------|-------------:|-------------:|------------:|-----------:|
@@ -44,7 +44,7 @@ there were some limitations on its performance on longer context. Motivated by i
 | Mistral-7B-Instruct-v0.1 | **100%** | 50% | 20% | 30% |
 | MistralLite | **100%** | **100%** | **100%** | **100%** |

-
+4. [Question Answering with Long Input Texts](https://nyu-mll.github.io/quality/)
 |Model Name| Test set Accuracy | Hard subset Accuracy|
 |----------|-------------:|-------------:|
 | Mistral-7B-Instruct-v0.1 | 44.3% | 39.7% |
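The pass key retrieval benchmark linked in this change hides a number inside long filler text and asks the model to repeat it. The following is a minimal sketch of how such a test and its accuracy figure could be computed. It is loosely modeled on the linked landmark-attention script, but it is not that script.

Assumptions:
- The task wording, filler sentence and scoring rule (first integer in the answer must equal the key) are illustrative choices.
- `make_passkey_prompt`, `passkey_accuracy` and the `generate` callable are hypothetical names.
- Filler is sized in characters, which is not necessarily how the tables' input lengths were measured.

```python
import random
import re
from typing import Callable, Tuple

# Illustrative filler and task text (assumptions modeled on the linked script).
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again."
TASK = ("There is an important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it. I will quiz you about the important information there.")
QUESTION = "What is the pass key? The pass key is"


def make_passkey_prompt(n_filler_chars: int, rng: random.Random) -> Tuple[str, int]:
    """Hide a random pass key at a random offset inside n_filler_chars of filler text."""
    pass_key = rng.randint(1, 50000)
    prefix_len = rng.randint(0, n_filler_chars)
    # Repeat the filler until it is at least n_filler_chars long, then slice it.
    filler = " ".join([FILLER] * (n_filler_chars // len(FILLER) + 1))
    prefix = filler[:prefix_len]
    suffix = filler[: n_filler_chars - prefix_len]
    info = f"The pass key is {pass_key}. Remember it. {pass_key} is the pass key."
    return "\n".join([TASK, prefix, info, suffix, QUESTION]), pass_key


def passkey_accuracy(generate: Callable[[str], str], n_filler_chars: int,
                     n_trials: int, seed: int = 0) -> float:
    """Fraction of trials where the first integer in the model's answer is the key.

    `generate` is a hypothetical callable mapping a prompt to generated text.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        prompt, key = make_passkey_prompt(n_filler_chars, rng)
        match = re.search(r"\d+", generate(prompt))
        if match and int(match.group()) == key:
            hits += 1
    return hits / n_trials
```

With a real model, `generate` would wrap tokenization, generation and decoding. Sweeping `n_filler_chars` then gives one accuracy value per input length, analogous to a row in the tables above. The first-integer check avoids counting a key such as `123` as correct when the answer is `1234`.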