Upload README.md
README.md CHANGED

@@ -47,25 +47,26 @@ InternLM3 supports both the deep thinking mode for solving complicated reasoning
 
 We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Some of the results are shown below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more evaluation results.
 
-| Benchmark
-| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- |
-| General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0
-| | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7
-| | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1
-| Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9
-| | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2
-| | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5
-| | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2
-| MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0
-| | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3
-| Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8
-| | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6
-| Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7
-| Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7
-| Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7
-| | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3
-| | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87
-
+|              | Benchmark                       | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini (closed source) |
+| ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | --------------------------- |
+| General      | CMMLU(0-shot)                   | **83.1**              | 75.8                | 53.9                 | 66.0                        |
+|              | MMLU(0-shot)                    | 76.6                  | **76.8**            | 71.8                 | 82.7                        |
+|              | MMLU-Pro(0-shot)                | **57.6**              | 56.2                | 48.1                 | 64.1                        |
+| Reasoning    | GPQA-Diamond(0-shot)            | **37.4**              | 33.3                | 24.2                 | 42.9                        |
+|              | DROP(0-shot)                    | **83.1**              | 80.4                | 81.6                 | 85.2                        |
+|              | HellaSwag(10-shot)              | **91.2**              | 85.3                | 76.7                 | 89.5                        |
+|              | KOR-Bench(0-shot)               | **56.4**              | 44.6                | 47.7                 | 58.2                        |
+| MATH         | MATH-500(0-shot)                | **83.0***             | 72.4                | 48.4                 | 74.0                        |
+|              | AIME2024(0-shot)                | **20.0***             | 16.7                | 6.7                  | 13.3                        |
+| Coding       | LiveCodeBench(2407-2409 Pass@1) | **17.8**              | 16.8                | 12.9                 | 21.8                        |
+|              | HumanEval(Pass@1)               | 82.3                  | **85.4**            | 72.0                 | 86.6                        |
+| Instruction  | IFEval(Prompt-Strict)           | **79.3**              | 71.7                | 75.2                 | 79.7                        |
+| Long Context | RULER(4-128K Average)           | 87.9                  | 81.4                | **88.5**             | 90.7                        |
+| Chat         | AlpacaEval 2.0(LC WinRate)      | **51.1**              | 30.3                | 25.0                 | 50.7                        |
+|              | WildBench(Raw Score)            | **33.1**              | 23.3                | 1.5                  | 40.3                        |
+|              | MT-Bench-101(Score 1-10)        | **8.59**              | 8.49                | 8.37                 | 8.87                        |
+
+- Values in bold indicate the **highest** among the open-source models.
 - The evaluation results were obtained from [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with * were evaluated in deep thinking mode); the evaluation configurations can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
 - The evaluation numbers may differ across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to its latest evaluation results.
 
@@ -212,7 +213,7 @@ git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
-pip install -e . --no-build-
+pip install -e . --no-build-isolation
 ```
 
 inference code:
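
A note on the build steps above: `python use_existing_torch.py` drops the pinned torch requirement, and `--no-build-isolation` lets the editable build compile against the torch already installed in the environment rather than a fresh isolated one. The `inference code:` referenced above sits outside this hunk; what follows is a minimal offline-inference sketch against such a source build, where the model ID `internlm/internlm3-8b-instruct` and the sampling settings are assumptions rather than part of this diff.

```python
from vllm import LLM, SamplingParams

# Assumed model ID; point this at the checkpoint you actually serve.
llm = LLM(model="internlm/internlm3-8b-instruct", trust_remote_code=True)

# Generic sampling settings, not taken from the README.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=512)

outputs = llm.generate(["Give me a short introduction to large language models."], params)
for out in outputs:
    print(out.outputs[0].text)
```
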
@@ -409,12 +410,12 @@ for chunk in stream:
 
 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR branch to install it manually.
 ```bash
-git clone https://github.com/RunningLeon/vllm.git
+git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 # and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
-pip install -e . --no-build-
+pip install -e . --no-build-isolation
 ```
 
 inference code
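
The `inference code` (and the `for chunk in stream:` loop quoted in the hunk context) also lives outside this hunk. Below is a minimal sketch of streaming chat against a vLLM OpenAI-compatible server; the serve command, port, and model ID are assumptions.

```python
from openai import OpenAI

# Assumes the patched vLLM is serving an OpenAI-compatible endpoint, e.g.:
#   vllm serve internlm/internlm3-8b-instruct --trust-remote-code
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",
    messages=[{"role": "user", "content": "Please tell me five famous novels."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the assistant's reply.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
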
@@ -478,7 +479,7 @@ InternLM3 supports a deep thinking mode that solves complex reasoning tasks via long chains of thought
 
 We conducted a comprehensive evaluation of InternLM with the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/), covering five capability dimensions: disciplinary, language, knowledge, reasoning, and comprehension. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
 
-| Benchmark \ Model
+|              | Benchmark \ Model               | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini (closed source) |
 | ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | --------------------------- |
 | General      | CMMLU(0-shot)                   | **83.1**              | 75.8                | 53.9                 | 66.0                        |
 |              | MMLU(0-shot)                    | 76.6                  | **76.8**            | 71.8                 | 82.7                        |
@@ -497,6 +498,7 @@ InternLM3 supports a deep thinking mode that solves complex reasoning tasks via long chains of thought
 |              | WildBench(Raw Score)            | **33.1**              | 23.3                | 1.5                  | 40.3                        |
 |              | MT-Bench-101(Score 1-10)        | **8.59**              | 8.49                | 8.37                 | 8.87                        |
 
+- Values in bold indicate the highest among the compared open-source models.
 - The above results were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (data marked with `*` were evaluated in deep thinking mode); see the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/) for test details.
 - The numbers may differ across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please rely on its latest evaluation results.
 
@@ -645,12 +647,12 @@ for chunk in stream:
 We are still working on getting the PR (https://github.com/vllm-project/vllm/pull/12037) merged into vLLM. For now, please install manually from the following PR branch.
 
 ```bash
-git clone https://github.com/RunningLeon/vllm.git
+git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 # and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
-pip install -e . --no-build-
+pip install -e . --no-build-isolation
 ```
 
 inference code
@@ -847,12 +849,12 @@ for chunk in stream:
 We are still working on getting the PR (https://github.com/vllm-project/vllm/pull/12037) merged into vLLM. For now, please install manually from the following PR branch.
 
 ```bash
-git clone https://github.com/RunningLeon/vllm.git
+git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 # and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
-pip install -e . --no-build-
+pip install -e . --no-build-isolation
 ```
 
 inference code