Files changed (1)
  1. README.md +47 -45

README.md CHANGED
@@ -47,25 +47,26 @@ InternLM3 supports both the deep thinking mode for solving complicated reasoning
 
 We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five capability dimensions: disciplinary, language, knowledge, reasoning, and comprehension competence. Some of the evaluation results are shown below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
 
- | Benchmark | | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(close source) |
+ | | Benchmark | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(closed source) |
 | ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | -------------------------- |
 | General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0 |
 | | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7 |
 | | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1 |
 | Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9 |
 | | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2 |
 | | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5 |
 | | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2 |
 | MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0 |
 | | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3 |
 | Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8 |
 | | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6 |
 | Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7 |
 | Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7 |
 | Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7 |
 | | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3 |
 | | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87 |
 
+ - Values marked in bold indicate the **highest** among the compared open-source models.
 - The evaluation results were obtained from [OpenCompass](https://github.com/internLM/OpenCompass/) (entries marked with * were evaluated in deep-thinking mode); the evaluation configurations can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/).
 - The evaluation data may show numerical differences across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest [OpenCompass](https://github.com/internLM/OpenCompass/) results.
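For readers who want to reproduce a slice of this table: OpenCompass drives evaluations from Python config files. The snippet below is a minimal sketch in OpenCompass's config style; the dataset import is a real path in the repo, but the InternLM3 model-config path is an assumption, so check OpenCompass's `configs/` directory for the maintained equivalents.

```python
# Minimal OpenCompass-style eval config (a sketch; the model-config path below
# is an assumption -- use the real files under OpenCompass's configs/ directory).
from mmengine.config import read_base

with read_base():
    from .datasets.mmlu.mmlu_gen import mmlu_datasets                 # dataset suite
    from .models.hf_internlm.hf_internlm3_8b_instruct import models   # assumed path

datasets = [*mmlu_datasets]
# Launch with: python run.py path/to/this_config.py
```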
 
@@ -212,7 +213,7 @@ git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
- pip install -e . --no-build-isolatio
+ pip install -e . --no-build-isolation
 ```
 
 inference code:
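Once the build finishes, a minimal offline-inference sketch with vLLM's Python API might look like the following; the Hugging Face model id `internlm/internlm3-8b-instruct` is an assumption here, so substitute the id from the model card.

```python
# Minimal offline-inference sketch with vLLM's Python API.
# The model id is an assumption; use the one from the model card.
from vllm import LLM, SamplingParams

llm = LLM(model="internlm/internlm3-8b-instruct", trust_remote_code=True)
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Give me a short introduction to large language models."], params)
print(outputs[0].outputs[0].text)
```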
@@ -409,12 +410,12 @@ for chunk in stream:
 
 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM; in the meantime, please install it manually from the fork and branch below.
 ```bash
- git clone https://github.com/RunningLeon/vllm.git
+ git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 # and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
- pip install -e . --no-build-isolatio
+ pip install -e . --no-build-isolation
 ```
 
 inference code:
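The hunk context above (`for chunk in stream:`) indicates the omitted inference code streams tokens, which matches vLLM's OpenAI-compatible server. A hedged sketch, assuming the server was started with `vllm serve` and the same assumed model id:

```python
# Hedged sketch: stream a chat completion from vLLM's OpenAI-compatible server,
# assumed to be running via e.g. `vllm serve internlm/internlm3-8b-instruct`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="internlm/internlm3-8b-instruct",  # assumed model id
    messages=[{"role": "user", "content": "Please tell me five scenic spots in Shanghai"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```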
@@ -478,25 +479,26 @@ InternLM3 supports a deep-thinking mode for solving complicated reasoning tasks via long chain-of-thought
 
 We conducted a comprehensive evaluation of InternLM with the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) across five capability dimensions: disciplinary, language, knowledge, reasoning, and comprehension competence. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more.
 
- | Benchmark\Model | | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(close source) |
+ | | Benchmark\Model | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini(closed source) |
 | ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | -------------------------- |
 | General | CMMLU(0-shot) | **83.1** | 75.8 | 53.9 | 66.0 |
 | | MMLU(0-shot) | 76.6 | **76.8** | 71.8 | 82.7 |
 | | MMLU-Pro(0-shot) | **57.6** | 56.2 | 48.1 | 64.1 |
 | Reasoning | GPQA-Diamond(0-shot) | **37.4** | 33.3 | 24.2 | 42.9 |
 | | DROP(0-shot) | **83.1** | 80.4 | 81.6 | 85.2 |
 | | HellaSwag(10-shot) | **91.2** | 85.3 | 76.7 | 89.5 |
 | | KOR-Bench(0-shot) | **56.4** | 44.6 | 47.7 | 58.2 |
 | MATH | MATH-500(0-shot) | **83.0*** | 72.4 | 48.4 | 74.0 |
 | | AIME2024(0-shot) | **20.0*** | 16.7 | 6.7 | 13.3 |
 | Coding | LiveCodeBench(2407-2409 Pass@1) | **17.8** | 16.8 | 12.9 | 21.8 |
 | | HumanEval(Pass@1) | 82.3 | **85.4** | 72.0 | 86.6 |
 | Instruction | IFEval(Prompt-Strict) | **79.3** | 71.7 | 75.2 | 79.7 |
 | Long Context | RULER(4-128K Average) | 87.9 | 81.4 | **88.5** | 90.7 |
 | Chat | AlpacaEval 2.0(LC WinRate) | **51.1** | 30.3 | 25.0 | 50.7 |
 | | WildBench(Raw Score) | **33.1** | 23.3 | 1.5 | 40.3 |
 | | MT-Bench-101(Score 1-10) | **8.59** | 8.49 | 8.37 | 8.87 |
 
+ - Values in bold indicate the highest among the compared open-source models.
 - The results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (entries marked with `*` were evaluated in deep-thinking mode); see the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/) for the test details.
 - Numbers may differ across versions of [OpenCompass](https://github.com/internLM/OpenCompass/), so please rely on the latest [OpenCompass](https://github.com/internLM/OpenCompass/) results.
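Since this hunk's context mentions the deep-thinking mode (the `*`-marked scores above), here is a hedged transformers sketch of how such a mode is typically enabled through the system prompt. The prompt text below is a placeholder rather than the official one, and the model id is an assumption; consult the model card for both.

```python
# Hedged sketch: enabling a deep-thinking mode via the system prompt.
# The system prompt is a PLACEHOLDER and the model id is an assumption;
# see the model card for the official values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm3-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

messages = [
    {"role": "system", "content": "<official thinking-mode system prompt goes here>"},
    {"role": "user", "content": "What is the minimum value of f(x) = x^2 - 4x + 7?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=2048)  # long chain-of-thought needs headroom
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```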
 
@@ -645,12 +647,12 @@ for chunk in stream:
 
 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM; in the meantime, please install it manually from the fork and branch below.
 
 ```bash
- git clone https://github.com/RunningLeon/vllm.git
+ git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 # and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
- pip install -e . --no-build-isolatio
+ pip install -e . --no-build-isolation
 ```
 
 inference code:
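As a variant of the offline example shown earlier, recent vLLM releases also provide an offline chat API that applies the model's chat template internally; a sketch under the same assumed model id:

```python
# Sketch of vLLM's offline chat API (available in recent releases);
# the model id is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="internlm/internlm3-8b-instruct", trust_remote_code=True)
messages = [{"role": "user", "content": "Introduce Shanghai in two sentences."}]
outputs = llm.chat(messages, SamplingParams(temperature=0.8, max_tokens=256))
print(outputs[0].outputs[0].text)
```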
@@ -847,12 +849,12 @@ for chunk in stream:
 
 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM; in the meantime, please install it manually from the fork and branch below.
 
 ```bash
- git clone https://github.com/RunningLeon/vllm.git
+ git clone -b support-internlm3 https://github.com/RunningLeon/vllm.git
 # and then follow https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source to install
 cd vllm
 python use_existing_torch.py
 pip install -r requirements-build.txt
- pip install -e . --no-build-isolatio
+ pip install -e . --no-build-isolation
 ```
 
 inference code:
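Because `pip install -e .` replaces any previously installed vLLM wheel with the source checkout, a quick import check verifies that Python is picking up the patched build:

```python
# Confirm the editable (source) build is active: __file__ should point into the
# cloned vllm repository rather than site-packages.
import vllm
print(vllm.__version__, vllm.__file__)
```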
 