Commit b5776cb by IEIT-Yuan
2 parents: 05cd563, 34ccdb1

Merge branch 'main' of https://huggingface.co/IEIT-Yuan/Yuan2-M32-gguf into main

Files changed (1)
  1. README.md +54 -48
README.md CHANGED
@@ -1,15 +1,20 @@
1
  ---
2
- license: other
3
- license_name: license-yuan
4
- license_link: https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan
5
  ---
 
6
  <div align="center">
7
  <h1>
8
- Yuan2.0-M32 Large Model
9
  </h1>
10
  </div>
11
 
12
 
 
 
 
 
 
 
13
  <div align="center">
14
 
15
 
@@ -23,50 +28,45 @@ license_link: https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan
23
  </div>
24
 
25
 
26
-
27
-
28
- <p align="center">
29
- 👾 <a href="https://www.modelscope.cn/profile/YuanLLM" target="_blank">ModelScope</a> • 🤗 <a href="https://huggingface.co/IEITYuan" target="_blank">Hugging Face</a> • 💬 <a href="https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/images/%E6%BA%90%E5%85%AC%E4%BC%97%E5%8F%B7%E4%BA%8C%E7%BB%B4%E7%A0%81.png" target="_blank">WeChat</a> • 📎 <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Paper.pdf" target="_blank">Yuan2.0-M32 Paper</a>
30
- </p>
31
-
32
-
33
 
34
 
35
  ## 1. Introduction
36
 
37
 
38
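- Inspur Information's Yuan2.0-M32 large model (Yuan2.0-M32 for short) uses a sparse Mixture-of-Experts (MoE) architecture with Yuan2.0-2B as its base model. A new gating network, the Attention Router, coordinates and schedules work across 32 experts (Experts*32), delivering higher accuracy and better inference performance while significantly reducing the compute required for inference. Yuan2.0-M32 was evaluated on several mainstream industry benchmarks covering code generation, mathematical problem solving, scientific question answering, and general knowledge. The results show relatively advanced capabilities across multiple tasks, with MATH (mathematical problem solving) and ARC-C (scientific question answering) scores exceeding those of the 70B-parameter LLaMA3 model. The basic information of the Yuan2.0-M32 model is as follows: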
39
-
40
- + **Total Parameters:** 40B <br>
41
- + **Experts:** 32 <br>
42
- + **Active Experts:** 2 <br>
43
- + **Active Parameters:** 3.7B <br>
44
- + **Training Tokens:** 2000B tokens <br>
45
- + **Sequence Length:** 16K <br>
46
 
 
 
 
 
 
 
47
 
48
- We have also released the <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Paper.pdf" target="_blank">**technical report**</a> for the Yuan2.0-M32 model; see the paper for more technical details and evaluation results.
49
 
50
 
51
 
52
  ## 2. Model Downloads
53
 
54
- **We provide download links for multiple model formats:**
55
 
56
- | Model | Sequence Length | Format | Download |
57
  | :----------: | :------: | :-------: |:---------------------------: |
58
- | Yuan2.0-M32 | 16K | Megatron | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32)
59
- | Yuan2.0-M32-HF | 16K | HuggingFace | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-hf)
60
- | Yuan2.0-M32-GGUF-INT4 | 16K | GGUF | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-gguf-int4/)
 
 
61
 
62
 
63
- ## 3. Evaluation Results
64
 
 
65
 
66
- **3.1 Benchmarks** 🏆
67
 
 
68
 
69
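- Yuan2.0-M32 shows good accuracy compared with a number of closed-source and open-source models. The datasets we evaluated include HumanEval, GSM8K, MMLU, MATH, and ARC-Challenge, which test the model on natural language understanding, knowledge, mathematical computation and reasoning, and code generation. Yuan2.0-M32 outperforms models such as Llama3-8B and Mistral-8*7B on every evaluated task, and its overall capability is comparable to Llama3-70B.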
 
70
 
71
 
72
 
@@ -79,22 +79,23 @@ Yuan2.0-M32 模型与多个闭源、开源模型相比,均呈现出较好的
79
  | Phi-3-mini | 58.5% | 82.5% | 68.8% | - | 84.9% |
80
  | Mistral-8*22B | 45.1% | 78.6% | 77.8% | 41.8% | 91.3% |
81
  | Mistral-8*7B | 40.2% | 58.4% | 70.86% | 28.4% | 85.9% |
82
- | **Yuan2.0-M32** | 74.4% | 92.5% | 72.2% | **55.9%** | **95.8%** |
 
 
 
83
 
84
 
85
- \* __*ARC-C*__: ARC-Challenge, the advanced questions in the ARC dataset, which require deep reasoning and broader background knowledge.
86
 
87
  -----
88
 
89
- **3.2 Model Compute Efficiency**
90
 
91
- | Model | Params (B) | Active Params (B) | GFLOPs/token (Inference) | GFLOPs/token (Fine-tune) | Mean Accuracy | Mean Accuracy GFLOPs per token (Inference) |
92
  | ------------------ | :---------------: | :------------: | :---------------: | :---------------: | :---------------:|:---------------:|
93
- | | Parameters | Active parameters | Compute per token (inference) | Compute per token (fine-tuning) | Mean benchmark score | Model compute efficiency |
94
  | Llama3-70B | 70 | 70 | 140 | 420 | 79.25 | 0.57 |
95
  | Llama3-8B | 8 | 8 | 16 | 48 | 64.15 | 4.00 |
96
  | Mistral-8*22B | 141 | 39 | 78 | 234 | 72.38 | 0.93 |
97
- | Mistral-8*7B | 47 | 12.9 | 25.8 | 77,3 | 60.83 | 2.36 |
98
  | **Yuan2.0-M32** | 40 | 3.7 | 7.4 | 22.2 | 79.15 | 10.69 |
99
 
100
 
@@ -105,36 +106,41 @@ Yuan2.0-M32 模型与多个闭源、开源模型相比,均呈现出较好的
105
  ## 4. Quick Start
106
 
107
 
108
- **4.1 Environment Setup**
109
 
110
- We recommend using the latest Docker [image](https://hub.docker.com/r/yuanmodel/yuan2.0:m32) for Yuan2.0-M32.
111
-
112
- The container can be started with the following commands:
113
 
114
  ```bash
115
- docker pull yuanmodel/yuan2.0:V1-base
116
- docker run --gpus all --privileged --ulimit stack=68719476736 --shm-size=1000G -itd -v /path/to/yuan_2.0:/workspace/yuan_2.0 -v /path/to/dataset:/workspace/dataset -v /path/to/checkpoints:/workspace/checkpoints --name your_name yuanmodel/yuan2.0:V1-base
117
  docker exec -it your_name bash
118
  ```
119
 
120
 
121
- **4.2 Data Preprocessing**
 
 
 
 
 
122
 
123
- We provide a data preprocessing script; see the [data preprocessing documentation](./docs/data_process.md).
124
 
125
- **4.3 Model Pretraining**
126
 
127
- We provide pretraining documentation and scripts in [`example`](./examples); for usage details, see the [pretraining documentation](./docs/pretrain.md).
128
 
129
- **4.4 Inference Service**
130
 
131
- For a detailed deployment plan, see [vllm](https://github.com/IEIT-Yuan/Yuan2.0-M32/edit/main/vllm/README_Yuan_vllm.md)
132
 
133
 
134
  ## 5. Statement of Agreement
135
 
136
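- Use of the Yuan2.0 code and models must comply with the [Apache 2.0](https://github.com/xxxxxxE) open source license and the [Yuan2.0 Model License Agreement](./LICENSE-Yuan). The Yuan2.0 models support commercial use without requiring authorization. Please understand and comply with these terms. Do not use the open source models and code, or derivatives of the open source project, for any purpose that may harm the country or society, or for any service that has not undergone security assessment and filing.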
137
 
138
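- Although we have taken measures during training to ensure the compliance and accuracy of the data as far as possible, the model has a huge number of parameters and is affected by probabilistic randomness. We cannot guarantee the accuracy of its output, and the model is easily misled by input instructions. This project assumes no risk or liability for data security or public opinion issues caused by the open source models and code, or for any risk arising from the model being misled, abused, disseminated, or improperly used. **You are solely responsible for all risks and consequences arising from using, copying, distributing, or modifying the models in this open source project.**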
 
 
 
 
139
 
140
 
 
 
1
  ---
2
+ license: apache-2.0
 
 
3
  ---
4
+
5
  <div align="center">
6
  <h1>
7
+ Yuan2.0-M32: Mixture of Experts with Attention Router
8
  </h1>
9
  </div>
10
 
11
 
12
+ <p align="center">
13
+ 👾 <a href="https://www.modelscope.cn/profile/YuanLLM" target="_blank">ModelScope</a> • 🤗 <a href="https://huggingface.co/IEITYuan" target="_blank">Hugging Face</a> • 💬 <a href="https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/images/%E6%BA%90%E5%85%AC%E4%BC%97%E5%8F%B7%E4%BA%8C%E7%BB%B4%E7%A0%81.png" target="_blank">WeChat</a> • 📎 <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Paper.pdf" target="_blank">Yuan2.0-M32 Paper</a>
14
+ </p>
15
+
16
+
17
+
18
  <div align="center">
19
 
20
 
 
28
  </div>
29
 
30
 
31
+ -----
 
 
 
 
 
 
32
 
33
 
34
  ## 1. Introduction
35
 
36
 
37
+ **Yuan2.0-M32** is a Mixture-of-Experts (MoE) language model with 32 experts, of which 2 are active. It introduces a new router network, the Attention Router, for more efficient expert selection, which improves accuracy by 3.8% over a model using a classical router network. Yuan2.0-M32 is trained from scratch on 2000B tokens, and its training computation is only 9.25% of that required by a dense model of the same parameter scale. It is competitive in coding, math, and various specialized fields while using only 3.7B active parameters out of 40B in total and 7.4 GFLOPs of forward computation per token, about 1/19 of what Llama3-70B requires. Yuan2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.9% and 95.8%, respectively. The basic information of the **Yuan2.0-M32** model is as follows:
 
 
 
 
 
 
 
38
 
39
+ + **Total Parameters:** 40B <br>
40
+ + **Experts:** 32 <br>
41
+ + **Active Experts:** 2 <br>
42
+ + **Active Parameters:** 3.7B <br>
43
+ + **Training Tokens:** 2000B <br>
44
+ + **Sequence Length:** 16K <br>
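
The "32 experts, 2 active" figures above can be illustrated with a minimal top-k routing sketch. This is the classical softmax gate that the Introduction contrasts with the Attention Router, not the Attention Router itself, whose details are in the paper. The function names are hypothetical, the random logits stand in for the output of a learned router, and routing per token is an assumption made for illustration.

```python
import math
import random

NUM_EXPERTS = 32  # from the model card
TOP_K = 2         # active experts, from the model card


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs, best expert first.
    This is a classical top-k gate, used here only as an illustration.
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Random logits stand in for a learned router's scores for one token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(selected)
```

Only the selected experts run for the token, which is why the active parameter count (3.7B) is far below the total (40B).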
45
 
46
+ The technical report for the Yuan2.0-M32 model has been released; see the <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Paper.pdf" target="_blank">**paper**</a> for more detailed technical information and evaluation results.
47
 
48
 
49
 
50
  ## 2. Model Downloads
51
 
 
52
 
53
+ | Model | Sequence Length | Type | Download |
54
  | :----------: | :------: | :-------: |:---------------------------: |
55
+ | Yuan2.0-M32 | 16K | Megatron | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32) |
56
+ | Yuan2.0-M32-HF | 16K | HuggingFace | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-hf) |
57
+ | Yuan2.0-M32-GGUF | 16K | GGUF | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-gguf) |
58
+ | Yuan2.0-M32-GGUF-INT4 | 16K | GGUF | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-gguf-int4) |
59
+
60
 
61
 
 
62
 
63
+ ## 3. Evaluation
64
 
 
65
 
66
+ **3.1 Benchmarks** 🏆
67
 
68
+
69
+ We evaluated Yuan2.0-M32 on a range of benchmarks, including HumanEval, GSM8K, MMLU, MATH, and ARC-Challenge. These benchmarks test the model's proficiency in natural language understanding, knowledge, mathematical computation and reasoning, and code generation. Yuan2.0-M32 outperforms models such as Llama3-8B and Mistral-8×7B on every evaluated task, and its overall performance is on par with the larger Llama3-70B. The detailed results are given in the following table.
70
 
71
 
72
 
 
79
  | Phi-3-mini | 58.5% | 82.5% | 68.8% | - | 84.9% |
80
  | Mistral-8*22B | 45.1% | 78.6% | 77.8% | 41.8% | 91.3% |
81
  | Mistral-8*7B | 40.2% | 58.4% | 70.86% | 28.4% | 85.9% |
82
+ | **Yuan2.0-M32** | 74.4% | 92.7% | 72.2% | **55.9%** | **95.8%** |
83
+
84
+
85
+ \* __*ARC-C*__: ARC-Challenge, the harder questions in the AI2 Reasoning Challenge (ARC) benchmark, which require deeper reasoning and broader background knowledge.
86
 
87
 
 
88
 
89
  -----
90
 
91
+ **3.2 Computational Efficiency**
92
 
93
+ | Model | Params (B) | Active Params (B) | GFLOPs/token (Inference) | GFLOPs/token (Fine-tune) | Mean Accuracy | Mean Accuracy / GFLOPs per token (Inference) |
94
  | ------------------ | :---------------: | :------------: | :---------------: | :---------------: | :---------------:|:---------------:|
 
95
  | Llama3-70B | 70 | 70 | 140 | 420 | 79.25 | 0.57 |
96
  | Llama3-8B | 8 | 8 | 16 | 48 | 64.15 | 4.00 |
97
  | Mistral-8*22B | 141 | 39 | 78 | 234 | 72.38 | 0.93 |
98
+ | Mistral-8*7B | 47 | 12.9 | 25.8 | 77.3 | 60.83 | 2.36 |
99
  | **Yuan2.0-M32** | 40 | 3.7 | 7.4 | 22.2 | 79.15 | 10.69 |
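
The compute columns in this table look consistent with a common rule of thumb: about 2 FLOPs per active parameter per token for inference and about 6 for fine-tuning. That rule is an assumption inferred from the numbers, not something the README states. The sketch below recomputes the derived columns from the parameter counts and mean accuracies copied from the table; the helper names are hypothetical.

```python
# (total params in B, active params in B, mean accuracy), copied from the table.
MODELS = {
    "Llama3-70B": (70, 70, 79.25),
    "Llama3-8B": (8, 8, 64.15),
    "Mistral-8*22B": (141, 39, 72.38),
    "Mistral-8*7B": (47, 12.9, 60.83),
    "Yuan2.0-M32": (40, 3.7, 79.15),
}


def inference_gflops(active_b):
    """Assumed rule of thumb: ~2 FLOPs per active parameter per token."""
    return 2 * active_b


def finetune_gflops(active_b):
    """Assumed rule of thumb: ~6 FLOPs per active parameter per token."""
    return 6 * active_b


rows = {}
for name, (total_b, active_b, accuracy) in MODELS.items():
    inference = inference_gflops(active_b)
    rows[name] = {
        "inference": inference,
        "finetune": finetune_gflops(active_b),
        "acc_per_gflop": accuracy / inference,
        "active_fraction": active_b / total_b,
    }
    print(f"{name:14s} {inference:7.1f} {rows[name]['finetune']:7.1f} "
          f"{rows[name]['acc_per_gflop']:6.2f} {rows[name]['active_fraction']:7.2%}")

# Ratio behind the "about 1/19 of Llama3-70B" figure in the Introduction.
ratio = rows["Llama3-70B"]["inference"] / rows["Yuan2.0-M32"]["inference"]
print(f"Llama3-70B / Yuan2.0-M32 inference compute: {ratio:.1f}x")
```

Under these assumptions the recomputed values agree with the table to within rounding (about 0.1 GFLOPs on the fine-tune column and 0.01 on the efficiency column). The active fraction for Yuan2.0-M32, 3.7 / 40, also comes out at 9.25%, the same figure the Introduction gives for training computation relative to a dense model of the same size.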
100
 
101
 
 
106
  ## 4. Quick Start
107
 
108
 
109
+ **4.1 Environment Config**
110
 
111
+ We strongly recommend using the latest Docker image of Yuan2.0-M32. You can launch a container with the following Docker commands:
 
 
112
 
113
  ```bash
114
+ docker pull yuanmodel/yuan2.0:m32
115
+ docker run --gpus all --privileged --ulimit stack=68719476736 --shm-size=1000G -itd -v /path/to/yuan_2.0:/workspace/yuan_2.0 -v /path/to/dataset:/workspace/dataset -v /path/to/checkpoints:/workspace/checkpoints --name your_name yuanmodel/yuan2.0:m32
116
  docker exec -it your_name bash
117
  ```
118
 
119
 
120
+ **4.2 Data Preprocessing**
121
+
122
+ We provide a data preprocessing script. See the documentation [here](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/data_process.md).
124
+
125
+ **4.3 Model Pretraining**
126
 
127
+ We provide several pretraining scripts in [`example`](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/examples). For details, see the documentation [here](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/pretrain.md).
128
 
129
+ **4.4 Inference Service**
130
 
 
131
 
 
132
 
133
+ For a detailed deployment plan, please refer to [vllm](https://github.com/IEIT-Yuan/Yuan2.0-M32/edit/main/vllm/README_Yuan_vllm.md).
134
 
135
 
136
  ## 5. Statement of Agreement
137
 
 
138
 
139
+ Use of the source code in this repository must comply with the Apache 2.0 open source license. The Yuan2.0 model supports commercial use and does not require authorization. Please understand and comply with the [Yuan2.0 Model License Agreement](./LICENSE-Yuan). Do not use the open source model and code, or derivatives of this open source project, for any purpose that may harm the country or society, or for any service that has not undergone security assessment and filing. Although we have taken measures to ensure the compliance and accuracy of the data during training, the model has a huge number of parameters and is affected by probabilistic randomness. We cannot guarantee the accuracy of its output, and the model is easily misled by input instructions. This project assumes no risk or liability for data security or public opinion issues caused by the open source model and code, or for any risk arising from the model being misled, abused, disseminated, or improperly used. You are solely responsible for the risks and consequences arising from using, copying, distributing, or modifying the model in this open source project.
140
+
141
+
142
+
143
+ ## 6. Contact Us
144
 
145
 
146
+ **If you have any questions, please raise an issue or contact us at** [email protected]