IEITYuan/Yuan2-M32-hf

IEIT-Yuan committed on May 28, 2024

Commit 0df2ff5 · verified · 1 Parent(s): 8a6486a

Update README.md

README.md +102 -57

@@ -3,95 +3,140 @@ license: other
 license_name: license-yuan
 license_link: https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan
 ---
 <div align="center">
 <h1>
-  Yuan 2
 </h1>
 </div>
 <div align="center">
-<a href="https://github.com/IEIT-Yuan/Yuan-2.0" target="_blank"> 💻GitHub Repo</a> | <a href="http://arxiv.org/pdf/2311.15786.pdf" target="_blank">📃Yuan2.0-paper</a>
 </div>
-# 目录/Table of Contents
-- [模型介绍/Introduction](#Introduction)
-- [代码调用/Code Usage](#Usage)
-- [Benchmark评估/Benchmark Evaluation](#Benchmark)
-- [声明与协议/Terms and Conditions](#Terms)
-- [引用/Cite](#Cite)
-# <span id="Introduction">模型介绍/Introduction</span>
-源2.0 是浪潮信息发布的新一代基础语言大模型。我们开源了全部的3个模型源2.0-102B，源2.0-51B和源2.0-2B。并且我们提供了预训练，微调，推理服务的相关脚本，以供研发人员做进一步的开发。源2.0是在源1.0的基础上，利用更多样的高质量预训练数据和指令微调数据集，令模型在语义、数学、推理、代码、知识等不同方面具备更强的理解能力。
-Yuan2.0 is a new generation Fundamental Large Language Model developed by IEIT System. We have published all three models, Yuan 2.0-102B, Yuan 2.0-51B, and Yuan 2.0-2B. And we provide relevant scripts for pretraining, fine-tuning, and inference services for other developers. Yuan2.0 is based on Yuan1.0, utilizing a wider range of high-quality pre training data and instruction fine-tuning datasets to enhance the model's understanding of semantics, mathematics, reasoning, code, knowledge, and other aspects.
-# <span id="Usage">代码调用/Code Usage</span>
-可以通过如下代码调用 Yuan2-2B-MoE 模型来生成文本：
-You can generate text by invoking the Yuan2-2B-MoE model with the following code:
-```python
-import torch, transformers
-import sys, os
-sys.path.append(
-    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))
-from transformers import AutoModelForCausalLM,AutoTokenizer,LlamaTokenizer
-print("Creat tokenizer...")
-tokenizer = LlamaTokenizer.from_pretrained('IEITYuan/Yuan2-2B-hf-moe', add_eos_token=False, add_bos_token=False, eos_token='<eod>')
-tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)
-print("Creat model...")
-model = AutoModelForCausalLM.from_pretrained('IEITYuan/Yuan2-2B-hf-moe', device_map='auto', torch_dtype=torch.bfloat16, trust_remote_code=True)
-inputs = tokenizer("请问目前最先进的机器学习算法有哪些？", return_tensors="pt")["input_ids"].to("cuda:0")
-outputs = model.generate(inputs,do_sample=False,max_length=100)
-print(tokenizer.decode(outputs[0]))
-```
-# <span id="Benchmark">Benchmark评估/Benchmark Evaluation</span>
-我们提供了[HumanEval](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_humaneval.md)，[AGIEval-GK-Math](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_agieval_math.md)，[GSM8K](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_gsm8k.md)和[TruthfulQA](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_TruthfulQA.md)的评估脚本。在4个典型任务上，我们用源2.0不同版本模型上进行了性能测试。
-We have provided evaluation scripts for [HumanEval](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_humaneval.md),[AGIEval-GK-Math](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_agieval_math.md),[GSM8K](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_gsm8k.md) and [TruthfulQA](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/eval_TruthfulQA.md). Performance tests were conducted on different versions of the Yuan2.0 model for four typical tasks.
-| Model             | GSM8K   | AGIEval-GK-Math-QA     | AGIEval-GK-Math-Cloze     | HumanEval | TurthfulQA |
-| ----------------- | :----:  | :------------: | :---------------: | :-------: | ---------- |
-|  GPT-4            |  92%    |     47.0%      |       16.1%       |   86.6%   |     59%    |
-|  ChatGPT         | 68.6%\* |     36.5%      |        7.3%       |  66.5%\*  |     34%\*  |
-|  Llama2           | 56.8%   |       -        |         -         |   29.9%   |       -    |
-| 源2.0-102B      | 76.6%   |     38.7%      |       13.5%       |   67.1%   |     58%    |
-| 源2.0-102B-SC   | 86.2%   |     45.5%      |       15.2%       |   77.4%   |       -    |
-\* 使用与源2.0完全相同的输入数据对ChatGPT进行测试，时间2023年11月
-\* Testing ChatGPT using the same input data as Yuan2.0, as of November 2023.
-# <span id="Terms">声明与协议/Terms and Conditions</span>
-对该模型的原代码仓库使用遵循开源许可协议 Apache 2.0。
-源2.0模型支持商用，不需要申请授权，请您了解并遵循[《源2.0模型许可协议》](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan)，勿将开源模型和代码及基于开源项目产生的衍生物用于任何可能给国家和社会带来危害的用途以及用于任何未经过安全评估和备案的服务。
-尽管模型在训练时我们已采取措施尽力确保数据的合规性和准确性，但模型参数量巨大且受概率随机性因素影响，我们无法保证输出内容的准确性，且模型易被输入指令所误导，本项目不承担开源模型和代码导致的数据安全、舆情风险或发生任何模型被误导、滥用、传播、不当利用而产生的风险和责任。**您将对通过使用、复制、分发和修改模型等方式利用该开源项目所产生的风险与后果，独自承担全部责任。**
-The use of the original code repository for this model requires compliance with the open source license agreement Apache 2.0. The Yuan2.0 model supports commercial use and does not require authorization. Please understand and comply with the [《Yuan 2.0 Model License Agreement》](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan). Do not use the open source model and code, as well as derivatives generated from open source projects, for any purposes that may cause harm to the country and society, or for any services that have not undergone security assessment and filing. Although we have taken measures to ensure the compliance and accuracy of the data during training, the model has a huge number of parameters and is affected by probability and randomness factors. We cannot guarantee the accuracy of the output content, and the model is easily misled by input instructions. This project does not assume any data security, public opinion risks, or any model misleading, abusing, spreading caused by open-source models and code Risks and responsibilities arising from improper utilization **You will be solely responsible for the risks and consequences arising from the use, copying, distribution, and modification of the model in this open source project.**
-# <span id="Cite">引用/Cite</span>
-欢迎阅读我们的技术报告 [YUAN 2.0: A Large Language Model with Localized Filtering-based Attention](http://arxiv.org/pdf/2311.15786.pdf)！
-Welcome to read our technical report [YUAN 2.0: A Large Language Model with Localized Filtering-based Attention](http://arxiv.org/pdf/2311.15786.pdf)！
-```latex
-@article{Wu2023,
-title = {{YUAN 2.0: A Large Language Model with Localized Filtering-based Attention}},
-author = {Wu, Shaohua and Zhao, Xudong and Wang, Shenling and Luo, Jiangang and Li, Lingjun and Chen, Xi and Zhao, Bing and Wang, Wei and Yu, Tong and Zhang, Rongguo and Zhang, Jiahua and Wang, Chao},
-url = {http://arxiv.org/abs/2311.15786},
-year = {2023}
-}
 ```

 license_name: license-yuan
 license_link: https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan
 ---
 <div align="center">
 <h1>
+  Yuan2.0-M32 Large Language Model
 </h1>
 </div>
 <div align="center">
+  <a href="code_license">
+    <img alt="Code License" src="https://img.shields.io/badge/Apache%202.0%20-green?style=flat&label=Code%20License&link=https%3A%2F%2Fgithub.com%2FIEIT-Yuan%2FYuan-2.0-MoE%3Ftab%3DApache-2.0-1-ov-file"/>
+  </a>
+  <a href="model_license">
+    <img alt="Model License" src="https://img.shields.io/badge/Yuan2.0%20License-blue?style=flat&logoColor=blue&label=Model%20License&color=blue&link=https%3A%2F%2Fgithub.com%2FIEIT-Yuan%2FYuan-2.0%2Fblob%2Fmain%2FLICENSE-Yuan" />
+  </a>
 </div>
+<p align="center">
+👾 <a href="https://www.modelscope.cn/profile/YuanLLM" target="_blank">ModelScope</a> • 🤗 <a href="https://huggingface.co/IEITYuan" target="_blank">Hugging Face</a> • 💬 <a href="https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/images/%E6%BA%90%E5%85%AC%E4%BC%97%E5%8F%B7%E4%BA%8C%E7%BB%B4%E7%A0%81.png" target="_blank">WeChat</a> • 📎 <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Yuan%202.0-M32-%20Mixture%20of%20Experts%20with%20Attention%20Router.pdf" target="_blank">Yuan2.0-M32 Paper</a>
+</p>
+##  1. Introduction
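+**Yuan2.0-M32** is a sparse Mixture-of-Experts (MoE) large language model from IEIT Systems. It uses Yuan2.0-2B as its base model and a new gating network, the Attention Router, to coordinate and schedule work across 32 experts. This significantly reduces the compute required for inference while improving accuracy and inference performance. Yuan2.0-M32 was evaluated on several mainstream industry benchmarks covering code generation, mathematical problem solving, scientific question answering, and general knowledge. The results show relatively advanced capability across multiple tasks, and the model surpasses Llama3-70B on the MATH (mathematical problem solving) and ARC-C (scientific question answering) benchmarks. Basic information about **Yuan2.0-M32**:
++ **Total parameters:** 40B <br>
++ **Experts:** 32 <br>
++ **Active experts:** 2 <br>
++ **Active parameters:** 3.7B <br>
++ **Training data:** 2000B tokens <br>
++ **Sequence length:** 16K <br>
+We have also released the <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Yuan%202.0-M32-%20Mixture%20of%20Experts%20with%20Attention%20Router.pdf" target="_blank">**technical report**</a> for Yuan2.0-M32, which gives more technical detail and evaluation results.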
+##  2. Model Downloads
+**We provide download links for several model formats:**
+|    Model     | Sequence Length  |   Format   |         Download Link         |
+| :----------: | :------: | :-------: |:---------------------------: |
+| Yuan2.0-M32           |    16K    |    Megatron    | [HuggingFace](https://huggingface.co/IEIT-Yuan/Yuan2-M32) |
+| Yuan2.0-M32-HF        |    16K    |    HuggingFace | [HuggingFace](https://huggingface.co/IEIT-Yuan/Yuan2-M32-hf) |
+| Yuan2.0-M32-GGUF      |    16K    |    GGUF        | [HuggingFace](https://huggingface.co/IEIT-Yuan/Yuan2-M32-gguf) |
+| Yuan2.0-M32-GGUF-INT4 |    16K    |    GGUF        | [HuggingFace](https://huggingface.co/IEIT-Yuan/Yuan2-M32-gguf-int4/) |
+##  3. Evaluation Results
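+**3.1 Benchmarks** 🏆
+Yuan2.0-M32 shows good accuracy compared with several closed-source and open-source models. The evaluation datasets are HumanEval, GSM8K, MMLU, MATH, and ARC-Challenge, which test natural language understanding, knowledge, mathematical computation and reasoning, and code generation. Yuan2.0-M32 outperforms models such as Llama3-8B and Mixtral-8x7B on every evaluated task, and its overall capability is comparable to Llama3-70B.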
+| Model              |      HumanEval     |      GSM8K     |        MMLU       |         Math       |        ARC-C\*    |
+| ------------------ |  :---------------: | :------------: | :---------------: |  :---------------: |  :---------------:|
+| Llama3-70B         |     **81.7%**      |    **93%**     |       **80.3%**   |         50.4%      |         93.3%     |
+| Llama3-8B          |        62.2%       |     79.6%      |       68.4%       |         30%        |         78.6%     |
+| Phi-3-medium       |        62.2%       |     91.0%      |       78.0%       |         -          |         91.6%     |
+| Phi-3-small        |        61%         |     89.6%      |       75.7%       |         -          |         90.7%     |
+| Phi-3-mini         |        58.5%       |     82.5%      |       68.8%       |         -          |         84.9%     |
+| Mixtral-8x22B      |        45.1%       |     78.6%      |       77.8%       |         41.8%      |         91.3%     |
+| Mixtral-8x7B       |        40.2%       |     58.4%      |       70.86%      |         28.4%      |         85.9%     |
+| **Yuan2.0-M32**    |        74.4%       |     92.5%      |       72.2%       |      **55.9%**     |       **95.8%**   |
+\* __*ARC-C*__: ARC-Challenge, the harder question set in the ARC dataset, which requires deeper reasoning and broader background knowledge.
+-----
+**3.2 Compute Efficiency**
+| Model              |      Params (B)    |  Active Params (B) | GFLOPs/token (Inference) | GFLOPs/token (Fine-tune) | Mean Accuracy | Mean Accuracy / GFLOPs per token (Inference) |
+| ------------------ |  :---------------: | :------------: | :---------------: |  :---------------: |  :---------------:|:---------------:|
+| Llama3-70B         |         70         |     70         |       140      |       420      |      79.5       |       0.57     |
+| Llama3-8B          |         8          |     8          |       16       |       48       |      64.15      |       4.00     |
+| Mixtral-8x22B      |         141        |     39         |       78       |       234      |      72.38      |       0.93     |
+| Mixtral-8x7B       |         47         |     12.9       |       25.8     |       77.3     |      60.83      |       2.36     |
+| **Yuan2.0-M32**    |         40         |     3.7        |       7.4      |       22.2     |      79.1       |       10.69    |
+##  4. Quick Start
+**4.1  Environment Setup**
+We recommend using the latest Yuan2.0-M32 Docker [image](https://hub.docker.com/r/yuanmodel/yuan2.0:m32).
+The container can be started with the following commands:
+```bash
+docker pull yuanmodel/yuan2.0:V1-base
+docker run --gpus all --privileged --ulimit stack=68719476736 --shm-size=1000G -itd -v /path/to/yuan_2.0:/workspace/yuan_2.0 -v /path/to/dataset:/workspace/dataset -v /path/to/checkpoints:/workspace/checkpoints --name your_name yuanmodel/yuan2.0:V1-base
+docker exec -it your_name bash
 ```
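+**4.2  Data Preprocessing**
+We provide data preprocessing scripts; see the [data preprocessing documentation](./docs/data_process.md).
+**4.3  Model Pretraining**
+We provide pretraining documentation and [`example`](./examples) scripts; for usage details, see the [pretraining documentation](./docs/pretrain.md).
+**4.4  Inference Service**
+- For a detailed deployment guide, see [vllm](https://github.com/IEIT-Yuan/Yuan2.0-M32/edit/main/vllm/README_Yuan_vllm.md).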
+##  5. Statement of Agreement
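+Use of the Yuan2.0 code and model must comply with the [Apache 2.0](https://github.com/xxxxxxE) open-source license and the [Yuan 2.0 Model License Agreement](./LICENSE-Yuan). The Yuan2.0 model supports commercial use and does not require authorization. Please read and comply with the agreement. Do not use the open-source model and code, or derivatives of this open-source project, for any purpose that may harm the country or society, or for any service that has not undergone security assessment and filing.
+Although we took measures during training to ensure the compliance and accuracy of the data, the model has a very large number of parameters and is affected by probabilistic randomness, so we cannot guarantee the accuracy of its output, and the model is easily misled by input instructions. This project assumes no responsibility for data security or public opinion risks caused by the open-source model and code, or for any risks and liabilities arising from the model being misled, abused, disseminated, or improperly used. **You are solely responsible for the risks and consequences arising from your use, copying, distribution, and modification of the model in this open-source project.**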
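
The compute-efficiency figures in section 3.2 can be checked with a short calculation. The sketch below is not from the README: it assumes that inference costs about 2 FLOPs and fine-tuning about 6 FLOPs per active parameter per token, a rule of thumb inferred from the table's own columns, and that the efficiency column is mean accuracy divided by inference GFLOPs per token. The names `ModelRow`, `ROWS`, and the two constants are hypothetical; the input values are copied from the table.

```python
from dataclasses import dataclass

# Assumption inferred from the table, not stated in the README:
# ~2 FLOPs per active parameter per token for inference, ~6 for fine-tuning.
INFERENCE_FLOPS_PER_PARAM = 2
FINETUNE_FLOPS_PER_PARAM = 6


@dataclass(frozen=True)
class ModelRow:
    name: str
    active_params_b: float  # active parameters, in billions
    mean_accuracy: float    # "Mean Accuracy" column, as reported

    @property
    def inference_gflops(self) -> float:
        # Billions of parameters times FLOPs per parameter gives GFLOPs per token.
        return INFERENCE_FLOPS_PER_PARAM * self.active_params_b

    @property
    def finetune_gflops(self) -> float:
        return FINETUNE_FLOPS_PER_PARAM * self.active_params_b

    @property
    def efficiency(self) -> float:
        # Mean accuracy per GFLOP of inference compute per token.
        return self.mean_accuracy / self.inference_gflops


# Values copied from the section 3.2 table.
ROWS = [
    ModelRow("Llama3-70B", 70, 79.5),
    ModelRow("Llama3-8B", 8, 64.15),
    ModelRow("Yuan2.0-M32", 3.7, 79.1),
]

for row in ROWS:
    print(
        f"{row.name:<12} inference={row.inference_gflops:6.1f} "
        f"fine-tune={row.finetune_gflops:6.1f} efficiency={row.efficiency:6.2f}"
    )
```

Under these assumptions the sketch gives 10.69 for Yuan2.0-M32 and 0.57 for Llama3-70B, matching the table. Llama3-8B comes out as 64.15 / 16 ≈ 4.01, slightly above the reported 4.00.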