---
license: other
license_name: skywork
license_link: >-
  https://github.com/SkyworkAI/Skywork/blob/main/Skywork%20Community%20License.pdf
---

<!-- <div align="center">
<h1>
  ✨Skywork
</h1>
</div> -->
<div align="center"><img src="misc/skywork_logo.jpeg" width="550"/></div>

<p align="center">
🤗 <a href="https://huggingface.co/Skywork" target="_blank">Hugging Face</a> • 🤖 <a href="https://modelscope.cn/organization/Skywork" target="_blank">ModelScope</a> • 👾 <a href="https://wisemodel.cn/organization/Skywork" target="_blank">Wisemodel</a> • 💬 <a href="https://github.com/SkyworkAI/Skywork/blob/main/misc/wechat.png?raw=true" target="_blank">WeChat</a> • 📜 <a href="https://arxiv.org/pdf/2406.06563" target="_blank">Tech Report</a>
</p>

<div align="center">

[![GitHub Stars](https://img.shields.io/github/stars/SkyworkAI/Skywork-MoE)](https://github.com/SkyworkAI/Skywork-MoE/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/SkyworkAI/Skywork-MoE)](https://github.com/SkyworkAI/Skywork-MoE/fork)
</div>



# Project Introduction

Skywork-MoE is a high-performance mixture-of-experts (MoE) model with 146 billion parameters, 16 experts, and 22 billion activated parameters. This model is initialized from the pre-existing dense checkpoints of our Skywork-13B model.

We introduce two innovative techniques: Gating Logit Normalization, which enhances expert diversification, and Adaptive Auxiliary Loss Coefficients, which allow for layer-specific adjustment of auxiliary loss coefficients.
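
As a rough illustration of the first technique, the sketch below standardizes a token's gating logits across experts, rescales them by a hyperparameter, and only then applies the softmax. This is a minimal sketch under that assumption, not code from the Skywork-MoE training stack: the function names, the `scale` parameter, and the example logits are all illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_gate(logits, scale=1.0, eps=1e-6):
    """Standardize gating logits across experts, rescale, then softmax.

    `scale` is a hypothetical hyperparameter: larger values sharpen the
    resulting distribution over experts.
    """
    n = len(logits)
    mean = sum(logits) / n
    var = sum((z - mean) ** 2 for z in logits) / n
    std = math.sqrt(var) + eps  # eps guards against identical logits
    return softmax([scale * (z - mean) / std for z in logits])


# Nearly identical logits give an almost uniform gate without normalization.
logits = [0.10, 0.12, 0.08, 0.11]
plain = softmax(logits)
sharp = normalized_gate(logits, scale=2.0)
print("plain gate:     ", [round(p, 3) for p in plain])
print("normalized gate:", [round(p, 3) for p in sharp])
```

With near-uniform logits the plain softmax barely distinguishes the experts, while the normalized gate keeps the same ranking but produces a clearly peaked distribution.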

Skywork-MoE demonstrates comparable or superior performance to models with more total parameters or more activated parameters, such as Grok-1, DBRX, Mixtral 8x22B, and DeepSeek-V2.

# News and Updates
* 2024.6.3: We released the **Skywork-MoE-Base** model.

# Table of contents

- [☁️Download URL](#Download-URL)
- [👨‍💻Benchmark Results](#Benchmark-Results)
- [🏆Demonstration of Hugging Face Model Inference](#Demonstration-of-Hugging-Face-Model-Inference)
- [📕Demonstration of vLLM Model Inference](#Demonstration-of-vLLM-Model-Inference)
- [⚠️Declaration and License Agreement](#Declaration-and-License-Agreement)
- [🤝Contact Us and Citation](#Contact-Us-and-Citation)


# Download URL

|         |                               HuggingFace Model                                |  ModelScope Model   |  Wisemodel Model  |
|:-------:|:------------------------------------------------------------------------------:|:-----------------------------:|:-----------------------------:|
| **Skywork-MoE-Base**     |     🤗 [Skywork-MoE-Base](https://huggingface.co/Skywork/Skywork-MoE-Base)     | 🤖[Skywork-MoE-Base](https://www.modelscope.cn/models/skywork/Skywork-MoE-base) | 👾[Skywork-MoE-Base](https://wisemodel.cn/models/Skywork/Skywork-MoE-base) |
| **Skywork-MoE-Base-FP8**  | 🤗 [Skywork-MoE-Base-FP8](https://huggingface.co/Skywork/Skywork-MoE-Base-FP8) | 🤖[Skywork-MoE-Base-FP8](https://www.modelscope.cn/models/skywork/Skywork-MoE-Base-FP8) | 👾[Skywork-MoE-Base-FP8](https://wisemodel.cn/models/Skywork/Skywork-MoE-Base-FP8) |
| **Skywork-MoE-Chat** |                               😊 [Coming Soon]()                               | 🤖 | 👾 |

# Benchmark Results

We evaluated the Skywork-MoE-Base model on several popular benchmarks, including C-Eval, MMLU, CMMLU, GSM8K, MATH, and HumanEval.
<img src="misc/skywork_moe_base_evaluation.png" alt="Image" width="600" height="280">

# Demonstration of Hugging Face Model Inference

## Base Model Inference

The Skywork-MoE-Base model (16x13B) can be run for inference with Hugging Face Transformers on 8xA100/A800 GPUs or a larger hardware configuration.

```python

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model with its custom modeling code and spread it across the available GPUs.
model = AutoModelForCausalLM.from_pretrained("Skywork/Skywork-MoE-Base", trust_remote_code=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained("Skywork/Skywork-MoE-Base", trust_remote_code=True)

# Prompt (Chinese): "The provincial capital of Shaanxi is Xi'an"
inputs = tokenizer('陕西的省会是西安', return_tensors='pt').to(model.device)
response = model.generate(inputs.input_ids, max_length=128)
print(tokenizer.decode(response.cpu()[0], skip_special_tokens=True))
"""
陕西的省会是西安。
西安,古称长安、镐京,是陕西省会、副省级市、关中平原城市群核心城市、丝绸之路起点城市、“一带一路”核心区、中国西部地区重要的中心城市,国家重要的科研、教育、工业基地。
西安是中国四大古都之一,联合国科教文组织于1981年确定的“世界历史名城”,美媒评选的世界十大古都之一。地处关中平原中部,北濒渭河,南依秦岭,八水润长安。下辖11区2县并代管西
"""

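# Few-shot prompt (Chinese): "The capital of Shaanxi is Xi'an, the capital of Gansu is Lanzhou, the capital of Henan is Zhengzhou"
inputs = tokenizer('陕西的省会是西安,甘肃的省会是兰州,河南的省会是郑州', return_tensors='pt').to(model.device)
response = model.generate(inputs.input_ids, max_length=128)
print(tokenizer.decode(response.cpu()[0], skip_special_tokens=True))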
"""
陕西的省会是西安,甘肃的省会是兰州,河南的省会是郑州,湖北的省会是武汉,湖南的省会是长沙,安徽的省会是合肥,江西的省会是南昌,江苏的省会是南京,浙江的省会是杭州,福建的省会是福州,广东的省会是广州,广西的省会是南宁,四川的省会是成都,贵州的省会是贵阳,云南的省会是昆明,山西的省会是太原,山东的省会是济南,河北的省会是石家庄,辽宁的省会是沈阳,吉林的省会是长春,黑龙江的
"""

```

## Chat Model Inference

Coming soon.


# Demonstration of vLLM Model Inference

## Quickstart with vLLM

We provide a way to quickly deploy the Skywork-MoE-Base model with vLLM.

With FP8 precision, Skywork-MoE-Base can run on just 8x RTX 4090 GPUs.

The source code is available in our [`vllm`](https://github.com/SkyworkAI/vllm) fork.

The FP8 model is available at [`Skywork-MoE-Base-FP8`](https://huggingface.co/Skywork/Skywork-MoE-Base-FP8).
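
A back-of-the-envelope, weights-only estimate suggests why the hardware recommendations differ between precisions. The sketch below assumes 2 bytes per parameter for a 16-bit checkpoint and 1 byte for FP8, the 80 GB variant of the A100/A800, and 24 GB per RTX 4090; it ignores the KV cache, activations, and framework overhead, so real requirements are higher. All names are illustrative.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 146e9  # total parameter count stated in the introduction

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)  # assumed 16-bit weights
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)   # FP8 weights

a100_total_gb = 8 * 80     # assumes the 80 GB A100/A800 variant
rtx4090_total_gb = 8 * 24  # 24 GB per RTX 4090

print(f"16-bit weights: {bf16_gb:.0f} GB of {a100_total_gb} GB on 8x A100/A800")
print(f"FP8 weights:    {fp8_gb:.0f} GB of {rtx4090_total_gb} GB on 8x RTX 4090")
```

Under these assumptions the 16-bit weights exceed the combined memory of eight RTX 4090 cards, while the FP8 weights fit with some room left for the KV cache.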

### Based on local environment

Because PyTorch supports FP8 precision on the RTX 4090 only in its nightly builds, you need to install a nightly build (or a newer release that includes this support):

``` shell
# for cuda12.1
pip3 install --pre torch pytorch-triton --index-url https://download.pytorch.org/whl/nightly/cu121
# for cuda12.4
pip3 install --pre torch pytorch-triton --index-url https://download.pytorch.org/whl/nightly/cu124
```
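
After installing PyTorch, a quick sanity check can confirm that the build exposes FP8 dtypes and that the GPU reports a compute capability of at least 8.9 (the Ada generation, which includes the RTX 4090). The helper name `fp8_ready` is hypothetical, and passing this check does not guarantee that the Skywork vLLM fork will run; it only rules out the most obvious mismatches.

```python
def fp8_ready():
    """Return True if PyTorch exposes FP8 dtypes and GPU 0 is sm_89 or newer."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed in this environment
    if not hasattr(torch, "float8_e4m3fn"):
        return False  # this PyTorch build predates the FP8 dtypes
    if not torch.cuda.is_available():
        return False  # no CUDA device is visible
    # Compare (major, minor) against 8.9, the first FP8-capable architecture.
    return torch.cuda.get_device_capability(0) >= (8, 9)


print("FP8 prerequisites look OK" if fp8_ready() else "FP8 prerequisites not met")
```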

Some other dependencies also need to be installed:

```shell
MAX_JOBS=8 pip3 install git+https://github.com/facebookresearch/xformers.git # builds from source, which can take a long time
pip3 install vllm-flash-attn --no-deps
```

Then clone the [`vllm`](https://github.com/SkyworkAI/vllm) fork provided by Skywork:

``` shell
git clone https://github.com/SkyworkAI/vllm.git
cd vllm
```

Then compile and install vllm:

``` shell
pip3 install -r requirements-build.txt
pip3 install -r requirements-cuda.txt
MAX_JOBS=8 python3 setup.py install
```

### Based on Docker

You can use the Docker image provided by Skywork to run vLLM directly:

```shell
docker pull registry.cn-wulanchabu.aliyuncs.com/triple-mu/skywork-moe-vllm:v1
```

Then start the container, setting the model path (the host directory that holds the model weights) and the working directory:

```shell
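# Placeholder: replace with the absolute host path of the downloaded FP8 weights
# (docker -v needs an absolute path for a bind mount).
model_path="/path/to/Skywork-MoE-Base-FP8"
workspace=${PWD}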

docker run \
    --runtime nvidia \
    --gpus all \
    -it \
    --rm \
    --shm-size=1t \
    --ulimit memlock=-1 \
    --privileged=true \
    --ulimit stack=67108864 \
    --ipc=host \
    -v ${model_path}:/Skywork-MoE-Base-FP8 \
    -v ${workspace}:/workspace \
    registry.cn-wulanchabu.aliyuncs.com/triple-mu/skywork-moe-vllm:v1
```

Now, you can run the Skywork MoE model for fun!

### Text Completion

``` python
from vllm import LLM, SamplingParams

model_path = 'Skywork/Skywork-MoE-Base-FP8'
prompts = [
    "The president of the United States is",
    "The capital of France is",
]

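# Low temperature for mostly deterministic completions, capped at 256 new tokens.
sampling_params = SamplingParams(temperature=0.3, max_tokens=256)

llm = LLM(
    model=model_path,
    kv_cache_dtype='auto',        # keep the KV cache in the model's dtype
    tensor_parallel_size=8,       # shard the model across 8 GPUs
    gpu_memory_utilization=0.95,  # fraction of each GPU's memory vLLM may use
    enforce_eager=True,           # run in eager mode (no CUDA graph capture)
    trust_remote_code=True,       # allow the model's custom code to load
)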

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```


# Declaration and License Agreement


## Declaration

We hereby declare that the Skywork model should not be used for any activities that pose a threat to national or societal security or engage in unlawful actions. Additionally, we request users not to deploy the Skywork model for internet services without appropriate security reviews and records. We hope that all users will adhere to this principle to ensure that technological advancements occur in a regulated and lawful environment.

We have done our utmost to ensure the compliance of the data used during the model's training process. However, despite our extensive efforts, due to the complexity of the model and data, there may still be unpredictable risks and issues. Therefore, if any problems arise as a result of using the Skywork open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.

## License Agreement

Community use of the Skywork model is subject to the [Skywork Community License](https://github.com/SkyworkAI/Skywork-MoE/blob/main/Skywork%20Community%20License.pdf). The Skywork model supports commercial use. If you plan to use the Skywork model or its derivatives for commercial purposes, you must abide by the terms and conditions of the [Skywork Community License](https://github.com/SkyworkAI/Skywork-MoE/blob/main/Skywork%20Community%20License.pdf).

  


# Contact Us and Citation
If you find our work helpful, please cite our papers:
```
@misc{wei2024skywork,
      title={Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models},
      author={Tianwen Wei and Bo Zhu and Liang Zhao and Cheng Cheng and Biye Li and Weiwei Lü and Peng Cheng and Jianhao Zhang and Xiaoyu Zhang and Liang Zeng and Xiaokun Wang and Yutuan Ma and Rui Hu and Shuicheng Yan and Han Fang and Yahui Zhou},
      url={https://arxiv.org/pdf/2406.06563},
      year={2024},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

```
@article{zhao2024longskywork,
  title={LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models},
  author={Zhao, Liang and Wei, Tianwen and Zeng, Liang and Cheng, Cheng and Yang, Liu and Cheng, Peng and Wang, Lijie and Li, Chenxia and Wu, Xuejie and Zhu, Bo and others},
  journal={arXiv preprint arXiv:2406.00605},
  url={https://arxiv.org/abs/2406.00605},
  year={2024}
}
```