cloneQ committed
Commit 8f8a944 · verified · 1 Parent(s): b24f289

Upload 927 files

This view is limited to 50 files because the commit contains too many changes. See the raw diff for the full file list.
Files changed (50)
  1. data/README_zh-CN.md +304 -0
  2. data/xtuner/LICENSE +201 -0
  3. data/xtuner/MANIFEST.in +2 -0
  4. data/xtuner/README.md +302 -0
  5. data/xtuner/docs/en/Makefile +20 -0
  6. data/xtuner/docs/en/_static/css/readthedocs.css +6 -0
  7. data/xtuner/docs/en/_static/image/logo.png +0 -0
  8. data/xtuner/docs/en/acceleration/benchmark.rst +2 -0
  9. data/xtuner/docs/en/acceleration/deepspeed.rst +2 -0
  10. data/xtuner/docs/en/acceleration/flash_attn.rst +2 -0
  11. data/xtuner/docs/en/acceleration/hyper_parameters.rst +2 -0
  12. data/xtuner/docs/en/acceleration/length_grouped_sampler.rst +2 -0
  13. data/xtuner/docs/en/acceleration/pack_to_max_length.rst +2 -0
  14. data/xtuner/docs/en/acceleration/train_extreme_long_sequence.rst +2 -0
  15. data/xtuner/docs/en/acceleration/train_large_scale_dataset.rst +2 -0
  16. data/xtuner/docs/en/acceleration/varlen_flash_attn.rst +2 -0
  17. data/xtuner/docs/en/chat/agent.md +1 -0
  18. data/xtuner/docs/en/chat/llm.md +1 -0
  19. data/xtuner/docs/en/chat/lmdeploy.md +1 -0
  20. data/xtuner/docs/en/chat/vlm.md +1 -0
  21. data/xtuner/docs/en/conf.py +109 -0
  22. data/xtuner/docs/en/dpo/modify_settings.md +83 -0
  23. data/xtuner/docs/en/dpo/overview.md +27 -0
  24. data/xtuner/docs/en/dpo/quick_start.md +71 -0
  25. data/xtuner/docs/en/evaluation/hook.md +1 -0
  26. data/xtuner/docs/en/evaluation/mmbench.md +1 -0
  27. data/xtuner/docs/en/evaluation/mmlu.md +1 -0
  28. data/xtuner/docs/en/evaluation/opencompass.md +1 -0
  29. data/xtuner/docs/en/get_started/installation.md +52 -0
  30. data/xtuner/docs/en/get_started/overview.md +5 -0
  31. data/xtuner/docs/en/get_started/quickstart.md +308 -0
  32. data/xtuner/docs/en/index.rst +123 -0
  33. data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case1.rst +2 -0
  34. data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case2.rst +2 -0
  35. data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case3.rst +2 -0
  36. data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case4.rst +2 -0
  37. data/xtuner/docs/en/internevo_migration/ftdp_dataset/ftdp.rst +2 -0
  38. data/xtuner/docs/en/internevo_migration/internevo_migration.rst +2 -0
  39. data/xtuner/docs/en/make.bat +35 -0
  40. data/xtuner/docs/en/models/supported.md +1 -0
  41. data/xtuner/docs/en/notes/changelog.md +25 -0
  42. data/xtuner/docs/en/preparation/pretrained_model.rst +2 -0
  43. data/xtuner/docs/en/preparation/prompt_template.rst +2 -0
  44. data/xtuner/docs/en/reward_model/modify_settings.md +100 -0
  45. data/xtuner/docs/en/reward_model/overview.md +43 -0
  46. data/xtuner/docs/en/reward_model/preference_data.md +110 -0
  47. data/xtuner/docs/en/reward_model/quick_start.md +85 -0
  48. data/xtuner/docs/en/switch_language.md +3 -0
  49. data/xtuner/docs/en/training/custom_agent_dataset.rst +2 -0
  50. data/xtuner/docs/en/training/custom_pretrain_dataset.rst +2 -0
data/README_zh-CN.md ADDED
@@ -0,0 +1,304 @@
1
+ <div align="center">
2
+ <img src="https://github.com/InternLM/lmdeploy/assets/36994684/0cf8d00f-e86b-40ba-9b54-dc8f1bc6c8d8" width="600"/>
3
+ <br /><br />
4
+
5
+ [![GitHub Repo stars](https://img.shields.io/github/stars/InternLM/xtuner?style=social)](https://github.com/InternLM/xtuner/stargazers)
6
+ [![license](https://img.shields.io/github/license/InternLM/xtuner.svg)](https://github.com/InternLM/xtuner/blob/main/LICENSE)
7
+ [![PyPI](https://img.shields.io/pypi/v/xtuner)](https://pypi.org/project/xtuner/)
8
+ [![Downloads](https://static.pepy.tech/badge/xtuner)](https://pypi.org/project/xtuner/)
9
+ [![issue resolution](https://img.shields.io/github/issues-closed-raw/InternLM/xtuner)](https://github.com/InternLM/xtuner/issues)
10
+ [![open issues](https://img.shields.io/github/issues-raw/InternLM/xtuner)](https://github.com/InternLM/xtuner/issues)
11
+
12
+ 👋 加入我们:[![Static Badge](https://img.shields.io/badge/-grey?style=social&logo=wechat&label=微信)](https://cdn.vansin.top/internlm/xtuner.jpg)
13
+ [![Static Badge](https://img.shields.io/badge/-grey?style=social&logo=twitter&label=推特)](https://twitter.com/intern_lm)
14
+ [![Static Badge](https://img.shields.io/badge/-grey?style=social&logo=discord&label=Discord)](https://discord.gg/xa29JuW87d)
15
+
16
+ 🔍 探索我们的模型:
17
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🤗%20Huggingface)](https://huggingface.co/xtuner)
18
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🤖%20ModelScope)](https://www.modelscope.cn/organization/xtuner)
19
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🧰%20OpenXLab)](https://openxlab.org.cn/usercenter/xtuner)
20
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🧠%20WiseModel)](https://www.wisemodel.cn/organization/xtuner)
21
+
22
+ [English](README.md) | 简体中文
23
+
24
+ </div>
25
+
26
+ ## 🚀 Speed Benchmark
27
+
28
+ - XTuner 与 LLaMA-Factory 在 Llama2-7B 模型上的训练效率对比
29
+
30
+ <div align=center>
31
+ <img src="https://github.com/InternLM/xtuner/assets/41630003/9c9dfdf4-1efb-4daf-84bf-7c379ae40b8b" style="width:80%">
32
+ </div>
33
+
34
+ - XTuner 与 LLaMA-Factory 在 Llama2-70B 模型上的训练效率对比
35
+
36
+ <div align=center>
37
+ <img src="https://github.com/InternLM/xtuner/assets/41630003/5ba973b8-8885-4b72-b51b-c69fa1583bdd" style="width:80%">
38
+ </div>
39
+
40
+ ## 🎉 更新
41
+ - **\[2024/07\]** 支持 [MiniCPM](xtuner/configs/minicpm/) 模型!
42
+ - **\[2024/07\]** 支持训练 [DPO](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/dpo), [ORPO](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/orpo) 还有 [Reward Model](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/reward_model) ! 并且能够支持打包数据以及序列并行功能! 请参考 [文档](https://xtuner.readthedocs.io/zh-cn/latest/dpo/overview.html) 了解更多信息。
43
+ - **\[2024/07\]** 支持 [InternLM 2.5](xtuner/configs/internlm/internlm2_5_chat_7b/) 模型!
44
+ - **\[2024/06\]** 支持 [DeepSeek V2](xtuner/configs/deepseek/deepseek_v2_chat/) models! **训练速度提升一倍!**
45
+ - **\[2024/04\]** 多模态大模型 [LLaVA-Phi-3-mini](https://huggingface.co/xtuner/llava-phi-3-mini-hf) 发布!快速开始请查阅此[文档](xtuner/configs/llava/phi3_mini_4k_instruct_clip_vit_large_p14_336)!
46
+ - **\[2024/04\]** 多模态大模型 [LLaVA-Llama-3-8B](https://huggingface.co/xtuner/llava-llama-3-8b) 和 [LLaVA-Llama-3-8B-v1.1](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1) 发布!快速开始请查阅此[文档](xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336)!
47
+ - **\[2024/04\]** 支持 [Llama 3](xtuner/configs/llama) 模型!
48
+ - **\[2024/04\]** 支持序列并行训练策略以实现语言模型超长上下文训练!\[[文档](https://github.com/InternLM/xtuner/blob/docs/docs/zh_cn/acceleration/train_extreme_long_sequence.rst)\] \[[速度基准](https://github.com/InternLM/xtuner/blob/docs/docs/zh_cn/acceleration/benchmark.rst)\]
49
+ - **\[2024/02\]** 支持 [Gemma](xtuner/configs/gemma) 模型!
50
+ - **\[2024/02\]** 支持 [Qwen1.5](xtuner/configs/qwen/qwen1_5) 模型!
51
+ - **\[2024/01\]** 支持 [InternLM2](xtuner/configs/internlm) 模型!同时,最新版的多模态大模型 [LLaVA-Internlm2-7B](https://huggingface.co/xtuner/llava-internlm2-7b) / [20B](https://huggingface.co/xtuner/llava-internlm2-20b) 发布,其表现出强大的性能!
52
+ - **\[2024/01\]** 支持 [DeepSeek-MoE](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) 模型!20GB 显存即可实现 QLoRA 微调,4x80GB 即可实现全参数微调。快速开始请查阅相关[配置文件](xtuner/configs/deepseek/)!
53
+ - **\[2023/12\]** 🔥 支持多模态模型 VLM([LLaVA-v1.5](https://github.com/haotian-liu/LLaVA))预训练和指令微调!快速开始请查阅此[文档](xtuner/configs/llava/README_zh-CN.md)!
54
+ - **\[2023/12\]** 🔥 支持 [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) 模型!快速开始请查阅此[文档](xtuner/configs/mixtral/README.md)!
55
+ - **\[2023/11\]** 支持 [ChatGLM3-6B](xtuner/configs/chatglm) 模型!
56
+ - **\[2023/10\]** 支持 [MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench) 数据集,并且微调所得大语言模型可应用至 [Lagent](https://github.com/InternLM/lagent) 框架!
57
+ - **\[2023/10\]** 优化数据处理逻辑以兼容 `system` 字段,相关细节请查阅[文档](docs/zh_cn/user_guides/dataset_format.md)!
58
+ - **\[2023/09\]** 支持 [InternLM-20B](xtuner/configs/internlm) 系列模型!
59
+ - **\[2023/09\]** 支持 [Baichuan2](xtuner/configs/baichuan) 系列模型!
60
+ - **\[2023/08\]** XTuner 正式发布!众多微调模型已上传至 [HuggingFace](https://huggingface.co/xtuner)!
61
+
62
+ ## 📖 介绍
63
+
64
+ XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
65
+
66
+ **高效**
67
+
68
+ - 支持大语言模型 LLM、多模态图文模型 VLM 的预训练及轻量级微调。XTuner 支持在 8GB 显存下微调 7B 模型,同时也支持多节点跨设备微调更大尺度模型(70B+)。
69
+ - 自动分发高性能算子(如 FlashAttention、Triton kernels 等)以加速训练吞吐。
70
+ - 兼容 [DeepSpeed](https://github.com/microsoft/DeepSpeed) 🚀,轻松应用各种 ZeRO 训练优化策略。
71
+
72
+ **灵活**
73
+
74
+ - 支持多种大语言模型,包括但不限于 [InternLM](https://huggingface.co/internlm)、[Mixtral-8x7B](https://huggingface.co/mistralai)、[Llama 2](https://huggingface.co/meta-llama)、[ChatGLM](https://huggingface.co/THUDM)、[Qwen](https://huggingface.co/Qwen)、[Baichuan](https://huggingface.co/baichuan-inc)。
75
+ - 支持多模态图文模型 LLaVA 的预训练与微调。利用 XTuner 训得模型 [LLaVA-InternLM2-20B](https://huggingface.co/xtuner/llava-internlm2-20b) 表现优异。
76
+ - 精心设计的数据管道,兼容任意数据格式,开源数据或自定义数据皆可快速上手。
77
+ - 支持 [QLoRA](http://arxiv.org/abs/2305.14314)、[LoRA](http://arxiv.org/abs/2106.09685)、全量参数微调等多种微调算法,支撑用户根据具体需求作出最优选择。
78
+
79
+ **全能**
80
+
81
+ - 支持增量预训练、指令微调与 Agent 微调。
82
+ - 预定义众多开源对话模版,支持与开源或训练所得模型进行对话。
83
+ - 训练所得模型可无缝接入部署工具库 [LMDeploy](https://github.com/InternLM/lmdeploy)、大规模评测工具库 [OpenCompass](https://github.com/open-compass/opencompass) 及 [VLMEvalKit](https://github.com/open-compass/VLMEvalKit)。
84
+
85
+ ## 🔥 支持列表
86
+
87
+ <table>
88
+ <tbody>
89
+ <tr align="center" valign="middle">
90
+ <td>
91
+ <b>模型</b>
92
+ </td>
93
+ <td>
94
+ <b>数据集</b>
95
+ </td>
96
+ <td>
97
+ <b>数据格式</b>
98
+ </td>
99
+ <td>
100
+ <b>微调算法</b>
101
+ </td>
102
+ </tr>
103
+ <tr valign="top">
104
+ <td align="left" valign="top">
105
+ <ul>
106
+ <li><a href="https://huggingface.co/internlm">InternLM 2 / 2.5</a></li>
107
+ <li><a href="https://huggingface.co/meta-llama">Llama 2 / 3</a></li>
108
+ <li><a href="https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3">Phi-3</a></li>
109
+ <li><a href="https://huggingface.co/THUDM/chatglm2-6b">ChatGLM2</a></li>
110
+ <li><a href="https://huggingface.co/THUDM/chatglm3-6b">ChatGLM3</a></li>
111
+ <li><a href="https://huggingface.co/Qwen/Qwen-7B">Qwen</a></li>
112
+ <li><a href="https://huggingface.co/baichuan-inc/Baichuan2-7B-Base">Baichuan2</a></li>
113
+ <li><a href="https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1">Mixtral</a></li>
114
+ <li><a href="https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat">DeepSeek V2</a></li>
115
+ <li><a href="https://huggingface.co/google">Gemma</a></li>
116
+ <li><a href="https://huggingface.co/openbmb">MiniCPM</a></li>
117
+ <li>...</li>
118
+ </ul>
119
+ </td>
120
+ <td>
121
+ <ul>
122
+ <li><a href="https://modelscope.cn/datasets/damo/MSAgent-Bench">MSAgent-Bench</a></li>
123
+ <li><a href="https://huggingface.co/datasets/fnlp/moss-003-sft-data">MOSS-003-SFT</a> 🔧</li>
124
+ <li><a href="https://huggingface.co/datasets/tatsu-lab/alpaca">Alpaca en</a> / <a href="https://huggingface.co/datasets/silk-road/alpaca-data-gpt4-chinese">zh</a></li>
125
+ <li><a href="https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k">WizardLM</a></li>
126
+ <li><a href="https://huggingface.co/datasets/timdettmers/openassistant-guanaco">oasst1</a></li>
127
+ <li><a href="https://huggingface.co/datasets/garage-bAInd/Open-Platypus">Open-Platypus</a></li>
128
+ <li><a href="https://huggingface.co/datasets/HuggingFaceH4/CodeAlpaca_20K">Code Alpaca</a></li>
129
+ <li><a href="https://huggingface.co/datasets/burkelibbey/colors">Colorist</a> 🎨</li>
130
+ <li><a href="https://github.com/WangRongsheng/ChatGenTitle">Arxiv GenTitle</a></li>
131
+ <li><a href="https://github.com/LiuHC0428/LAW-GPT">Chinese Law</a></li>
132
+ <li><a href="https://huggingface.co/datasets/Open-Orca/OpenOrca">OpenOrca</a></li>
133
+ <li><a href="https://huggingface.co/datasets/shibing624/medical">Medical Dialogue</a></li>
134
+ <li>...</li>
135
+ </ul>
136
+ </td>
137
+ <td>
138
+ <ul>
139
+ <li><a href="docs/zh_cn/user_guides/incremental_pretraining.md">Incremental Pre-training</a> </li>
140
+ <li><a href="docs/zh_cn/user_guides/single_turn_conversation.md">Single-turn Conversation SFT</a> </li>
141
+ <li><a href="docs/zh_cn/user_guides/multi_turn_conversation.md">Multi-turn Conversation SFT</a> </li>
142
+ </ul>
143
+ </td>
144
+ <td>
145
+ <ul>
146
+ <li><a href="http://arxiv.org/abs/2305.14314">QLoRA</a></li>
147
+ <li><a href="http://arxiv.org/abs/2106.09685">LoRA</a></li>
148
+ <li>全量参数微调</li>
149
+ <li><a href="https://arxiv.org/abs/2305.18290">DPO</a></li>
150
+ <li><a href="https://arxiv.org/abs/2403.07691">ORPO</a></li>
151
+ <li>Reward Model</li>
152
+ </ul>
153
+ </td>
154
+ </tr>
155
+ </tbody>
156
+ </table>
157
+
158
+ ## 🛠️ 快速上手
159
+
160
+ ### 安装
161
+
162
+ - 推荐使用 conda 先构建一个 Python-3.10 的虚拟环境
163
+
164
+ ```bash
165
+ conda create --name xtuner-env python=3.10 -y
166
+ conda activate xtuner-env
167
+ ```
168
+
169
+ - 通过 pip 安装 XTuner:
170
+
171
+ ```shell
172
+ pip install -U xtuner
173
+ ```
174
+
175
+ 亦可集成 DeepSpeed 安装:
176
+
177
+ ```shell
178
+ pip install -U 'xtuner[deepspeed]'
179
+ ```
180
+
181
+ - 从源码安装 XTuner:
182
+
183
+ ```shell
184
+ git clone https://github.com/InternLM/xtuner.git
185
+ cd xtuner
186
+ pip install -e '.[all]'
187
+ ```
188
+
189
+ ### 微调
190
+
191
+ XTuner 支持微调大语言模型。数据集预处理指南请查阅[文档](./docs/zh_cn/user_guides/dataset_prepare.md)。
192
+
193
+ - **步骤 0**,准备配置文件。XTuner 提供多个开箱即用的配置文件,用户可以通过下列命令查看:
194
+
195
+ ```shell
196
+ xtuner list-cfg
197
+ ```
198
+
199
+ 或者,如果所提供的配置文件不能满足使用需求,请导出所提供的配置文件并进行相应更改:
200
+
201
+ ```shell
202
+ xtuner copy-cfg ${CONFIG_NAME} ${SAVE_PATH}
203
+ vi ${SAVE_PATH}/${CONFIG_NAME}_copy.py
204
+ ```
205
+
206
+ - **步骤 1**,开始微调。
207
+
208
+ ```shell
209
+ xtuner train ${CONFIG_NAME_OR_PATH}
210
+ ```
211
+
212
+ 例如,我们可以利用 QLoRA 算法在 oasst1 数据集上微调 InternLM2.5-Chat-7B:
213
+
214
+ ```shell
215
+ # 单卡
216
+ xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
217
+ # 多卡
218
+ (DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
219
+ (SLURM) srun ${SRUN_ARGS} xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
220
+ ```
221
+
222
+ - `--deepspeed` 表示使用 [DeepSpeed](https://github.com/microsoft/DeepSpeed) 🚀 来优化训练过程。XTuner 内置了多种策略,包括 ZeRO-1、ZeRO-2、ZeRO-3 等。如果用户期望关闭此功能,请直接移除此参数。
223
+
224
+ - 更多示例,请查阅[文档](./docs/zh_cn/user_guides/finetune.md)。
225
+
226
+ - **步骤 2**,将保存的 PTH 模型(如果使用了 DeepSpeed,则将会是一个文件夹)转换为 HuggingFace 模型:
227
+
228
+ ```shell
229
+ xtuner convert pth_to_hf ${CONFIG_NAME_OR_PATH} ${PTH} ${SAVE_PATH}
230
+ ```
231
+
232
+ ### 对话
233
+
234
+ XTuner 提供与大语言模型对话的工具。
235
+
236
+ ```shell
237
+ xtuner chat ${NAME_OR_PATH_TO_LLM} --adapter {NAME_OR_PATH_TO_ADAPTER} [optional arguments]
238
+ ```
239
+
240
+ 例如:
241
+
242
+ 与 InternLM2.5-Chat-7B 对话:
243
+
244
+ ```shell
245
+ xtuner chat internlm/internlm2_5-chat-7b --prompt-template internlm2_chat
246
+ ```
247
+
248
+ 更多示例,请查阅[文档](./docs/zh_cn/user_guides/chat.md)。
249
+
250
+ ### 部署
251
+
252
+ - **步骤 0**,将 HuggingFace adapter 合并到大语言模型:
253
+
254
+ ```shell
255
+ xtuner convert merge \
256
+ ${NAME_OR_PATH_TO_LLM} \
257
+ ${NAME_OR_PATH_TO_ADAPTER} \
258
+ ${SAVE_PATH} \
259
+ --max-shard-size 2GB
260
+ ```
261
+
262
+ - **步骤 1**,使用任意推理框架部署微调后的大语言模型,例如 [LMDeploy](https://github.com/InternLM/lmdeploy) 🚀:
263
+
264
+ ```shell
265
+ pip install lmdeploy
266
+ python -m lmdeploy.pytorch.chat ${NAME_OR_PATH_TO_LLM} \
267
+ --max_new_tokens 256 \
268
+ --temperature 0.8 \
269
+ --top_p 0.95 \
270
+ --seed 0
271
+ ```
272
+
273
+ 🔥 追求速度更快、显存占用更低的推理?欢迎体验 [LMDeploy](https://github.com/InternLM/lmdeploy) 提供的 4-bit 量化!使用指南请见[文档](https://github.com/InternLM/lmdeploy/tree/main#quantization)。
274
+
275
+ ### 评测
276
+
277
+ - 推荐使用一站式平台 [OpenCompass](https://github.com/InternLM/opencompass) 来评测大语言模型,其目前已涵盖 50+ 数据集的约 30 万条题目。
278
+
279
+ ## 🤝 贡献指南
280
+
281
+ 我们感谢所有的贡献者为改进和提升 XTuner 所作出的努力。请参考[贡献指南](.github/CONTRIBUTING.md)来了解参与项目贡献的相关指引。
282
+
283
+ ## 🎖️ 致谢
284
+
285
+ - [Llama 2](https://github.com/facebookresearch/llama)
286
+ - [DeepSpeed](https://github.com/microsoft/DeepSpeed)
287
+ - [QLoRA](https://github.com/artidoro/qlora)
288
+ - [LMDeploy](https://github.com/InternLM/lmdeploy)
289
+ - [LLaVA](https://github.com/haotian-liu/LLaVA)
290
+
291
+ ## 🖊️ 引用
292
+
293
+ ```bibtex
294
+ @misc{2023xtuner,
295
+ title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
296
+ author={XTuner Contributors},
297
+ howpublished = {\url{https://github.com/InternLM/xtuner}},
298
+ year={2023}
299
+ }
300
+ ```
301
+
302
+ ## 开源许可证
303
+
304
+ 该项目采用 [Apache License 2.0 开源许可证](LICENSE)。同时,请遵守所使用的模型与数据集的许可证。
data/xtuner/LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
data/xtuner/MANIFEST.in ADDED
@@ -0,0 +1,2 @@
1
+ recursive-include xtuner/configs *.py *.yml *.json
2
+ recursive-include xtuner/tools *.sh *.py
data/xtuner/README.md ADDED
@@ -0,0 +1,302 @@
1
+ <div align="center">
2
+ <img src="https://github.com/InternLM/lmdeploy/assets/36994684/0cf8d00f-e86b-40ba-9b54-dc8f1bc6c8d8" width="600"/>
3
+ <br /><br />
4
+
5
+ [![GitHub Repo stars](https://img.shields.io/github/stars/InternLM/xtuner?style=social)](https://github.com/InternLM/xtuner/stargazers)
6
+ [![license](https://img.shields.io/github/license/InternLM/xtuner.svg)](https://github.com/InternLM/xtuner/blob/main/LICENSE)
7
+ [![PyPI](https://img.shields.io/pypi/v/xtuner)](https://pypi.org/project/xtuner/)
8
+ [![Downloads](https://static.pepy.tech/badge/xtuner)](https://pypi.org/project/xtuner/)
9
+ [![issue resolution](https://img.shields.io/github/issues-closed-raw/InternLM/xtuner)](https://github.com/InternLM/xtuner/issues)
10
+ [![open issues](https://img.shields.io/github/issues-raw/InternLM/xtuner)](https://github.com/InternLM/xtuner/issues)
11
+
12
+ 👋 join us on [![Static Badge](https://img.shields.io/badge/-grey?style=social&logo=wechat&label=WeChat)](https://cdn.vansin.top/internlm/xtuner.jpg)
13
+ [![Static Badge](https://img.shields.io/badge/-grey?style=social&logo=twitter&label=Twitter)](https://twitter.com/intern_lm)
14
+ [![Static Badge](https://img.shields.io/badge/-grey?style=social&logo=discord&label=Discord)](https://discord.gg/xa29JuW87d)
15
+
16
+ 🔍 Explore our models on
17
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🤗%20Huggingface)](https://huggingface.co/xtuner)
18
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🤖%20ModelScope)](https://www.modelscope.cn/organization/xtuner)
19
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🧰%20OpenXLab)](https://openxlab.org.cn/usercenter/xtuner)
20
+ [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🧠%20WiseModel)](https://www.wisemodel.cn/organization/xtuner)
21
+
22
+ English | [简体中文](README_zh-CN.md)
23
+
24
+ </div>
25
+
26
+ ## 🚀 Speed Benchmark
27
+
28
+ - Llama2 7B Training Speed
29
+
30
+ <div align=center>
31
+ <img src="https://github.com/InternLM/xtuner/assets/41630003/9c9dfdf4-1efb-4daf-84bf-7c379ae40b8b" style="width:80%">
32
+ </div>
33
+
34
+ - Llama2 70B Training Speed
35
+
36
+ <div align=center>
37
+ <img src="https://github.com/InternLM/xtuner/assets/41630003/5ba973b8-8885-4b72-b51b-c69fa1583bdd" style="width:80%">
38
+ </div>
39
+
40
+ ## 🎉 News
41
+ - **\[2024/07\]** Support [MiniCPM](xtuner/configs/minicpm/) models!
42
+ - **\[2024/07\]** Support [DPO](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/dpo), [ORPO](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/orpo) and [Reward Model](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/reward_model) training with packed data and sequence parallel! See [documents](https://xtuner.readthedocs.io/en/latest/dpo/overview.html) for more details.
43
+ - **\[2024/07\]** Support [InternLM 2.5](xtuner/configs/internlm/internlm2_5_chat_7b/) models!
44
+ - **\[2024/06\]** Support [DeepSeek V2](xtuner/configs/deepseek/deepseek_v2_chat/) models! **2x faster!**
45
+ - **\[2024/04\]** [LLaVA-Phi-3-mini](https://huggingface.co/xtuner/llava-phi-3-mini-hf) is released! Click [here](xtuner/configs/llava/phi3_mini_4k_instruct_clip_vit_large_p14_336) for details!
46
+ - **\[2024/04\]** [LLaVA-Llama-3-8B](https://huggingface.co/xtuner/llava-llama-3-8b) and [LLaVA-Llama-3-8B-v1.1](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1) are released! Click [here](xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336) for details!
47
+ - **\[2024/04\]** Support [Llama 3](xtuner/configs/llama) models!
48
+ - **\[2024/04\]** Support Sequence Parallel for enabling highly efficient and scalable LLM training with extremely long sequence lengths! \[[Usage](https://github.com/InternLM/xtuner/blob/docs/docs/zh_cn/acceleration/train_extreme_long_sequence.rst)\] \[[Speed Benchmark](https://github.com/InternLM/xtuner/blob/docs/docs/zh_cn/acceleration/benchmark.rst)\]
49
+ - **\[2024/02\]** Support [Gemma](xtuner/configs/gemma) models!
50
+ - **\[2024/02\]** Support [Qwen1.5](xtuner/configs/qwen/qwen1_5) models!
51
+ - **\[2024/01\]** Support [InternLM2](xtuner/configs/internlm) models! The latest VLM [LLaVA-Internlm2-7B](https://huggingface.co/xtuner/llava-internlm2-7b) / [20B](https://huggingface.co/xtuner/llava-internlm2-20b) models are released, with impressive performance!
52
+ - **\[2024/01\]** Support [DeepSeek-MoE](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) models! 20GB GPU memory is enough for QLoRA fine-tuning, and 4x80GB for full-parameter fine-tuning. Click [here](xtuner/configs/deepseek/) for details!
53
+ - **\[2023/12\]** 🔥 Support multi-modal VLM pretraining and fine-tuning with [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA) architecture! Click [here](xtuner/configs/llava/README.md) for details!
54
+ - **\[2023/12\]** 🔥 Support [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) models! Click [here](xtuner/configs/mixtral/README.md) for details!
55
+ - **\[2023/11\]** Support [ChatGLM3-6B](xtuner/configs/chatglm) model!
56
+ - **\[2023/10\]** Support [MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench) dataset, and the fine-tuned LLMs can be applied by [Lagent](https://github.com/InternLM/lagent)!
57
+ - **\[2023/10\]** Optimize the data processing to accommodate `system` context. More information can be found on [Docs](docs/en/user_guides/dataset_format.md)!
58
+ - **\[2023/09\]** Support [InternLM-20B](xtuner/configs/internlm) models!
59
+ - **\[2023/09\]** Support [Baichuan2](xtuner/configs/baichuan) models!
60
+ - **\[2023/08\]** XTuner is released, with multiple fine-tuned adapters on [Hugging Face](https://huggingface.co/xtuner).
61
+
62
+ ## 📖 Introduction
63
+
64
+ XTuner is an efficient, flexible and full-featured toolkit for fine-tuning large models.
65
+
66
+ **Efficient**
67
+
68
+ - Support LLM, VLM pre-training / fine-tuning on almost all GPUs. XTuner is capable of fine-tuning 7B LLM on a single 8GB GPU, as well as multi-node fine-tuning of models exceeding 70B.
69
+ - Automatically dispatch high-performance operators such as FlashAttention and Triton kernels to increase training throughput.
70
+ - Compatible with [DeepSpeed](https://github.com/microsoft/DeepSpeed) 🚀, easily utilizing a variety of ZeRO optimization techniques.
71
+
72
+ **Flexible**
73
+
74
+ - Support various LLMs ([InternLM](https://huggingface.co/internlm), [Mixtral-8x7B](https://huggingface.co/mistralai), [Llama 2](https://huggingface.co/meta-llama), [ChatGLM](https://huggingface.co/THUDM), [Qwen](https://huggingface.co/Qwen), [Baichuan](https://huggingface.co/baichuan-inc), ...).
75
+ - Support VLM ([LLaVA](https://github.com/haotian-liu/LLaVA)). The performance of [LLaVA-InternLM2-20B](https://huggingface.co/xtuner/llava-internlm2-20b) is outstanding.
76
+ - Well-designed data pipeline, accommodating datasets in any format, including but not limited to open-source and custom formats.
77
+ - Support various training algorithms ([QLoRA](http://arxiv.org/abs/2305.14314), [LoRA](http://arxiv.org/abs/2106.09685), full-parameter fine-tuning), allowing users to choose the most suitable solution for their requirements.
78
+
79
+ **Full-featured**
80
+
81
+ - Support continuous pre-training, instruction fine-tuning, and agent fine-tuning.
82
+ - Support chatting with large models with pre-defined templates.
83
+ - The output models can seamlessly integrate with deployment and server toolkit ([LMDeploy](https://github.com/InternLM/lmdeploy)), and large-scale evaluation toolkit ([OpenCompass](https://github.com/open-compass/opencompass), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit)).
84
+
85
+ ## 🔥 Supports
86
+
87
+ <table>
88
+ <tbody>
89
+ <tr align="center" valign="middle">
90
+ <td>
91
+ <b>Models</b>
92
+ </td>
93
+ <td>
94
+ <b>SFT Datasets</b>
95
+ </td>
96
+ <td>
97
+ <b>Data Pipelines</b>
98
+ </td>
99
+ <td>
100
+ <b>Algorithms</b>
101
+ </td>
102
+ </tr>
103
+ <tr valign="top">
104
+ <td align="left" valign="top">
105
+ <ul>
106
+ <li><a href="https://huggingface.co/internlm">InternLM2 / 2.5</a></li>
107
+ <li><a href="https://huggingface.co/meta-llama">Llama 2 / 3</a></li>
108
+ <li><a href="https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3">Phi-3</a></li>
109
+ <li><a href="https://huggingface.co/THUDM/chatglm2-6b">ChatGLM2</a></li>
110
+ <li><a href="https://huggingface.co/THUDM/chatglm3-6b">ChatGLM3</a></li>
111
+ <li><a href="https://huggingface.co/Qwen/Qwen-7B">Qwen</a></li>
112
+ <li><a href="https://huggingface.co/baichuan-inc/Baichuan2-7B-Base">Baichuan2</a></li>
113
+ <li><a href="https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1">Mixtral</a></li>
114
+ <li><a href="https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat">DeepSeek V2</a></li>
115
+ <li><a href="https://huggingface.co/google">Gemma</a></li>
116
+ <li><a href="https://huggingface.co/openbmb">MiniCPM</a></li>
117
+ <li>...</li>
118
+ </ul>
119
+ </td>
120
+ <td>
121
+ <ul>
122
+ <li><a href="https://modelscope.cn/datasets/damo/MSAgent-Bench">MSAgent-Bench</a></li>
123
+ <li><a href="https://huggingface.co/datasets/fnlp/moss-003-sft-data">MOSS-003-SFT</a> 🔧</li>
124
+ <li><a href="https://huggingface.co/datasets/tatsu-lab/alpaca">Alpaca en</a> / <a href="https://huggingface.co/datasets/silk-road/alpaca-data-gpt4-chinese">zh</a></li>
125
+ <li><a href="https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k">WizardLM</a></li>
126
+ <li><a href="https://huggingface.co/datasets/timdettmers/openassistant-guanaco">oasst1</a></li>
127
+ <li><a href="https://huggingface.co/datasets/garage-bAInd/Open-Platypus">Open-Platypus</a></li>
128
+ <li><a href="https://huggingface.co/datasets/HuggingFaceH4/CodeAlpaca_20K">Code Alpaca</a></li>
129
+ <li><a href="https://huggingface.co/datasets/burkelibbey/colors">Colorist</a> 🎨</li>
130
+ <li><a href="https://github.com/WangRongsheng/ChatGenTitle">Arxiv GenTitle</a></li>
131
+ <li><a href="https://github.com/LiuHC0428/LAW-GPT">Chinese Law</a></li>
132
+ <li><a href="https://huggingface.co/datasets/Open-Orca/OpenOrca">OpenOrca</a></li>
133
+ <li><a href="https://huggingface.co/datasets/shibing624/medical">Medical Dialogue</a></li>
134
+ <li>...</li>
135
+ </ul>
136
+ </td>
137
+ <td>
138
+ <ul>
139
+ <li><a href="docs/zh_cn/user_guides/incremental_pretraining.md">Incremental Pre-training</a> </li>
140
+ <li><a href="docs/zh_cn/user_guides/single_turn_conversation.md">Single-turn Conversation SFT</a> </li>
141
+ <li><a href="docs/zh_cn/user_guides/multi_turn_conversation.md">Multi-turn Conversation SFT</a> </li>
142
+ </ul>
143
+ </td>
144
+ <td>
145
+ <ul>
146
+ <li><a href="http://arxiv.org/abs/2305.14314">QLoRA</a></li>
147
+ <li><a href="http://arxiv.org/abs/2106.09685">LoRA</a></li>
148
+ <li>Full parameter fine-tune</li>
149
+ <li><a href="https://arxiv.org/abs/2305.18290">DPO</a></li>
150
+ <li><a href="https://arxiv.org/abs/2403.07691">ORPO</a></li>
151
+ <li>Reward Model</li>
152
+ </ul>
153
+ </td>
154
+ </tr>
155
+ </tbody>
156
+ </table>
157
+
158
+ ## 🛠️ Quick Start
159
+
160
+ ### Installation
161
+
162
+ - It is recommended to build a Python-3.10 virtual environment using conda
163
+
164
+ ```bash
165
+ conda create --name xtuner-env python=3.10 -y
166
+ conda activate xtuner-env
167
+ ```
168
+
169
+ - Install XTuner via pip
170
+
171
+ ```shell
172
+ pip install -U xtuner
173
+ ```
174
+
175
+ or with DeepSpeed integration
176
+
177
+ ```shell
178
+ pip install -U 'xtuner[deepspeed]'
179
+ ```
180
+
181
+ - Install XTuner from source
182
+
183
+ ```shell
184
+ git clone https://github.com/InternLM/xtuner.git
185
+ cd xtuner
186
+ pip install -e '.[all]'
187
+ ```
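+
+ A quick way to verify the installation (a minimal check; `xtuner/version.py` defines `__version__`, which is also what the Sphinx config in `docs/en/conf.py` reads):
+
+ ```python
+ # Minimal installation check: import the package and print its version.
+ import xtuner
+ print(xtuner.__version__)
+ ```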
188
+
189
+ ### Fine-tune
190
+
191
+ XTuner supports efficient fine-tuning (*e.g.*, QLoRA) of LLMs. Dataset preparation guides can be found in [dataset_prepare.md](./docs/en/user_guides/dataset_prepare.md).
192
+
193
+ - **Step 0**, prepare the config. XTuner provides many ready-to-use configs and we can view all configs by
194
+
195
+ ```shell
196
+ xtuner list-cfg
197
+ ```
198
+
199
+ Or, if the provided configs cannot meet the requirements, please copy the provided config to the specified directory and make specific modifications by
200
+
201
+ ```shell
202
+ xtuner copy-cfg ${CONFIG_NAME} ${SAVE_PATH}
203
+ vi ${SAVE_PATH}/${CONFIG_NAME}_copy.py
204
+ ```
205
+
206
+ - **Step 1**, start fine-tuning.
207
+
208
+ ```shell
209
+ xtuner train ${CONFIG_NAME_OR_PATH}
210
+ ```
211
+
212
+ For example, we can start the QLoRA fine-tuning of InternLM2.5-Chat-7B with the oasst1 dataset by
213
+
214
+ ```shell
215
+ # On a single GPU
216
+ xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
217
+ # On multiple GPUs
218
+ (DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
219
+ (SLURM) srun ${SRUN_ARGS} xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
220
+ ```
221
+
222
+ - `--deepspeed` means using [DeepSpeed](https://github.com/microsoft/DeepSpeed) 🚀 to optimize the training. XTuner comes with several integrated strategies including ZeRO-1, ZeRO-2, and ZeRO-3. If you wish to disable this feature, simply remove this argument.
223
+
224
+ - For more examples, please see [finetune.md](./docs/en/user_guides/finetune.md).
225
+
226
+ - **Step 2**, convert the saved PTH model (if using DeepSpeed, it will be a directory) to a Hugging Face model, by
227
+
228
+ ```shell
229
+ xtuner convert pth_to_hf ${CONFIG_NAME_OR_PATH} ${PTH} ${SAVE_PATH}
230
+ ```
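+
+ For (Q)LoRA configs, the converted output is a LoRA adapter rather than a full set of weights. As an illustration only (the model name and adapter path below are placeholders), such an adapter can be attached to its base model with `peft`:
+
+ ```python
+ # Illustrative sketch: attach a converted LoRA adapter to its base model.
+ # Replace the base model name and adapter path with your own.
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ base = AutoModelForCausalLM.from_pretrained(
+     'internlm/internlm2_5-chat-7b', trust_remote_code=True)
+ model = PeftModel.from_pretrained(base, './work_dirs/adapter_hf')
+ ```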
231
+
232
+ ### Chat
233
+
234
+ XTuner provides tools to chat with pretrained / fine-tuned LLMs.
235
+
236
+ ```shell
237
+ xtuner chat ${NAME_OR_PATH_TO_LLM} --adapter {NAME_OR_PATH_TO_ADAPTER} [optional arguments]
238
+ ```
239
+
240
+ For example, we can start a chat with InternLM2.5-Chat-7B:
241
+
242
+ ```shell
243
+ xtuner chat internlm/internlm2_5-chat-7b --prompt-template internlm2_chat
244
+ ```
245
+
246
+ For more examples, please see [chat.md](./docs/en/user_guides/chat.md).
247
+
248
+ ### Deployment
249
+
250
+ - **Step 0**, merge the Hugging Face adapter into the pretrained LLM, by
251
+
252
+ ```shell
253
+ xtuner convert merge \
254
+ ${NAME_OR_PATH_TO_LLM} \
255
+ ${NAME_OR_PATH_TO_ADAPTER} \
256
+ ${SAVE_PATH} \
257
+ --max-shard-size 2GB
258
+ ```
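+
+ The merged weights in `${SAVE_PATH}` form a standard Hugging Face checkpoint. As an optional sanity check before deployment (a sketch, not part of the XTuner CLI; the path is a placeholder), they can be loaded with `transformers`:
+
+ ```python
+ # Optional sanity check: load the merged checkpoint like any HF model.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ save_path = './merged'  # your ${SAVE_PATH}
+ tokenizer = AutoTokenizer.from_pretrained(save_path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(save_path, trust_remote_code=True)
+ ```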
259
+
260
+ - **Step 1**, deploy the fine-tuned LLM with any inference framework, such as [LMDeploy](https://github.com/InternLM/lmdeploy) 🚀.
261
+
262
+ ```shell
263
+ pip install lmdeploy
264
+ python -m lmdeploy.pytorch.chat ${NAME_OR_PATH_TO_LLM} \
265
+ --max_new_tokens 256 \
266
+ --temperature 0.8 \
267
+ --top_p 0.95 \
268
+ --seed 0
269
+ ```
270
+
271
+ 🔥 Seeking efficient inference with less GPU memory? Try 4-bit quantization from [LMDeploy](https://github.com/InternLM/lmdeploy)! For more details, see [here](https://github.com/InternLM/lmdeploy/tree/main#quantization).
272
+
273
+ ### Evaluation
274
+
275
+ - We recommend using [OpenCompass](https://github.com/InternLM/opencompass), a comprehensive and systematic LLM evaluation library, which currently supports 50+ datasets with about 300,000 questions.
276
+
277
+ ## 🤝 Contributing
278
+
279
+ We appreciate all contributions to XTuner. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline.
280
+
281
+ ## 🎖️ Acknowledgement
282
+
283
+ - [Llama 2](https://github.com/facebookresearch/llama)
284
+ - [DeepSpeed](https://github.com/microsoft/DeepSpeed)
285
+ - [QLoRA](https://github.com/artidoro/qlora)
286
+ - [LMDeploy](https://github.com/InternLM/lmdeploy)
287
+ - [LLaVA](https://github.com/haotian-liu/LLaVA)
288
+
289
+ ## 🖊️ Citation
290
+
291
+ ```bibtex
292
+ @misc{2023xtuner,
293
+ title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
294
+ author={XTuner Contributors},
295
+ howpublished = {\url{https://github.com/InternLM/xtuner}},
296
+ year={2023}
297
+ }
298
+ ```
299
+
300
+ ## License
301
+
302
+ This project is released under the [Apache License 2.0](LICENSE). Please also adhere to the Licenses of models and datasets being used.
data/xtuner/docs/en/Makefile ADDED
@@ -0,0 +1,20 @@
1
+ # Minimal makefile for Sphinx documentation
2
+ #
3
+
4
+ # You can set these variables from the command line, and also
5
+ # from the environment for the first two.
6
+ SPHINXOPTS ?=
7
+ SPHINXBUILD ?= sphinx-build
8
+ SOURCEDIR = .
9
+ BUILDDIR = _build
10
+
11
+ # Put it first so that "make" without argument is like "make help".
12
+ help:
13
+ @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
14
+
15
+ .PHONY: help Makefile
16
+
17
+ # Catch-all target: route all unknown targets to Sphinx using the new
18
+ # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
19
+ %: Makefile
20
+ @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
data/xtuner/docs/en/_static/css/readthedocs.css ADDED
@@ -0,0 +1,6 @@
1
+ .header-logo {
2
+ background-image: url("../image/logo.png");
3
+ background-size: 177px 40px;
4
+ height: 40px;
5
+ width: 177px;
6
+ }
data/xtuner/docs/en/_static/image/logo.png ADDED
data/xtuner/docs/en/acceleration/benchmark.rst ADDED
@@ -0,0 +1,2 @@
1
+ Benchmark
2
+ =========
data/xtuner/docs/en/acceleration/deepspeed.rst ADDED
@@ -0,0 +1,2 @@
1
+ DeepSpeed
2
+ =========
data/xtuner/docs/en/acceleration/flash_attn.rst ADDED
@@ -0,0 +1,2 @@
1
+ Flash Attention
2
+ ===============
data/xtuner/docs/en/acceleration/hyper_parameters.rst ADDED
@@ -0,0 +1,2 @@
1
+ HyperParameters
2
+ ===============
data/xtuner/docs/en/acceleration/length_grouped_sampler.rst ADDED
@@ -0,0 +1,2 @@
1
+ Length Grouped Sampler
2
+ ======================
data/xtuner/docs/en/acceleration/pack_to_max_length.rst ADDED
@@ -0,0 +1,2 @@
1
+ Pack to Max Length
2
+ ==================
data/xtuner/docs/en/acceleration/train_extreme_long_sequence.rst ADDED
@@ -0,0 +1,2 @@
1
+ Train Extreme Long Sequence
2
+ ===========================
data/xtuner/docs/en/acceleration/train_large_scale_dataset.rst ADDED
@@ -0,0 +1,2 @@
1
+ Train Large-scale Dataset
2
+ =========================
data/xtuner/docs/en/acceleration/varlen_flash_attn.rst ADDED
@@ -0,0 +1,2 @@
1
+ Varlen Flash Attention
2
+ ======================
data/xtuner/docs/en/chat/agent.md ADDED
@@ -0,0 +1 @@
1
+ # Chat with Agent
data/xtuner/docs/en/chat/llm.md ADDED
@@ -0,0 +1 @@
1
+ # Chat with LLM
data/xtuner/docs/en/chat/lmdeploy.md ADDED
@@ -0,0 +1 @@
1
+ # Accelerate chat by LMDeploy
data/xtuner/docs/en/chat/vlm.md ADDED
@@ -0,0 +1 @@
1
+ # Chat with VLM
data/xtuner/docs/en/conf.py ADDED
@@ -0,0 +1,109 @@
1
+ # Configuration file for the Sphinx documentation builder.
2
+ #
3
+ # This file only contains a selection of the most common options. For a full
4
+ # list see the documentation:
5
+ # https://www.sphinx-doc.org/en/master/usage/configuration.html
6
+
7
+ # -- Path setup --------------------------------------------------------------
8
+
9
+ # If extensions (or modules to document with autodoc) are in another directory,
10
+ # add these directories to sys.path here. If the directory is relative to the
11
+ # documentation root, use os.path.abspath to make it absolute, like shown here.
12
+
13
+ import os
14
+ import sys
15
+
16
+ from sphinx.ext import autodoc
17
+
18
+ sys.path.insert(0, os.path.abspath('../..'))
19
+
20
+ # -- Project information -----------------------------------------------------
21
+
22
+ project = 'XTuner'
23
+ copyright = '2024, XTuner Contributors'
24
+ author = 'XTuner Contributors'
25
+
26
+ # The full version, including alpha/beta/rc tags
27
+ version_file = '../../xtuner/version.py'
28
+ with open(version_file) as f:
29
+ exec(compile(f.read(), version_file, 'exec'))
30
+ __version__ = locals()['__version__']
31
+ # The short X.Y version
32
+ version = __version__
33
+ # The full version, including alpha/beta/rc tags
34
+ release = __version__
35
+
36
+ # -- General configuration ---------------------------------------------------
37
+
38
+ # Add any Sphinx extension module names here, as strings. They can be
39
+ # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
40
+ # ones.
41
+ extensions = [
42
+ 'sphinx.ext.napoleon',
43
+ 'sphinx.ext.viewcode',
44
+ 'sphinx.ext.intersphinx',
45
+ 'sphinx_copybutton',
46
+ 'sphinx.ext.autodoc',
47
+ 'sphinx.ext.autosummary',
48
+ 'myst_parser',
49
+ 'sphinxarg.ext',
50
+ ]
51
+
52
+ # Add any paths that contain templates here, relative to this directory.
53
+ templates_path = ['_templates']
54
+
55
+ # List of patterns, relative to source directory, that match files and
56
+ # directories to ignore when looking for source files.
57
+ # This pattern also affects html_static_path and html_extra_path.
58
+ exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
59
+
60
+ # Exclude the prompt "$" when copying code
61
+ copybutton_prompt_text = r'\$ '
62
+ copybutton_prompt_is_regexp = True
63
+
64
+ language = 'en'
65
+
66
+ # -- Options for HTML output -------------------------------------------------
67
+
68
+ # The theme to use for HTML and HTML Help pages. See the documentation for
69
+ # a list of builtin themes.
70
+ #
71
+ html_theme = 'sphinx_book_theme'
72
+ html_logo = '_static/image/logo.png'
73
+ html_theme_options = {
74
+ 'path_to_docs': 'docs/en',
75
+ 'repository_url': 'https://github.com/InternLM/xtuner',
76
+ 'use_repository_button': True,
77
+ }
78
+ # Add any paths that contain custom static files (such as style sheets) here,
79
+ # relative to this directory. They are copied after the builtin static files,
80
+ # so a file named "default.css" will overwrite the builtin "default.css".
81
+ # html_static_path = ['_static']
82
+
83
+ # Mock out external dependencies here.
84
+ autodoc_mock_imports = [
85
+ 'cpuinfo',
86
+ 'torch',
87
+ 'transformers',
88
+ 'psutil',
89
+ 'prometheus_client',
90
+ 'sentencepiece',
91
+ 'vllm.cuda_utils',
92
+ 'vllm._C',
93
+ 'numpy',
94
+ 'tqdm',
95
+ ]
96
+
97
+
98
+ class MockedClassDocumenter(autodoc.ClassDocumenter):
99
+ """Remove note about base class when a class is derived from object."""
100
+
101
+ def add_line(self, line: str, source: str, *lineno: int) -> None:
102
+ if line == ' Bases: :py:class:`object`':
103
+ return
104
+ super().add_line(line, source, *lineno)
105
+
106
+
107
+ autodoc.ClassDocumenter = MockedClassDocumenter
108
+
109
+ navigation_with_keys = False
data/xtuner/docs/en/dpo/modify_settings.md ADDED
@@ -0,0 +1,83 @@
1
+ ## Modify DPO Training Configuration
2
+
3
+ This section introduces config parameters related to DPO (Direct Preference Optimization) training. For more details on XTuner config files, please refer to [Modifying Training Configuration](https://xtuner.readthedocs.io/zh-cn/latest/training/modify_settings.html).
4
+
5
+ ### Loss Function
6
+
7
+ In DPO training, you can choose different types of loss functions according to your needs. XTuner provides various loss function options, such as `sigmoid`, `hinge`, `ipo`, etc. You can select the desired loss function type by setting the `dpo_loss_type` parameter.
8
+
9
+ Additionally, you can control the temperature coefficient of the loss function by adjusting the `loss_beta` parameter, and apply label smoothing with the `label_smoothing` parameter (useful when the preference labels may be noisy).
10
+
11
+ ```python
12
+ #######################################################################
13
+ # PART 1 Settings #
14
+ #######################################################################
15
+ # Model
16
+ dpo_loss_type = 'sigmoid' # One of ['sigmoid', 'hinge', 'ipo', 'kto_pair', 'sppo_hard', 'nca_pair', 'robust']
17
+ loss_beta = 0.1
18
+ label_smoothing = 0.0
19
+ ```
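+
+ For intuition, the snippet below sketches how the `sigmoid` loss type combines `loss_beta` and `label_smoothing`, following the standard (conservative) DPO formulation; the actual implementation lives in XTuner's DPO loss code and may differ in details:
+
+ ```python
+ # Sketch only: per-sample DPO loss for dpo_loss_type='sigmoid'.
+ import torch.nn.functional as F
+
+ def sigmoid_dpo_loss(policy_chosen_logps, policy_rejected_logps,
+                      ref_chosen_logps, ref_rejected_logps,
+                      beta=0.1, label_smoothing=0.0):
+     # Difference of policy/reference log-ratios on chosen vs. rejected answers.
+     logits = ((policy_chosen_logps - ref_chosen_logps)
+               - (policy_rejected_logps - ref_rejected_logps))
+     # label_smoothing assumes a fraction of preference labels may be flipped.
+     return (-F.logsigmoid(beta * logits) * (1 - label_smoothing)
+             - F.logsigmoid(-beta * logits) * label_smoothing)
+ ```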
20
+
21
+ ### Modifying the Model
22
+
23
+ Users can modify `pretrained_model_name_or_path` to change the pretrained model.
24
+
25
+ ```python
26
+ #######################################################################
27
+ # PART 1 Settings #
28
+ #######################################################################
29
+ # Model
30
+ pretrained_model_name_or_path = 'internlm/internlm2-chat-1_8b-sft'
31
+ ```
32
+
33
+ ### Training Data
34
+
35
+ In DPO training, you can specify the maximum number of tokens for a single sample sequence using the `max_length` parameter. XTuner will automatically truncate or pad the data.
36
+
37
+ ```python
38
+ # Data
39
+ max_length = 2048
40
+ ```
41
+
42
+ In the configuration file, we use the `train_dataset` field to specify the training dataset. You can specify the dataset loading method using the `dataset` field and the dataset mapping function using the `dataset_map_fn` field.
43
+
44
+ ```python
45
+ #######################################################################
46
+ # PART 3 Dataset & Dataloader #
47
+ #######################################################################
48
+ sampler = SequenceParallelSampler \
49
+ if sequence_parallel_size > 1 else DefaultSampler
50
+
51
+ train_dataset = dict(
52
+ type=build_preference_dataset,
53
+ dataset=dict(type=load_dataset, path='mlabonne/orpo-dpo-mix-40k'),
54
+ tokenizer=tokenizer,
55
+ max_length=max_length,
56
+ dataset_map_fn=orpo_dpo_mix_40k_map_fn,
57
+ is_dpo=True,
58
+ is_reward=False,
59
+ reward_token_id=-1,
60
+ num_proc=32,
61
+ use_varlen_attn=use_varlen_attn,
62
+ max_packed_length=max_packed_length,
63
+ shuffle_before_pack=True,
64
+ )
65
+
66
+ train_dataloader = dict(
67
+ batch_size=batch_size,
68
+ num_workers=dataloader_num_workers,
69
+ dataset=train_dataset,
70
+ sampler=dict(type=sampler, shuffle=True),
71
+ collate_fn=dict(
72
+ type=preference_collate_fn, use_varlen_attn=use_varlen_attn))
73
+ ```
74
+
75
+ In the above configuration, we use `load_dataset` to load the `mlabonne/orpo-dpo-mix-40k` dataset from Hugging Face and use `orpo_dpo_mix_40k_map_fn` as the dataset mapping function.
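+
+ If your data is not in a built-in format, you can supply your own mapping function. The sketch below is hypothetical (the field names `question`, `good_answer`, and `bad_answer` are placeholders); the exact target schema expected by `build_preference_dataset` is described in the Preference Dataset section linked below:
+
+ ```python
+ # Hypothetical mapping function for a custom preference dataset.
+ # It converts one raw sample into prompt / chosen / rejected message lists.
+ def custom_preference_map_fn(example):
+     return {
+         'prompt': [{'role': 'user', 'content': example['question']}],
+         'chosen': [{'role': 'assistant', 'content': example['good_answer']}],
+         'rejected': [{'role': 'assistant', 'content': example['bad_answer']}],
+     }
+ ```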
76
+
77
+ For more information on handling datasets and writing dataset mapping functions, please refer to the [Preference Dataset Section](../reward_model/preference_data.md).
78
+
79
+ ### Accelerating Training
80
+
81
+ When training with preference data, we recommend enabling the [Variable-Length Attention Mechanism](https://xtuner.readthedocs.io/zh-cn/latest/acceleration/varlen_flash_attn.html) to avoid memory waste caused by length differences between chosen and rejected samples within a single preference. You can enable the variable-length attention mechanism by setting `use_varlen_attn=True`.
82
+
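+ Concretely, this is a single switch in the settings part of the config. A minimal sketch, following the variable names used in the example config above (the `max_packed_length` choice is only illustrative):
+
+ ```python
+ #######################################################################
+ #                          PART 1  Settings                          #
+ #######################################################################
+ # Pack chosen/rejected samples of different lengths into one sequence
+ # instead of padding them to the same length.
+ use_varlen_attn = True
+ # Upper bound for a packed sequence; doubling `max_length` is an
+ # illustrative choice so that one chosen/rejected pair always fits.
+ max_packed_length = max_length * 2
+ ```
+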
83
+ XTuner also supports many training acceleration methods. For details on how to use them, please refer to the [Acceleration Strategies Section](https://xtuner.readthedocs.io/zh-cn/latest/acceleration/hyper_parameters.html).
data/xtuner/docs/en/dpo/overview.md ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## Introduction to DPO
2
+
3
+ ### Overview
4
+
5
+ DPO (Direct Preference Optimization) is a method used in large language model training for directly optimizing human preferences. Unlike traditional reinforcement learning methods, DPO directly uses human preference data to optimize the model, thereby improving the quality of generated content to better align with human preferences. DPO also eliminates the need to train a Reward Model and a Critic Model, avoiding the complexity of reinforcement learning algorithms, reducing training overhead, and enhancing training efficiency.
6
+
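+ For reference, the standard DPO objective from the original paper can be written as below, where $\pi_\theta$ is the policy being optimized, $\pi_{\mathrm{ref}}$ is the frozen reference model, $\beta$ is the temperature coefficient, and $(x, y_w, y_l)$ is a prompt with its chosen and rejected responses; this formula is background only and is not an excerpt of XTuner's implementation:
+
+ $$
+ \mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
+ $$
+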
7
+ Many algorithms have made certain improvements to DPO's loss function. In XTuner, besides DPO, we have also implemented loss functions from papers such as [Identity Preference Optimization (IPO)](https://huggingface.co/papers/2310.12036). To use these algorithms, please refer to the [Modify DPO Settings](./modify_settings.md) section. We also provide some [example configurations](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/dpo) for reference.
8
+
9
+ In addition to DPO, there are alignment algorithms like [ORPO](https://arxiv.org/abs/2403.07691) that do not require a reference model. ORPO uses the concept of odds ratio to penalize rejected samples during training, thereby fitting the chosen samples more effectively. Eliminating the dependence on a reference model makes the training process simpler and more efficient. The training method for ORPO in XTuner is very similar to DPO, and we provide some [example configurations](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/orpo); users can refer to the DPO tutorial to modify the configuration.
10
+
11
+ ### Features of DPO Training in XTuner
12
+
13
+ DPO training in XTuner offers the following significant advantages:
14
+
15
+ 1. **Latest Algorithms**: In addition to supporting standard DPO, XTuner also supports improved DPO variants as well as memory-efficient algorithms like ORPO that do not rely on a reference model.
16
+
17
+ 2. **Reducing Memory Waste**: Due to the length differences in chosen and rejected data in preference datasets, padding tokens during data concatenation can cause memory waste. In XTuner, by utilizing the variable-length attention feature from Flash Attention2, preference pairs are packed into the same sequence during training, significantly reducing memory waste caused by padding tokens. This not only improves memory efficiency but also allows for training larger models or handling more data under the same hardware conditions.
18
+
19
+ ![img](../../zh_cn/reward_model/images/var_len_atten.png)
20
+
21
+ 3. **Efficient Training**: Leveraging XTuner's QLoRA training capabilities, the reference model can be converted into a policy model with the LoRA adapter removed, eliminating the memory overhead of the reference model weights and significantly reducing DPO training costs.
22
+
23
+ 4. **Long Text Training**: With XTuner's sequence parallel functionality, long text data can be trained efficiently.
24
+
25
+ ### Getting Started
26
+
27
+ Refer to the [Quick Start Guide](./quick_start.md) to understand the basic concepts. For more information on configuring training parameters, please see the [Modify DPO Settings](./modify_settings.md) section.
data/xtuner/docs/en/dpo/quick_start.md ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## Quick Start with DPO
2
+
3
+ In this section, we will introduce how to use XTuner to train a 1.8B DPO (Direct Preference Optimization) model to help you get started quickly.
4
+
5
+ ### Preparing Pretrained Model Weights
6
+
7
+ We use [InternLM2-chat-1.8b-sft](https://huggingface.co/internlm/internlm2-chat-1_8b-sft) as the initial model for DPO training to align it with human preferences.
8
+
9
+ Set `pretrained_model_name_or_path = 'internlm/internlm2-chat-1_8b-sft'` in the training configuration file, and the model files will be automatically downloaded when training starts. If you need to download the model weights manually, please refer to the section [Preparing Pretrained Model Weights](https://xtuner.readthedocs.io/zh-cn/latest/preparation/pretrained_model.html), which provides detailed instructions on how to download model weights from Huggingface or Modelscope. Here are the links to the models on HuggingFace and ModelScope:
10
+
11
+ - HuggingFace link: https://huggingface.co/internlm/internlm2-chat-1_8b-sft
12
+ - ModelScope link: https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft/summary
13
+
14
+ ### Preparing Training Data
15
+
16
+ In this tutorial, we use the [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) dataset from Huggingface as an example.
17
+
18
+ ```python
19
+ train_dataset = dict(
20
+ type=build_preference_dataset,
21
+ dataset=dict(
22
+ type=load_dataset,
23
+ path='mlabonne/orpo-dpo-mix-40k'),
24
+ dataset_map_fn=orpo_dpo_mix_40k_map_fn,
25
+ is_dpo=True,
26
+ is_reward=False,
27
+ )
28
+ ```
29
+
30
+ Using the above configuration in the configuration file will automatically download and process this dataset. If you want to use other open-source datasets from Huggingface or custom datasets, please refer to the [Preference Dataset](../reward_model/preference_data.md) section.
31
+
32
+ ### Preparing Configuration File
33
+
34
+ XTuner provides several ready-to-use configuration files, which can be viewed using `xtuner list-cfg`. Execute the following command to copy a configuration file to the current directory.
35
+
36
+ ```bash
37
+ xtuner copy-cfg internlm2_chat_1_8b_dpo_full .
38
+ ```
39
+
40
+ Open the copied configuration file. If you choose to download the model and dataset automatically, no modifications are needed. If you want to specify paths to your pre-downloaded model and dataset, modify the `pretrained_model_name_or_path` and the `path` parameter in `dataset` under `train_dataset`.
41
+
42
+ For more training parameter configurations, please refer to the [Modifying DPO Training Configuration](./modify_settings.md) section.
43
+
44
+ ### Starting the Training
45
+
46
+ After completing the above steps, you can start the training task using the following commands.
47
+
48
+ ```bash
49
+ # Single machine, single GPU
50
+ xtuner train ./internlm2_chat_1_8b_dpo_full_copy.py
51
+ # Single machine, multiple GPUs
52
+ NPROC_PER_NODE=${GPU_NUM} xtuner train ./internlm2_chat_1_8b_dpo_full_copy.py
53
+ # Slurm cluster
54
+ srun ${SRUN_ARGS} xtuner train ./internlm2_chat_1_8b_dpo_full_copy.py --launcher slurm
55
+ ```
56
+
57
+ ### Model Conversion
58
+
59
+ XTuner provides integrated tools to convert models to HuggingFace format. Simply execute the following commands:
60
+
61
+ ```bash
62
+ # Create a directory for HuggingFace format parameters
63
+ mkdir work_dirs/internlm2_chat_1_8b_dpo_full_copy/iter_15230_hf
64
+
65
+ # Convert format
66
+ xtuner convert pth_to_hf internlm2_chat_1_8b_dpo_full_copy.py \
67
+ work_dirs/internlm2_chat_1_8b_dpo_full_copy/iter_15230.pth \
68
+ work_dirs/internlm2_chat_1_8b_dpo_full_copy/iter_15230_hf
69
+ ```
70
+
71
+ This will convert XTuner's checkpoint to the HuggingFace format.
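+
+ The converted directory can be loaded like any other HuggingFace model. A minimal sketch, assuming the output path from the command above (if the tokenizer files were not copied into that directory, load the tokenizer from `internlm/internlm2-chat-1_8b-sft` instead):
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ path = 'work_dirs/internlm2_chat_1_8b_dpo_full_copy/iter_15230_hf'
+ # trust_remote_code is required for the InternLM2 modeling code.
+ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
+ ```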
data/xtuner/docs/en/evaluation/hook.md ADDED
@@ -0,0 +1 @@
 
 
1
+ # Evaluation during training
data/xtuner/docs/en/evaluation/mmbench.md ADDED
@@ -0,0 +1 @@
 
 
1
+ # MMBench (VLM)
data/xtuner/docs/en/evaluation/mmlu.md ADDED
@@ -0,0 +1 @@
 
 
1
+ # MMLU (LLM)
data/xtuner/docs/en/evaluation/opencompass.md ADDED
@@ -0,0 +1 @@
 
 
1
+ # Evaluate with OpenCompass
data/xtuner/docs/en/get_started/installation.md ADDED
@@ -0,0 +1,52 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Installation
2
+
3
+ In this section, we will show you how to install XTuner.
4
+
5
+ ## Installation Process
6
+
7
+ We recommend that users follow our best practices when installing XTuner.
8
+ It is recommended to install XTuner in a conda virtual environment with Python 3.10.
9
+
10
+ ### Best Practices
11
+
12
+ **Step 0.** Create a Python 3.10 virtual environment using conda.
13
+
14
+ ```shell
15
+ conda create --name xtuner-env python=3.10 -y
16
+ conda activate xtuner-env
17
+ ```
18
+
19
+ **Step 1.** Install XTuner.
20
+
21
+ Case a: Install XTuner via pip:
22
+
23
+ ```shell
24
+ pip install -U xtuner
25
+ ```
26
+
27
+ Case b: Install XTuner with DeepSpeed integration:
28
+
29
+ ```shell
30
+ pip install -U 'xtuner[deepspeed]'
31
+ ```
32
+
33
+ Case c: Install XTuner from the source code:
34
+
35
+ ```shell
36
+ git clone https://github.com/InternLM/xtuner.git
37
+ cd xtuner
38
+ pip install -e '.[all]'
39
+ # "-e" indicates installing the project in editable mode, so any local modifications to the code will take effect without reinstalling.
40
+ ```
41
+
42
+ ## Verify the installation
43
+
44
+ To verify if XTuner is installed correctly, we will use a command to print the configuration files.
45
+
46
+ **Print Configuration Files:** Use the command `xtuner list-cfg` in the command line to verify if the configuration files can be printed.
47
+
48
+ ```shell
49
+ xtuner list-cfg
50
+ ```
51
+
52
+ You should see a list of XTuner configuration files, corresponding to the ones in [xtuner/configs](https://github.com/InternLM/xtuner/tree/main/xtuner/configs) in the source code.
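+
+ As an additional sanity check, you can confirm the installed version with pip (a standard pip command, not an XTuner-specific one):
+
+ ```shell
+ pip show xtuner
+ ```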
data/xtuner/docs/en/get_started/overview.md ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ # Overview
2
+
3
+ This chapter introduces you to the framework and workflow of XTuner, and provides detailed tutorial links.
4
+
5
+ ## What is XTuner
data/xtuner/docs/en/get_started/quickstart.md ADDED
@@ -0,0 +1,308 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Quickstart
2
+
3
+ In this section, we will show you how to use XTuner to fine-tune a model to help you get started quickly.
4
+
5
+ After installing XTuner successfully, we can start fine-tuning the model. In this section, we will demonstrate how to use XTuner to apply the QLoRA algorithm to fine-tune InternLM2-Chat-7B on the Colorist dataset.
6
+
7
+ The Colorist dataset ([HuggingFace link](https://huggingface.co/datasets/burkelibbey/colors); [ModelScope link](https://www.modelscope.cn/datasets/fanqiNO1/colors/summary)) provides color choices and suggestions based on color descriptions. A model fine-tuned on this dataset can be used to give a hexadecimal color code based on the user's description of the color. For example, when the user enters "a calming but fairly bright light sky blue, between sky blue and baby blue, with a hint of fluorescence due to its brightness", the model will output ![#66ccff](https://img.shields.io/badge/%2366ccff-66CCFF), which matches the user's description. Here are a few samples from this dataset:
8
+
9
+ | English Description | Chinese Description | Color |
10
+ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
11
+ | Light Sky Blue: A calming, fairly bright color that falls between sky blue and baby blue, with a hint of slight fluorescence due to its brightness. | 浅天蓝色:一种介于天蓝和婴儿蓝之间的平和、相当明亮的颜色,由于明亮而带有一丝轻微的荧光。 | #66ccff: ![#66ccff](https://img.shields.io/badge/%2366ccff-66CCFF) |
12
+ | Bright red: This is a very vibrant, saturated and vivid shade of red, resembling the color of ripe apples or fresh blood. It is as red as you can get on a standard RGB color palette, with no elements of either blue or green. | 鲜红色: 这是一种非常鲜艳、饱和、生动的红色,类似成熟苹果或新鲜血液的颜色。它是标准 RGB 调色板上的红色,不含任何蓝色或绿色元素。 | #ee0000: ![#ee0000](https://img.shields.io/badge/%23ee0000-EE0000) |
13
+ | Bright Turquoise: This color mixes the freshness of bright green with the tranquility of light blue, leading to a vibrant shade of turquoise. It is reminiscent of tropical waters. | 明亮的绿松石色:这种颜色融合了鲜绿色的清新和淡蓝色的宁静,呈现出一种充满活力的绿松石色调。它让人联想到热带水域。 | #00ffcc: ![#00ffcc](https://img.shields.io/badge/%2300ffcc-00FFCC) |
14
+
15
+ ## Prepare the model weights
16
+
17
+ Before fine-tuning the model, we first need to prepare the weights of the model.
18
+
19
+ ### Download from HuggingFace
20
+
21
+ ```bash
22
+ pip install -U huggingface_hub
23
+
24
+ # Download the model weights to Shanghai_AI_Laboratory/internlm2-chat-7b
25
+ huggingface-cli download internlm/internlm2-chat-7b \
26
+ --local-dir Shanghai_AI_Laboratory/internlm2-chat-7b \
27
+ --local-dir-use-symlinks False \
28
+ --resume-download
29
+ ```
30
+
31
+ ### Download from ModelScope
32
+
33
+ Since downloading model weights from HuggingFace can be unstable or slow due to network issues, we can instead download the weights of InternLM2-Chat-7B from ModelScope.
34
+
35
+ ```bash
36
+ pip install -U modelscope
37
+
38
+ # Download the model weights to the current directory
39
+ python -c "from modelscope import snapshot_download; snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-7b', cache_dir='.')"
40
+ ```
41
+
42
+ After completing the download, we can start to prepare the dataset for fine-tuning.
43
+
44
+ The HuggingFace link and ModelScope link are attached here:
45
+
46
+ - The HuggingFace link is located at: https://huggingface.co/internlm/internlm2-chat-7b
47
+ - The ModelScope link is located at: https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b/summary
48
+
49
+ ## Prepare the fine-tuning dataset
50
+
51
+ ### Download from HuggingFace
52
+
53
+ ```bash
54
+ git clone https://huggingface.co/datasets/burkelibbey/colors
55
+ ```
56
+
57
+ ### Download from ModelScope
58
+
59
+ For the same reason, we can download the dataset from ModelScope instead.
60
+
61
+ ```bash
62
+ git clone https://www.modelscope.cn/datasets/fanqiNO1/colors.git
63
+ ```
64
+
65
+ The HuggingFace link and ModelScope link are attached here:
66
+
67
+ - The HuggingFace link is located at: https://huggingface.co/datasets/burkelibbey/colors
68
+ - The ModelScope link is located at: https://modelscope.cn/datasets/fanqiNO1/colors
69
+
70
+ ## Prepare the config
71
+
72
+ XTuner provides several configs out-of-the-box, which can be viewed via `xtuner list-cfg`. We can use the following command to copy a config to the current directory.
73
+
74
+ ```bash
75
+ xtuner copy-cfg internlm2_7b_qlora_colorist_e5 .
76
+ ```
77
+
78
+ Explanation of the config name:
79
+
80
+ | Config Name | internlm2_7b_qlora_colorist_e5 |
81
+ | ----------- | ------------------------------ |
82
+ | Model Name | internlm2_7b |
83
+ | Algorithm | qlora |
84
+ | Dataset | colorist |
85
+ | Epochs | 5 |
86
+
87
+ The directory structure at this point should look like this:
88
+
89
+ ```bash
90
+ .
91
+ ├── colors
92
+ │ ├── colors.json
93
+ │ ├── dataset_infos.json
94
+ │ ├── README.md
95
+ │ └── train.jsonl
96
+ ├── internlm2_7b_qlora_colorist_e5_copy.py
97
+ └── Shanghai_AI_Laboratory
98
+ └── internlm2-chat-7b
99
+ ├── config.json
100
+ ├── configuration_internlm2.py
101
+ ├── configuration.json
102
+ ├── generation_config.json
103
+ ├── modeling_internlm2.py
104
+ ├── pytorch_model-00001-of-00008.bin
105
+ ├── pytorch_model-00002-of-00008.bin
106
+ ├── pytorch_model-00003-of-00008.bin
107
+ ├── pytorch_model-00004-of-00008.bin
108
+ ├── pytorch_model-00005-of-00008.bin
109
+ ├── pytorch_model-00006-of-00008.bin
110
+ ├── pytorch_model-00007-of-00008.bin
111
+ ├── pytorch_model-00008-of-00008.bin
112
+ ├── pytorch_model.bin.index.json
113
+ ├── README.md
114
+ ├── special_tokens_map.json
115
+ ├── tokenization_internlm2_fast.py
116
+ ├── tokenization_internlm2.py
117
+ ├── tokenizer_config.json
118
+ └── tokenizer.model
119
+ ```
120
+
121
+ ## Modify the config
122
+
123
+ In this step, we need to modify the model path and dataset path to local paths and modify the dataset loading method.
124
+ In addition, since the copied config is based on the Base model, we also need to modify the `prompt_template` to adapt to the Chat model.
125
+
126
+ ```diff
127
+ #######################################################################
128
+ # PART 1 Settings #
129
+ #######################################################################
130
+ # Model
131
+ - pretrained_model_name_or_path = 'internlm/internlm2-7b'
132
+ + pretrained_model_name_or_path = './Shanghai_AI_Laboratory/internlm2-chat-7b'
133
+
134
+ # Data
135
+ - data_path = 'burkelibbey/colors'
136
+ + data_path = './colors/train.jsonl'
137
+ - prompt_template = PROMPT_TEMPLATE.default
138
+ + prompt_template = PROMPT_TEMPLATE.internlm2_chat
139
+
140
+ ...
141
+ #######################################################################
142
+ # PART 3 Dataset & Dataloader #
143
+ #######################################################################
144
+ train_dataset = dict(
145
+ type=process_hf_dataset,
146
+ - dataset=dict(type=load_dataset, path=data_path),
147
+ + dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
148
+ tokenizer=tokenizer,
149
+ max_length=max_length,
150
+ dataset_map_fn=colors_map_fn,
151
+ template_map_fn=dict(
152
+ type=template_map_fn_factory, template=prompt_template),
153
+ remove_unused_columns=True,
154
+ shuffle_before_pack=True,
155
+ pack_to_max_length=pack_to_max_length)
156
+ ```
157
+
158
+ Therefore, `pretrained_model_name_or_path`, `data_path`, `prompt_template`, and the `dataset` fields in `train_dataset` are modified.
159
+
160
+ ## Start fine-tuning
161
+
162
+ Once the above steps are done, we can start fine-tuning using one of the following commands.
163
+
164
+ ```bash
165
+ # Single GPU
166
+ xtuner train ./internlm2_7b_qlora_colorist_e5_copy.py
167
+ # Multiple GPUs
168
+ NPROC_PER_NODE=${GPU_NUM} xtuner train ./internlm2_7b_qlora_colorist_e5_copy.py
169
+ # Slurm
170
+ srun ${SRUN_ARGS} xtuner train ./internlm2_7b_qlora_colorist_e5_copy.py --launcher slurm
171
+ ```
172
+
173
+ The correct training log may look similar to the one shown below:
174
+
175
+ ```text
176
+ 01/29 21:35:34 - mmengine - INFO - Iter(train) [ 10/720] lr: 9.0001e-05 eta: 0:31:46 time: 2.6851 data_time: 0.0077 memory: 12762 loss: 2.6900
177
+ 01/29 21:36:02 - mmengine - INFO - Iter(train) [ 20/720] lr: 1.9000e-04 eta: 0:32:01 time: 2.8037 data_time: 0.0071 memory: 13969 loss: 2.6049 grad_norm: 0.9361
178
+ 01/29 21:36:29 - mmengine - INFO - Iter(train) [ 30/720] lr: 1.9994e-04 eta: 0:31:24 time: 2.7031 data_time: 0.0070 memory: 13969 loss: 2.5795 grad_norm: 0.9361
179
+ 01/29 21:36:57 - mmengine - INFO - Iter(train) [ 40/720] lr: 1.9969e-04 eta: 0:30:55 time: 2.7247 data_time: 0.0069 memory: 13969 loss: 2.3352 grad_norm: 0.8482
180
+ 01/29 21:37:24 - mmengine - INFO - Iter(train) [ 50/720] lr: 1.9925e-04 eta: 0:30:28 time: 2.7286 data_time: 0.0068 memory: 13969 loss: 2.2816 grad_norm: 0.8184
181
+ 01/29 21:37:51 - mmengine - INFO - Iter(train) [ 60/720] lr: 1.9863e-04 eta: 0:29:58 time: 2.7048 data_time: 0.0069 memory: 13969 loss: 2.2040 grad_norm: 0.8184
182
+ 01/29 21:38:18 - mmengine - INFO - Iter(train) [ 70/720] lr: 1.9781e-04 eta: 0:29:31 time: 2.7302 data_time: 0.0068 memory: 13969 loss: 2.1912 grad_norm: 0.8460
183
+ 01/29 21:38:46 - mmengine - INFO - Iter(train) [ 80/720] lr: 1.9681e-04 eta: 0:29:05 time: 2.7338 data_time: 0.0069 memory: 13969 loss: 2.1512 grad_norm: 0.8686
184
+ 01/29 21:39:13 - mmengine - INFO - Iter(train) [ 90/720] lr: 1.9563e-04 eta: 0:28:36 time: 2.7047 data_time: 0.0068 memory: 13969 loss: 2.0653 grad_norm: 0.8686
185
+ 01/29 21:39:40 - mmengine - INFO - Iter(train) [100/720] lr: 1.9426e-04 eta: 0:28:09 time: 2.7383 data_time: 0.0070 memory: 13969 loss: 1.9819 grad_norm: 0.9127
186
+ ```
187
+
188
+ Before training begins, the output of the model is as shown below:
189
+
190
+ ```text
191
+ 2024/01/29 21:34:58 - mmengine - INFO - before_train in EvaluateChatHook.
192
+ 2024/01/29 21:35:03 - mmengine - INFO - Sample output:
193
+ <s><|im_start|>system
194
+ You are a professional color designer. Please provide the corresponding colors based on the description of Human.
195
+ <|im_end|>
196
+ <|im_start|>user
197
+ 请给我一个像天空一样清澈透明的蓝色。<|im_end|>
198
+ <|im_start|>assistant
199
+ 为了匹配您所描述的“像天空一样清澈透明的蓝色”,我建议您选择一种名为“天蓝”(Cerulean)的颜色。这种颜色通常被用来代表天空、海洋和清澈的水域,它具有明亮
200
+
201
+ 2024/01/29 21:35:07 - mmengine - INFO - Sample output:
202
+ <s><|im_start|>system
203
+ You are a professional color designer. Please provide the corresponding colors based on the description of Human.
204
+ <|im_end|>
205
+ <|im_start|>user
206
+ Please give me a clear blue like the sky.<|im_end|>
207
+ <|im_start|>assistant
208
+ The color you're describing is a clear, bright blue that resembles the color of the sky on a sunny day. This color is often referred to as "sky blue" or "cerulean blue." Here are some specific color codes and names that
209
+ ```
210
+
211
+ After training, the output of the model is as shown below:
212
+
213
+ ```text
214
+ 2024/01/29 22:08:07 - mmengine - INFO - Sample output:
215
+ <s><|im_start|>system
216
+ You are a professional color designer. Please provide the corresponding colors based on the description of Human.
217
+ <|im_end|>
218
+ <|im_start|>user
219
+ 请给我一个像天空一样清澈透明的蓝色。<|im_end|>
220
+ <|im_start|>assistant
221
+ #1099ee<|im_end|>
222
+
223
+ 2024/01/29 22:08:08 - mmengine - INFO - Sample output:
224
+ <s><|im_start|>system
225
+ You are a professional color designer. Please provide the corresponding colors based on the description of Human.
226
+ <|im_end|>
227
+ <|im_start|>user
228
+ Please give me a clear blue like the sky.<|im_end|>
229
+ <|im_start|>assistant
230
+ #0066dd<|im_end|>
231
+ ```
232
+
233
+ The color of the model output is shown below:
234
+
235
+ - 天空一样清澈透明的蓝色:![天空一样清澈透明的蓝色](https://img.shields.io/badge/天空一样清澈透明的蓝色-1099EE)
236
+ - A clear blue like the sky: ![A clear blue like the sky](https://img.shields.io/badge/A_clear_blue_like_the_sky-0066DD)
237
+
238
+ It is clear that the output of the model after training has been fully aligned with the content of the dataset.
239
+
240
+ ## Model Convert + LoRA Merge
241
+
242
+ After training, we will get several `.pth` files that do **NOT** contain all the parameters of the model, but only the parameters updated by QLoRA during training. Therefore, we need to convert these `.pth` files to HuggingFace format and merge them into the original LLM weights.
243
+
244
+ ### Model Convert
245
+
246
+ XTuner integrates a tool for converting the model to HuggingFace format. We can use the following command to convert the model.
247
+
248
+ ```bash
249
+ # Create the directory to store parameters in hf format
250
+ mkdir work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720_hf
251
+
252
+ # Convert the model to hf format
253
+ xtuner convert pth_to_hf internlm2_7b_qlora_colorist_e5_copy.py \
254
+ work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720.pth \
255
+ work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720_hf
256
+ ```
257
+
258
+ This command will convert `work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720.pth` to hf format based on the contents of the config `internlm2_7b_qlora_colorist_e5_copy.py` and will save it in `work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720_hf`.
259
+
260
+ ### LoRA Merge
261
+
262
+ XTuner also integrates a tool for merging LoRA weights; we just need to execute the following command:
263
+
264
+ ```bash
265
+ # Create the directory to store the merged weights
266
+ mkdir work_dirs/internlm2_7b_qlora_colorist_e5_copy/merged
267
+
268
+ # Merge the weights
269
+ xtuner convert merge Shanghai_AI_Laboratory/internlm2-chat-7b \
270
+ work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720_hf \
271
+ work_dirs/internlm2_7b_qlora_colorist_e5_copy/merged \
272
+ --max-shard-size 2GB
273
+ ```
274
+
275
+ Similar to the command above, this command reads the original weights from `Shanghai_AI_Laboratory/internlm2-chat-7b` and the hf-format adapter weights from `work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720_hf`, merges the two, and saves the result in `work_dirs/internlm2_7b_qlora_colorist_e5_copy/merged`, with each parameter shard capped at 2GB.
276
+
277
+ ## Chat with the model
278
+
279
+ To better appreciate the model's capabilities after merging the weights, we can chat with it. XTuner integrates a chat tool, so we can start a simple demo with the following command:
280
+
281
+ ```bash
282
+ xtuner chat work_dirs/internlm2_7b_qlora_colorist_e5_copy/merged \
283
+ --prompt-template internlm2_chat \
284
+ --system-template colorist
285
+ ```
286
+
287
+ Of course, we can also choose not to merge the weights and instead chat directly with the LLM + LoRA adapter; we just need to execute the following command:
288
+
289
+ ```bash
290
+ xtuner chat Shanghai_AI_Laboratory/internlm2-chat-7b \
291
+ --adapter work_dirs/internlm2_7b_qlora_colorist_e5_copy/iter_720_hf \
292
+ --prompt-template internlm2_chat \
293
+ --system-template colorist
294
+ ```
295
+
296
+ Here, `work_dirs/internlm2_7b_qlora_colorist_e5_copy/merged` is the path to the merged weights, `--prompt-template internlm2_chat` specifies that the chat template is InternLM2-Chat, and `--system-template colorist` specifies that the system prompt for conversations with the model is the template required by the Colorist dataset.
297
+
298
+ There is an example below:
299
+
300
+ ```text
301
+ double enter to end input (EXIT: exit chat, RESET: reset history) >>> A calming but fairly bright light sky blue, between sky blue and baby blue, with a hint of fluorescence due to its brightness.
302
+
303
+ #66ccff<|im_end|>
304
+ ```
305
+
306
+ The color of the model output is shown below:
307
+
308
+ A calming but fairly bright light sky blue, between sky blue and baby blue, with a hint of fluorescence due to its brightness: ![#66ccff](https://img.shields.io/badge/A_calming_but_fairly_bright_light_sky_blue_between_sky_blue_and_baby_blue_with_a_hint_of_fluorescence_due_to_its_brightness-66CCFF).
data/xtuner/docs/en/index.rst ADDED
@@ -0,0 +1,123 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ .. xtuner documentation master file, created by
2
+ sphinx-quickstart on Tue Jan 9 16:33:06 2024.
3
+ You can adapt this file completely to your liking, but it should at least
4
+ contain the root `toctree` directive.
5
+
6
+ Welcome to XTuner's documentation!
7
+ ==================================
8
+
9
+ .. figure:: ./_static/image/logo.png
10
+ :align: center
11
+ :alt: xtuner
12
+ :class: no-scaled-link
13
+
14
+ .. raw:: html
15
+
16
+ <p style="text-align:center">
17
+ <strong>All-IN-ONE toolbox for LLM
18
+ </strong>
19
+ </p>
20
+
21
+ <p style="text-align:center">
22
+ <script async defer src="https://buttons.github.io/buttons.js"></script>
23
+ <a class="github-button" href="https://github.com/InternLM/xtuner" data-show-count="true" data-size="large" aria-label="Star">Star</a>
24
+ <a class="github-button" href="https://github.com/InternLM/xtuner/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
25
+ <a class="github-button" href="https://github.com/InternLM/xtuner/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
26
+ </p>
27
+
28
+
29
+
30
+ Documentation
31
+ -------------
32
+ .. toctree::
33
+ :maxdepth: 2
34
+ :caption: Get Started
35
+
36
+ get_started/overview.md
37
+ get_started/installation.md
38
+ get_started/quickstart.md
39
+
40
+ .. toctree::
41
+ :maxdepth: 2
42
+ :caption: Preparation
43
+
44
+ preparation/pretrained_model.rst
45
+ preparation/prompt_template.rst
46
+
47
+ .. toctree::
48
+ :maxdepth: 2
49
+ :caption: Training
50
+
51
+ training/modify_settings.rst
52
+ training/custom_sft_dataset.rst
53
+ training/custom_pretrain_dataset.rst
54
+ training/custom_agent_dataset.rst
55
+ training/multi_modal_dataset.rst
56
+ training/open_source_dataset.rst
57
+ training/visualization.rst
58
+
59
+ .. toctree::
60
+ :maxdepth: 2
61
+ :caption: DPO
62
+
63
+ dpo/overview.md
64
+ dpo/quick_start.md
65
+ dpo/modify_settings.md
66
+
67
+ .. toctree::
68
+ :maxdepth: 2
69
+ :caption: Reward Model
70
+
71
+ reward_model/overview.md
72
+ reward_model/quick_start.md
73
+ reward_model/modify_settings.md
74
+ reward_model/preference_data.md
75
+
76
+ .. toctree::
77
+ :maxdepth: 2
78
+ :caption: Acceleration
79
+
80
+ acceleration/deepspeed.rst
81
+ acceleration/pack_to_max_length.rst
82
+ acceleration/flash_attn.rst
83
+ acceleration/varlen_flash_attn.rst
84
+ acceleration/hyper_parameters.rst
85
+ acceleration/length_grouped_sampler.rst
86
+ acceleration/train_large_scale_dataset.rst
87
+ acceleration/train_extreme_long_sequence.rst
88
+ acceleration/benchmark.rst
89
+
90
+ .. toctree::
91
+ :maxdepth: 2
92
+ :caption: Chat
93
+
94
+ chat/llm.md
95
+ chat/agent.md
96
+ chat/vlm.md
97
+ chat/lmdeploy.md
98
+
99
+ .. toctree::
100
+ :maxdepth: 2
101
+ :caption: Evaluation
102
+
103
+ evaluation/hook.md
104
+ evaluation/mmlu.md
105
+ evaluation/mmbench.md
106
+ evaluation/opencompass.md
107
+
108
+ .. toctree::
109
+ :maxdepth: 2
110
+ :caption: Models
111
+
112
+ models/supported.md
113
+
114
+ .. toctree::
115
+ :maxdepth: 2
116
+ :caption: InternEvo Migration
117
+
118
+ internevo_migration/internevo_migration.rst
119
+ internevo_migration/ftdp_dataset/ftdp.rst
120
+ internevo_migration/ftdp_dataset/Case1.rst
121
+ internevo_migration/ftdp_dataset/Case2.rst
122
+ internevo_migration/ftdp_dataset/Case3.rst
123
+ internevo_migration/ftdp_dataset/Case4.rst
data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case1.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Case 1
2
+ ======
data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case2.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Case 2
2
+ ======
data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case3.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Case 3
2
+ ======
data/xtuner/docs/en/internevo_migration/ftdp_dataset/Case4.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Case 4
2
+ ======
data/xtuner/docs/en/internevo_migration/ftdp_dataset/ftdp.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ ftdp
2
+ ====
data/xtuner/docs/en/internevo_migration/internevo_migration.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ InternEVO Migration
2
+ ===================
data/xtuner/docs/en/make.bat ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ @ECHO OFF
2
+
3
+ pushd %~dp0
4
+
5
+ REM Command file for Sphinx documentation
6
+
7
+ if "%SPHINXBUILD%" == "" (
8
+ set SPHINXBUILD=sphinx-build
9
+ )
10
+ set SOURCEDIR=.
11
+ set BUILDDIR=_build
12
+
13
+ %SPHINXBUILD% >NUL 2>NUL
14
+ if errorlevel 9009 (
15
+ echo.
16
+ echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
17
+ echo.installed, then set the SPHINXBUILD environment variable to point
18
+ echo.to the full path of the 'sphinx-build' executable. Alternatively you
19
+ echo.may add the Sphinx directory to PATH.
20
+ echo.
21
+ echo.If you don't have Sphinx installed, grab it from
22
+ echo.https://www.sphinx-doc.org/
23
+ exit /b 1
24
+ )
25
+
26
+ if "%1" == "" goto help
27
+
28
+ %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
29
+ goto end
30
+
31
+ :help
32
+ %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
33
+
34
+ :end
35
+ popd
data/xtuner/docs/en/models/supported.md ADDED
@@ -0,0 +1 @@
 
 
1
+ # Supported Models
data/xtuner/docs/en/notes/changelog.md ADDED
@@ -0,0 +1,25 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--
2
+
3
+ ## vX.X.X (YYYY.MM.DD)
4
+
5
+ ### Highlights
6
+
7
+ ### New Features & Improvements
8
+
9
+ ### Bug Fixes
10
+
11
+ ### Contributors
12
+
13
+ -->
14
+
15
+ # Changelog
16
+
17
+ ## v0.1.0 (2023.08.30)
18
+
19
+ XTuner is released! 🔥🔥🔥
20
+
21
+ ### Highlights
22
+
23
+ - XTuner supports LLM fine-tuning on consumer-grade GPUs. The minimum GPU memory required for 7B LLM fine-tuning is only **8GB**.
24
+ - XTuner supports various LLMs, datasets, algorithms and training pipelines.
25
+ - Several fine-tuned adapters are released simultaneously, including various gameplays such as the colorist LLM, plugins-based LLM, and many more. For further details, please visit [XTuner on HuggingFace](https://huggingface.co/xtuner)!
data/xtuner/docs/en/preparation/pretrained_model.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Pretrained Model
2
+ ================
data/xtuner/docs/en/preparation/prompt_template.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Prompt Template
2
+ ===============
data/xtuner/docs/en/reward_model/modify_settings.md ADDED
@@ -0,0 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## Modify Reward Model Training Configuration
2
+
3
+ This section introduces the config related to Reward Model training. For more details on XTuner config files, please refer to [Modify Settings](https://xtuner.readthedocs.io/zh-cn/latest/training/modify_settings.html).
4
+
5
+ ### Loss Function
6
+
7
+ XTuner uses the [Bradley–Terry Model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model) for preference modeling in the Reward Model. You can specify `loss_type="ranking"` to use ranking loss. XTuner also implements the focal loss function proposed in InternLM2, which adjusts the weights of difficult and easy samples to avoid overfitting. You can set `loss_type="focal"` to use this loss function. For a detailed explanation of this loss function, please refer to the [InternLM2 Technical Report](https://arxiv.org/abs/2403.17297).
8
+
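+ For reference, the ranking loss under the Bradley–Terry assumption takes the form below, where $r_\theta(x, y)$ is the scalar score the reward model assigns to response $y$ for prompt $x$; this is background only and not an excerpt of XTuner's implementation:
+
+ $$
+ \mathcal{L}_{\mathrm{ranking}} = -\mathbb{E}\left[\log \sigma\left(r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})\right)\right]
+ $$
+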
9
+ Additionally, to keep the reward model's output scores in a stable range, we add a constraint term to the loss. You can specify `penalty_type='log_barrier'` or `penalty_type='L2'` to enable the log-barrier or L2 constraint, respectively.
10
+
11
+ ```python
12
+ #######################################################################
13
+ # PART 1 Settings #
14
+ #######################################################################
15
+ # Model
16
+ loss_type = 'focal' # 'ranking' or 'focal'
17
+ penalty_type = 'log_barrier' # 'log_barrier' or 'L2'
18
+ ```
19
+
20
+ ### Modifying the Model
21
+
22
+ Users can modify `pretrained_model_name_or_path` to change the pretrained model.
23
+
24
+ Note that XTuner calculates reward scores by appending a special token at the end of the data. Therefore, when switching models with different vocabularies, the ID of this special token also needs to be modified accordingly. We usually use an unused token at the end of the vocabulary as the reward token.
25
+
26
+ For example, in InternLM2, we use `[UNUSED_TOKEN_130]` as the reward token:
27
+
28
+ ```python
29
+ #######################################################################
30
+ # PART 1 Settings #
31
+ #######################################################################
32
+ # Model
33
+ pretrained_model_name_or_path = 'internlm/internlm2-chat-1_8b-sft'
34
+ reward_token_id = 92527 # use [UNUSED_TOKEN_130] as reward token
35
+ ```
36
+
37
+ If the user switches to the llama3 model, we can use `<|reserved_special_token_0|>` as the reward token:
38
+
39
+ ```python
40
+ #######################################################################
41
+ # PART 1 Settings #
42
+ #######################################################################
43
+ # Model
44
+ pretrained_model_name_or_path = 'meta-llama/Meta-Llama-3-8B-Instruct'
45
+ reward_token_id = 128002 # use <|reserved_special_token_0|> as reward token
46
+ ```
47
+
48
+ ### Training Data
49
+
50
+ In Reward Model training, you can specify the maximum number of tokens for a single sample sequence using `max_length`. XTuner will automatically truncate or pad the data.
51
+
52
+ ```python
53
+ # Data
54
+ max_length = 2048
55
+ ```
56
+
57
+ In the configuration file, we use the `train_dataset` field to specify the training dataset. You can specify the dataset loading method using the `dataset` field and the dataset mapping function using the `dataset_map_fn` field.
58
+
59
+ ```python
60
+ #######################################################################
61
+ # PART 3 Dataset & Dataloader #
62
+ #######################################################################
63
+ sampler = SequenceParallelSampler \
64
+ if sequence_parallel_size > 1 else DefaultSampler
65
+
66
+ train_dataset = dict(
67
+ type=build_preference_dataset,
68
+ dataset=dict(
69
+ type=load_dataset,
70
+ path='argilla/ultrafeedback-binarized-preferences-cleaned'),
71
+ tokenizer=tokenizer,
72
+ max_length=max_length,
73
+ dataset_map_fn=orpo_dpo_mix_40k_map_fn,
74
+ is_dpo=False,
75
+ is_reward=True,
76
+ reward_token_id=reward_token_id,
77
+ num_proc=32,
78
+ use_varlen_attn=use_varlen_attn,
79
+ max_packed_length=max_packed_length,
80
+ shuffle_before_pack=True,
81
+ )
82
+
83
+ train_dataloader = dict(
84
+ batch_size=batch_size,
85
+ num_workers=dataloader_num_workers,
86
+ dataset=train_dataset,
87
+ sampler=dict(type=sampler, shuffle=True),
88
+ collate_fn=dict(
89
+ type=preference_collate_fn, use_varlen_attn=use_varlen_attn))
90
+ ```
91
+
92
+ In the above configuration, we use `load_dataset` to load the `argilla/ultrafeedback-binarized-preferences-cleaned` dataset from Hugging Face, using `orpo_dpo_mix_40k_map_fn` as the dataset mapping function (this is because `orpo_dpo_mix_40k` and `ultrafeedback-binarized-preferences-cleaned` have the same format, so the same mapping function is used).
93
+
94
+ For more information on handling datasets and writing dataset mapping functions, please refer to the [Preference Data Section](./preference_data.md).
95
+
96
+ ### Accelerating Training
97
+
98
+ When training with preference data, we recommend enabling the [Variable-Length Attention Mechanism](https://xtuner.readthedocs.io/zh-cn/latest/acceleration/varlen_flash_attn.html) to avoid the memory waste caused by length differences between the chosen and rejected samples within a single preference pair. You can enable the variable-length attention mechanism by setting `use_varlen_attn=True`.
99
+
100
+ XTuner also supports many training acceleration methods. For details on how to use them, please refer to the [Acceleration Strategies Section](https://xtuner.readthedocs.io/zh-cn/latest/acceleration/hyper_parameters.html).
data/xtuner/docs/en/reward_model/overview.md ADDED
@@ -0,0 +1,43 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## Introduction to Reward Model
2
+
3
+ ### Overview
4
+
5
+ The Reward Model is a crucial component in the reinforcement learning process. Its primary task is to predict reward values based on given inputs, guiding the direction of the learning algorithm. In RLHF (Reinforcement Learning from Human Feedback), the Reward Model acts as a proxy for human preferences, helping the reinforcement learning algorithm optimize strategies more effectively.
6
+
7
+ In large language model training, the Reward Model typically refers to the Preference Model. By providing good and bad (chosen & rejected) responses to the same prompts during training, it fits human preferences and predicts a reward value during inference to guide the optimization of the Actor model in the RLHF process.
8
+
9
+ Applications of the Reward Model include but are not limited to:
10
+
11
+ - **RLHF Training**: During RLHF training, such as with the Proximal Policy Optimization (PPO) algorithm, the Reward Model provides reward signals that improve the quality of generated content and align it more closely with human preferences.
12
+ - **BoN Sampling**: In Best-of-N (BoN) sampling, users can use the Reward Model to score multiple responses to the same prompt and select the highest-scoring result, thereby enhancing the model's output; a minimal sketch is shown after this list.
13
+ - **Data Construction**: The Reward Model can be used to evaluate and filter training data or replace manual annotation to construct DPO training data.
14
+
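+ The BoN item above can be made concrete with a short sketch. This is not XTuner code; `reward_score` is a hypothetical callable standing in for whatever scoring interface your trained reward model exposes:
+
+ ```python
+ def best_of_n(prompt, candidates, reward_score):
+     """Return the candidate response with the highest reward score.
+
+     `reward_score(prompt, response)` is a placeholder for your actual
+     reward-model scoring code.
+     """
+     scores = [reward_score(prompt, response) for response in candidates]
+     best_index = max(range(len(candidates)), key=lambda i: scores[i])
+     return candidates[best_index]
+ ```
+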
15
+ ### Features of Reward Model Training in XTuner
16
+
17
+ The Reward Model training in XTuner offers the following significant advantages:
18
+
19
+ 1. **Latest Training Techniques**: XTuner integrates the Reward Model training loss function from InternLM2, which stabilizes the numerical range of reward scores and reduces overfitting on simple samples (see [InternLM2 Technical Report](https://arxiv.org/abs/2403.17297) for details).
20
+
21
+ 2. **Reducing Memory Waste**: Due to the length differences in chosen and rejected data in preference datasets, padding tokens during data concatenation can cause memory waste. In XTuner, by utilizing the variable-length attention feature from Flash Attention2, preference pairs are packed into the same sequence during training, significantly reducing memory waste caused by padding tokens. This not only improves memory efficiency but also allows for training larger models or handling more data under the same hardware conditions.
22
+
23
+ ![img](../../zh_cn/reward_model/images/var_len_atten.png)
24
+
25
+ 3. **Efficient Training**: Leveraging XTuner's QLoRA training capabilities, we can perform full parameter training only on the Reward Model's Value Head, while using QLoRA fine-tuning on the language model itself, substantially reducing the memory overhead of model training.
26
+
27
+ 4. **Long Text Training**: With XTuner's sequence parallel functionality, long text data can be trained efficiently.
28
+
29
+ ![img](../../zh_cn/reward_model/images/sequence_parallel.png)
30
+
31
+ ### Getting Started
32
+
33
+ Refer to the [Quick Start Guide](./quick_start.md) to understand the basic concepts. For more information on configuring training parameters, please see the [Modifying Reward Model Settings](./modify_settings.md) section.
34
+
35
+ ### Open-source Models
36
+
37
+ We used XTuner to train the InternLM2 Reward Models presented in the InternLM2 Technical Report; you are welcome to download and use them:
38
+
39
+ | Model | Transformers(HF) | ModelScope(HF) | OpenXLab(HF) | RewardBench Score |
40
+ | ------------------------- | -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- |
41
+ | **InternLM2-1.8B-Reward** | [🤗internlm2-1_8b-reward](https://huggingface.co/internlm/internlm2-1_8b-reward) | [internlm2-1_8b-reward](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-1_8b-reward/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/internlm2-1_8b-reward) | 80.6 |
42
+ | **InternLM2-7B-Reward** | [🤗internlm2-7b-reward](https://huggingface.co/internlm/internlm2-7b-reward) | [internlm2-7b-reward](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-7b-reward/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/internlm2-7b-reward) | 86.6 |
43
+ | **InternLM2-20B-Reward** | [🤗internlm2-20b-reward](https://huggingface.co/internlm/internlm2-20b-reward) | [internlm2-20b-reward](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b-reward/summary) | [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/OpenLMLab/internlm2-20b-reward) | 89.5 |
data/xtuner/docs/en/reward_model/preference_data.md ADDED
@@ -0,0 +1,110 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## Preference Dataset
2
+
3
+ ### Overview
4
+
5
+ XTuner's Reward Model, DPO, ORPO, and other algorithms that train on preference data all adopt the same data format. Each training sample in the preference dataset needs to contain the following three fields: `prompt`, `chosen`, and `rejected`. The values for each field follow the [OpenAI chat message](https://platform.openai.com/docs/api-reference/chat/create) format. A specific example is as follows:
6
+
7
+ ```json
8
+ {
9
+ "prompt": [
10
+ {
11
+ "role": "system",
12
+ "content": "You are a helpful assistant."
13
+ },
14
+ {
15
+ "role": "user",
16
+ "content": "Who won the world series in 2020?"
17
+ },
18
+ {
19
+ "role": "assistant",
20
+ "content": "The Los Angeles Dodgers won the World Series in 2020."
21
+ },
22
+ {
23
+ "role": "user",
24
+ "content": "Where was it played?"
25
+ }
26
+ ],
27
+ "chosen": [
28
+ {
29
+ "role": "assistant",
30
+ "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas."
31
+ }
32
+ ],
33
+ "rejected": [
34
+ {
35
+ "role": "assistant",
36
+ "content": "I don't know."
37
+ }
38
+ ]
39
+ }
40
+ ```
41
+
42
+ When conducting Reward Model training or DPO training, XTuner processes the preference dataset into different training labels based on the type of training task.
43
+
44
+ ![img](../../zh_cn/reward_model/images/preference_data.png)
45
+
46
+ As shown in the above image, for Reward Model training, we follow the ChatGPT training method by adding a special `<|reward|>` token at the end of the conversation data and calculating the loss only on the logits output by this token. For DPO series algorithm training, we mask the tokens in the prompt part and calculate the loss only on the chosen and rejected responses. In the configuration file, we control the dataset type through the `is_reward` and `is_dpo` fields in the dataset.
47
+
48
+ ### Training with Custom Data
49
+
50
+ After understanding the data format of a single sample, you only need to save the converted data in [JSON Lines](https://jsonlines.org/) format. Each line in the file is a sample.
51
+
52
+ ```json
53
+ {"prompt": [{"content": "How can I learn to better regulate my emotions as a grown adult?", "role": "user"}], "chosen": [{"content": "You could do emotional regulation exercises with me, and try to notice and practice ways that you could better control your emotions in real life.", "role": "assistant"}], "rejected": [{"content": "Do you mean how can you learn to be more emotionally mature?", "role": "assistant"}]}
54
+ {"prompt": [{"content": "What are some interesting things I can add to a grilled cheese sandwich?", "role": "user"}, {"content": "A tomato, a slice of pickle, some chopped pickles, some chopped green onions. Also, some mayonnaise if you like.", "role": "assistant"}, {"content": "Sounds great. Anything else?", "role": "user"}], "chosen": [{"content": "If you’d like something a little different, I recommend grating some horseradish on top of the sandwich.", "role": "assistant"}], "rejected": [{"content": "Maybe some nice fresh basil?", "role": "assistant"}]}
55
+ {"prompt": [{"content": "How do I bob for apples?", "role": "user"}], "chosen": [{"content": "Ah! Do you mean the game of “applesauce,” or do you mean the idea of “to bob for apples”?", "role": "assistant"}], "rejected": [{"content": "Sorry, I don’t know that term.", "role": "assistant"}]}
56
+ ......
57
+ ```
58
+
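+ One simple way to produce such a file from Python is sketched below; `samples` is assumed to be a list of dicts that already follow the `prompt`/`chosen`/`rejected` format shown above:
+
+ ```python
+ import json
+
+ samples = []  # fill with dicts in the prompt/chosen/rejected format
+
+ with open('preference_data.jsonl', 'w', encoding='utf-8') as f:
+     for sample in samples:
+         f.write(json.dumps(sample, ensure_ascii=False) + '\n')
+ ```
+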
59
+ After preparing the custom dataset, you need to fill in the path to your saved data in the `data_files` field in the configuration file. You can load multiple JSONL files simultaneously for training.
60
+
61
+ ```python
62
+ #######################################################################
63
+ # PART 3 Dataset & Dataloader #
64
+ #######################################################################
65
+ train_dataset = dict(
66
+ type=build_preference_dataset,
67
+ dataset=dict(
68
+ type=load_jsonl_dataset,
69
+ data_files=[
70
+ '/your/jsonl/path/here.jsonl',
71
+ '/your/another/jsonl/path/here.jsonl'
72
+ ]),
73
+ )
74
+ ```
75
+
76
+ ### Training with Open Source Datasets
77
+
78
+ Similar to configuring SFT data in XTuner, when using open-source datasets from Hugging Face, you only need to define a mapping function `map_fn` to process the dataset format into XTuner's data format.
79
+
80
+ Taking `Intel/orca_dpo_pairs` as an example, this dataset has `system`, `question`, `chosen`, and `rejected` fields, with each field's value in text format instead of the [OpenAI chat message](https://platform.openai.com/docs/api-reference/chat/create) format. Therefore, we need to define a mapping function for this dataset:
81
+
82
+ ```python
83
+ def intel_orca_dpo_map_fn(example):
84
+ prompt = [{
85
+ 'role': 'system',
86
+ 'content': example['system']
87
+ }, {
88
+ 'role': 'user',
89
+ 'content': example['question']
90
+ }]
91
+ chosen = [{'role': 'assistant', 'content': example['chosen']}]
92
+ rejected = [{'role': 'assistant', 'content': example['rejected']}]
93
+ return {'prompt': prompt, 'chosen': chosen, 'rejected': rejected}
94
+ ```
95
+
96
+ As shown in the code, `intel_orca_dpo_map_fn` processes the four fields in the original data, converting them into `prompt`, `chosen`, and `rejected` fields, and ensures each field follows the [OpenAI chat message](https://platform.openai.com/docs/api-reference/chat/create) format, maintaining uniformity in subsequent data processing flows.
97
+
98
+ After defining the mapping function, you need to import it in the configuration file and configure it in the `dataset_map_fn` field.
99
+
100
+ ```python
101
+ train_dataset = dict(
102
+ type=build_preference_dataset,
103
+ dataset=dict(
104
+ type=load_dataset,
105
+ path='Intel/orca_dpo_pairs'),
106
+ tokenizer=tokenizer,
107
+ max_length=max_length,
108
+ dataset_map_fn=intel_orca_dpo_map_fn,
109
+ )
110
+ ```
data/xtuner/docs/en/reward_model/quick_start.md ADDED
@@ -0,0 +1,85 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## Quick Start Guide for Reward Model
2
+
3
+ In this section, we will introduce how to use XTuner to train a 1.8B Reward Model, helping you get started quickly.
4
+
5
+ ### Preparing Pretrained Model Weights
6
+
7
+ According to the paper [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155), we use a language model fine-tuned with SFT as the initialization model for the Reward Model. Here, we use [InternLM2-chat-1.8b-sft](https://huggingface.co/internlm/internlm2-chat-1_8b-sft) as the initialization model.
8
+
9
+ Set `pretrained_model_name_or_path = 'internlm/internlm2-chat-1_8b-sft'` in the training configuration file, and the model files will be automatically downloaded when training starts. If you need to download the model weights manually, please refer to the section [Preparing Pretrained Model Weights](https://xtuner.readthedocs.io/zh-cn/latest/preparation/pretrained_model.html), which provides detailed instructions on how to download model weights from Huggingface or Modelscope. Here are the links to the models on HuggingFace and ModelScope:
10
+
11
+ - HuggingFace link: https://huggingface.co/internlm/internlm2-chat-1_8b-sft
12
+ - ModelScope link: https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft/summary
13
+
14
+ ### Preparing Training Data
15
+
16
+ In this tutorial, we use the [UltraFeedback](https://arxiv.org/abs/2310.01377) dataset as an example. For convenience, we use the preprocessed [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned) dataset from Huggingface.
17
+
18
+ ```python
19
+ train_dataset = dict(
20
+ type=build_preference_dataset,
21
+ dataset=dict(
22
+ type=load_dataset,
23
+ path='argilla/ultrafeedback-binarized-preferences-cleaned'),
24
+ dataset_map_fn=orpo_dpo_mix_40k_map_fn,
25
+ is_dpo=False,
26
+ is_reward=True,
27
+ )
28
+ ```
29
+
30
+ Using the above configuration in the configuration file will automatically download and process this dataset. If you want to use other open-source datasets from Huggingface or custom datasets, please refer to the [Preference Dataset](./preference_data.md) section.
31
+
32
+ ### Preparing Configuration Files
33
+
34
+ XTuner provides several ready-to-use configuration files, which can be viewed using `xtuner list-cfg`. Execute the following command to copy a configuration file to the current directory.
35
+
36
+ ```bash
37
+ xtuner copy-cfg internlm2_chat_1_8b_reward_full_ultrafeedback .
38
+ ```
39
+
40
+ Open the copied configuration file. If you choose to download the model and dataset automatically, no modifications are needed. If you want to specify paths to your pre-downloaded model and dataset, modify the `pretrained_model_name_or_path` and the `path` parameter in `dataset` under `train_dataset`.
41
+
42
+ For more training parameter configurations, please refer to the section [Modifying Reward Training Configuration](./modify_settings.md).
43
+
44
+ ### Starting the Training
45
+
46
+ After completing the above steps, you can start the training task using the following commands.
47
+
48
+ ```bash
49
+ # Single node single GPU
50
+ xtuner train ./internlm2_chat_1_8b_reward_full_ultrafeedback_copy.py
51
+ # Single node multiple GPUs
52
+ NPROC_PER_NODE=${GPU_NUM} xtuner train ./internlm2_chat_1_8b_reward_full_ultrafeedback_copy.py
53
+ # Slurm cluster
54
+ srun ${SRUN_ARGS} xtuner train ./internlm2_chat_1_8b_reward_full_ultrafeedback_copy.py --launcher slurm
55
+ ```
56
+
57
+ The correct training log should look like the following (running on a single A800 GPU):
58
+
59
+ ```
60
+ 06/06 16:12:11 - mmengine - INFO - Iter(train) [ 10/15230] lr: 3.9580e-07 eta: 2:59:41 time: 0.7084 data_time: 0.0044 memory: 18021 loss: 0.6270 acc: 0.0000 chosen_score_mean: 0.0000 rejected_score_mean: 0.0000 num_samples: 4.0000 num_tokens: 969.0000
61
+ 06/06 16:12:17 - mmengine - INFO - Iter(train) [ 20/15230] lr: 8.3536e-07 eta: 2:45:25 time: 0.5968 data_time: 0.0034 memory: 42180 loss: 0.6270 acc: 0.5000 chosen_score_mean: 0.0013 rejected_score_mean: 0.0010 num_samples: 4.0000 num_tokens: 1405.0000
62
+ 06/06 16:12:22 - mmengine - INFO - Iter(train) [ 30/15230] lr: 1.2749e-06 eta: 2:37:18 time: 0.5578 data_time: 0.0024 memory: 32121 loss: 0.6270 acc: 0.7500 chosen_score_mean: 0.0016 rejected_score_mean: 0.0011 num_samples: 4.0000 num_tokens: 932.0000
63
+ 06/06 16:12:28 - mmengine - INFO - Iter(train) [ 40/15230] lr: 1.7145e-06 eta: 2:36:05 time: 0.6033 data_time: 0.0025 memory: 42186 loss: 0.6270 acc: 0.7500 chosen_score_mean: 0.0027 rejected_score_mean: 0.0016 num_samples: 4.0000 num_tokens: 994.0000
64
+ 06/06 16:12:35 - mmengine - INFO - Iter(train) [ 50/15230] lr: 2.1540e-06 eta: 2:41:03 time: 0.7166 data_time: 0.0027 memory: 42186 loss: 0.6278 acc: 0.5000 chosen_score_mean: 0.0031 rejected_score_mean: 0.0032 num_samples: 4.0000 num_tokens: 2049.0000
65
+ 06/06 16:12:40 - mmengine - INFO - Iter(train) [ 60/15230] lr: 2.5936e-06 eta: 2:33:37 time: 0.4627 data_time: 0.0023 memory: 30238 loss: 0.6262 acc: 1.0000 chosen_score_mean: 0.0057 rejected_score_mean: 0.0030 num_samples: 4.0000 num_tokens: 992.0000
66
+ 06/06 16:12:46 - mmengine - INFO - Iter(train) [ 70/15230] lr: 3.0331e-06 eta: 2:33:18 time: 0.6018 data_time: 0.0025 memory: 42186 loss: 0.6247 acc: 0.7500 chosen_score_mean: 0.0117 rejected_score_mean: 0.0055 num_samples: 4.0000 num_tokens: 815.0000
67
+ ```
68
+
69
+ ### Model Conversion
70
+
71
+ XTuner provides integrated tools to convert models to HuggingFace format. Simply execute the following commands:
72
+
73
+ ```bash
74
+ # Create a directory to store HF format parameters
75
+ mkdir work_dirs/internlm2_chat_1_8b_reward_full_ultrafeedback_copy/iter_15230_hf
76
+
77
+ # Convert the format
78
+ xtuner convert pth_to_hf internlm2_chat_1_8b_reward_full_ultrafeedback_copy.py \
79
+ work_dirs/internlm2_chat_1_8b_reward_full_ultrafeedback_copy/iter_15230.pth \
80
+ work_dirs/internlm2_chat_1_8b_reward_full_ultrafeedback_copy/iter_15230_hf
81
+ ```
82
+
83
+ This will convert XTuner's checkpoint to the HuggingFace format.
84
+
85
+ Note: Since the Reward Model type is not integrated into the official transformers library, only the Reward Models trained with InternLM2 will be converted to the `InternLM2ForRewardModel` type. Other models will default to the `SequenceClassification` type (for example, LLaMa3 will be converted to the `LlamaForSequenceClassification` type).
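+
+ The converted directory can then be loaded with `transformers`. A minimal sketch for the InternLM2 case mentioned above (`trust_remote_code=True` is required for the custom `InternLM2ForRewardModel` class; the exact scoring interface depends on the model's remote code):
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ path = 'work_dirs/internlm2_chat_1_8b_reward_full_ultrafeedback_copy/iter_15230_hf'
+ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
+ model = AutoModel.from_pretrained(path, trust_remote_code=True)
+ ```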
data/xtuner/docs/en/switch_language.md ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ ## <a href='https://xtuner.readthedocs.io/en/latest/'>English</a>
2
+
3
+ ## <a href='https://xtuner.readthedocs.io/zh_CN/latest/'>简体中文</a>
data/xtuner/docs/en/training/custom_agent_dataset.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Custom Agent Dataset
2
+ ====================
data/xtuner/docs/en/training/custom_pretrain_dataset.rst ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ Custom Pretrain Dataset
2
+ =======================