|
--- |
|
license: llama2 |
|
metrics: |
|
- code_eval |
|
library_name: transformers |
|
tags: |
|
- code |
|
model-index: |
|
- name: WizardCoder-Python-34B-V1.0 |
|
results: |
|
- task: |
|
type: text-generation |
|
dataset: |
|
type: openai_humaneval |
|
name: HumanEval |
|
metrics: |
|
- name: pass@1 |
|
type: pass@1 |
|
value: 0.732 |
|
verified: false |
|
--- |
|
|
|
<p align="center"> |
|
π€ <a href="https://huggingface.co/WizardLM" target="_blank">HF Repo</a> β’π± <a href="https://github.com/nlpxucan/WizardLM" target="_blank">Github Repo</a> β’ π¦ <a href="https://twitter.com/WizardLM_AI" target="_blank">Twitter</a> β’ π <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> β’ π <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> β’ π <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a> <br> |
|
</p> |
|
<p align="center"> |
|
π Join our <a href="https://discord.gg/VZjjHtWrKs" target="_blank">Discord</a> |
|
</p> |
|
|
|
## News |
|
|
|
- π₯π₯π₯[2023/08/26] We released **WizardCoder-Python-34B-V1.0** , which achieves the **73.2 pass@1** and surpasses **GPT4 (2023/03/15)**, **ChatGPT-3.5**, and **Claude2** on the [HumanEval Benchmarks](https://github.com/openai/human-eval). |
|
- [2023/06/16] We released **WizardCoder-15B-V1.0** , which achieves the **57.3 pass@1** and surpasses **Claude-Plus (+6.8)**, **Bard (+15.3)** and **InstructCodeT5+ (+22.3)** on the [HumanEval Benchmarks](https://github.com/openai/human-eval). |
|
|
|
βNote: There are two HumanEval results of GPT4 and ChatGPT-3.5. The 67.0 and 48.1 are reported by the official GPT4 Report (2023/03/15) of [OpenAI](https://arxiv.org/abs/2303.08774). The 82.0 and 72.5 are tested by ourselves with the latest API (2023/08/26). |
|
|
|
|
|
| Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License | |
|
| ----- |------| ---- |------|-------| ----- | ----- | |
|
| WizardCoder-Python-34B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 73.2 | 61.2 | [Demo](http://47.103.63.15:50085/) | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a> | |
|
| WizardCoder-15B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardCoder-15B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 59.8 |50.6 | -- | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> | |
|
| WizardCoder-Python-13B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 64.0 | 55.6 | -- | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a> | |
|
| WizardCoder-3B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardCoder-3B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 34.8 |37.4 | [Demo](http://47.103.63.15:50086/) | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> | |
|
| WizardCoder-1B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardCoder-1B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 23.8 |28.6 | -- | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> | |
|
|
|
|
|
- Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on the GSM8K, including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**. |
|
- Our **WizardMath-70B-V1.0** model achieves **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM, and achieves **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM. |
|
|
|
<font size=4> |
|
|
|
| Model | Checkpoint | Paper | GSM8k | MATH |Online Demo| License| |
|
| ----- |------| ---- |------|-------| ----- | ----- | |
|
| WizardMath-70B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardMath-70B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| **81.6** | **22.7** |[Demo](http://47.103.63.15:50083/)| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a> | |
|
| WizardMath-13B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardMath-13B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| **63.9** | **14.0** |[Demo](http://47.103.63.15:50082/)| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a> | |
|
| WizardMath-7B-V1.0 | π€ <a href="https://huggingface.co/WizardLM/WizardMath-7B-V1.0" target="_blank">HF Link</a> | π <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| **54.9** | **10.7** | [Demo ](http://47.103.63.15:50080/)| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a>| |
|
</font> |
|
|
|
|
|
- [08/09/2023] We released **WizardLM-70B-V1.0** model. Here is [Full Model Weight](https://huggingface.co/WizardLM/WizardLM-70B-V1.0). |
|
|
|
<font size=4> |
|
|
|
|
|
| <sup>Model</sup> | <sup>Checkpoint</sup> | <sup>Paper</sup> |<sup>MT-Bench</sup> | <sup>AlpacaEval</sup> | <sup>GSM8k</sup> | <sup>HumanEval</sup> | <sup>License</sup>| |
|
| ----- |------| ---- |------|-------| ----- | ----- | ----- | |
|
| <sup>**WizardLM-70B-V1.0**</sup> | <sup>π€ <a href="https://huggingface.co/WizardLM/WizardLM-70B-V1.0" target="_blank">HF Link</a> </sup>|<sup>π**Coming Soon**</sup>| <sup>**7.78**</sup> | <sup>**92.91%**</sup> |<sup>**77.6%**</sup> | <sup> **50.6**</sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> | |
|
| <sup>WizardLM-13B-V1.2</sup> | <sup>π€ <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.2" target="_blank">HF Link</a> </sup>| | <sup>7.06</sup> | <sup>89.17%</sup> |<sup>55.3%</sup> | <sup>36.6 </sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> | |
|
| <sup>WizardLM-13B-V1.1</sup> |<sup> π€ <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.1" target="_blank">HF Link</a> </sup> | | <sup>6.76</sup> |<sup>86.32%</sup> | | <sup>25.0 </sup>| <sup>Non-commercial</sup>| |
|
| <sup>WizardLM-30B-V1.0</sup> | <sup>π€ <a href="https://huggingface.co/WizardLM/WizardLM-30B-V1.0" target="_blank">HF Link</a></sup> | | <sup>7.01</sup> | | | <sup>37.8 </sup>| <sup>Non-commercial</sup> | |
|
| <sup>WizardLM-13B-V1.0</sup> | <sup>π€ <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.0" target="_blank">HF Link</a> </sup> | | <sup>6.35</sup> | <sup>75.31%</sup> | | <sup> 24.0 </sup> | <sup>Non-commercial</sup>| |
|
| <sup>WizardLM-7B-V1.0 </sup>| <sup>π€ <a href="https://huggingface.co/WizardLM/WizardLM-7B-V1.0" target="_blank">HF Link</a> </sup> |<sup> π <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> </sup>| | | |<sup>19.1 </sup>|<sup> Non-commercial</sup>| |
|
</font> |
|
|
|
|
|
## Comparing WizardCoder-Python-34B-V1.0 with Other LLMs. |
|
|
|
π₯ The following figure shows that our **WizardCoder-Python-34B-V1.0 attains the second position in this benchmark**, surpassing GPT4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5) and Claude2 (73.2 vs. 71.2). |
|
|
|
<p align="center" width="100%"> |
|
<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/WizardCoder/imgs/compare_sota.png" alt="WizardCoder" style="width: 96%; min-width: 300px; display: block; margin: auto;"></a> |
|
</p> |
|
|
|
## Prompt Format |
|
``` |
|
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:" |
|
``` |
|
|
|
## Inference Demo Script |
|
|
|
We provide the inference demo code [here](https://github.com/nlpxucan/WizardLM/tree/main/demo). |
|
|
|
## Citation |
|
|
|
Please cite the repo if you use the data, method or code in this repo. |
|
|
|
``` |
|
@article{luo2023wizardcoder, |
|
title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct}, |
|
author={Luo, Ziyang and Xu, Can and Zhao, Pu and Sun, Qingfeng and Geng, Xiubo and Hu, Wenxiang and Tao, Chongyang and Ma, Jing and Lin, Qingwei and Jiang, Daxin}, |
|
journal={arXiv preprint arXiv:2306.08568}, |
|
year={2023} |
|
} |
|
``` |
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_WizardLM__WizardCoder-Python-34B-V1.0) |
|
|
|
| Metric | Value | |
|
|-----------------------|---------------------------| |
|
| Avg. | 46.83 | |
|
| ARC (25-shot) | 52.13 | |
|
| HellaSwag (10-shot) | 74.78 | |
|
| MMLU (5-shot) | 49.15 | |
|
| TruthfulQA (0-shot) | 48.85 | |
|
| Winogrande (5-shot) | 68.35 | |
|
| GSM8K (5-shot) | 9.48 | |
|
| DROP (3-shot) | 25.06 | |
|
|