File size: 5,849 Bytes
e2f434b 43ff161 cfd7956 e2f434b 76a162e 1be8918 e2f434b 1be8918 5f9c77b 36b2a97 e2f434b 1b47ea4 e2f434b 1b47ea4 e2f434b 1b47ea4 280b539 36b2a97 9a76c9a 76a162e 6485505 76a162e 1b47ea4 76a162e 1b47ea4 76a162e 6485505 8e0f973 76a162e 1b47ea4 76a162e 6485505 76a162e f32dcdf 76a162e e2f434b 8e0f973 e2f434b 36b2a97 d85536f e2f434b d85536f e2f434b 593d1eb da0559e bf17ec9 f51cc64 2dc00e8 f51cc64 f3687a8 e2f434b d950e2a e2f434b 36b2a97 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 |
---
language:
- en
tags:
- upstage
- llama-2
- instruct
- instruction
pipeline_tag: text-generation
---
# Updates
Solar, a new bot created by Upstage, is now available on **Poe**. As a top-ranked model on the HuggingFace Open LLM leaderboard, and a fine tune of Llama 2, Solar is a great example of the progress enabled by open source.
Try now at https://poe.com/Solar-0-70b
# SOLAR-0-70b-16bit model card
The model name has been changed from LLaMa-2-70b-instruct-v2 to SOLAR-0-70b-16bit
## Model Details
* **Developed by**: [Upstage](https://en.upstage.ai)
* **Backbone Model**: [LLaMA-2](https://github.com/facebookresearch/llama/tree/main)
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **License**: Fine-tuned checkpoints is licensed under the Non-Commercial Creative Commons license ([CC BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/))
* **Where to send comments**: Instructions on how to provide feedback or comments on a model can be found by opening an issue in the [Hugging Face community's model repository](https://huggingface.co/upstage/Llama-2-70b-instruct-v2/discussions)
* **Contact**: For questions and comments about the model, please email [[email protected]](mailto:[email protected])
## Dataset Details
### Used Datasets
- Orca-style dataset
- Alpaca-style dataset
- No other dataset was used except for the dataset mentioned above
- No benchmark test set or the training set are used
### Prompt Template
```
### System:
{System}
### User:
{User}
### Assistant:
{Assistant}
```
## Usage
- The followings are tested on A100 80GB
- Our model can handle up to 10k+ input tokens, thanks to the `rope_scaling` option
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
tokenizer = AutoTokenizer.from_pretrained("upstage/Llama-2-70b-instruct-v2")
model = AutoModelForCausalLM.from_pretrained(
"upstage/Llama-2-70b-instruct-v2",
device_map="auto",
torch_dtype=torch.float16,
load_in_8bit=True,
rope_scaling={"type": "dynamic", "factor": 2} # allows handling of longer inputs
)
prompt = "### User:\nThomas is healthy, but he has to go to the hospital. What could be the reasons?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
del inputs["token_type_ids"]
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=float('inf'))
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
```
## Hardware and Software
* **Hardware**: We utilized an A100x8 * 4 for training our model
* **Training Factors**: We fine-tuned this model using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) / [HuggingFace Accelerate](https://huggingface.co/docs/accelerate/index)
## Evaluation Results
### Overview
- We conducted a performance evaluation following the tasks being evaluated on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
We evaluated our model on four benchmark datasets, which include `ARC-Challenge`, `HellaSwag`, `MMLU`, and `TruthfulQA`
We used the [lm-evaluation-harness repository](https://github.com/EleutherAI/lm-evaluation-harness), specifically commit [b281b0921b636bc36ad05c0b0b0763bd6dd43463](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463).
- We used [MT-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge), a set of challenging multi-turn open-ended questions, to evaluate the models
### Main Results
| Model | H4(Avg) | ARC | HellaSwag | MMLU | TruthfulQA | | MT_Bench |
|--------------------------------------------------------------------|----------|----------|----------|------|----------|-|-------------|
| **[Llama-2-70b-instruct-v2](https://huggingface.co/upstage/Llama-2-70b-instruct-v2)**(***Ours***, ***Open LLM Leaderboard***) | **73** | **71.1** | **87.9** | **70.6** | **62.2** | | **7.44063** |
| [Llama-2-70b-instruct](https://huggingface.co/upstage/Llama-2-70b-instruct) (Ours, Open LLM Leaderboard) | 72.3 | 70.9 | 87.5 | 69.8 | 61 | | 7.24375 |
| [llama-65b-instruct](https://huggingface.co/upstage/llama-65b-instruct) (Ours, Open LLM Leaderboard) | 69.4 | 67.6 | 86.5 | 64.9 | 58.8 | | |
| Llama-2-70b-hf | 67.3 | 67.3 | 87.3 | 69.8 | 44.9 | | |
| [llama-30b-instruct-2048](https://huggingface.co/upstage/llama-30b-instruct-2048) (Ours, Open LLM Leaderboard) | 67.0 | 64.9 | 84.9 | 61.9 | 56.3 | | |
| [llama-30b-instruct](https://huggingface.co/upstage/llama-30b-instruct) (Ours, Open LLM Leaderboard) | 65.2 | 62.5 | 86.2 | 59.4 | 52.8 | | |
| llama-65b | 64.2 | 63.5 | 86.1 | 63.9 | 43.4 | | |
| falcon-40b-instruct | 63.4 | 61.6 | 84.3 | 55.4 | 52.5 | | |
### Scripts for H4 Score Reproduction
- Prepare evaluation environments:
```
# clone the repository
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
# check out the specific commit
git checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463
# change to the repository directory
cd lm-evaluation-harness
```
## Contact Us
### About Upstage
- [Upstage](https://en.upstage.ai) is a company specialized in Large Language Models (LLMs) and AI. We will help you build private LLMs and related applications.
If you have a dataset to build domain specific LLMs or make LLM applications, please contact us at ► [click here to contact](https://www.upstage.ai/private-llm?utm_source=huggingface&utm_medium=link&utm_campaign=privatellm)
- As of August 1st, our 70B model has reached the top spot in openLLM rankings, marking itself as the current leading performer globally. |