---
license: cc-by-nc-sa-4.0
language:
- en
- ko
---
Data Is Everything.

To try other models (including commercially available ones), please check out our [Demo Page (🔨 under construction)](https://allsecure.co.kr/demo).

This model was built by [Ados](https://adoscompany.com/) on top of [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0).

### Train Dataset
The training dataset was collected primarily from Hugging Face and translated with our own translation model.
- Language
  - KR 73%
  - EN 24%
  - Others 3%
- Type
  - single turn QA (alpaca style) 29%
  - multi turn QA (vicuna style) 21%
  - instructed QA 26%
  - summary 12%
  - translation 12%

After collecting the data, we removed low-quality rows. We kept the highest-quality 30% of the raw data, selected through manual review and deduplication.

We also cleaned up problematic data, such as malformed code blocks, lists, repetition, and other common issues we found.
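
The deduplication method is not specified here. As a rough illustration only, the sketch below shows one simple possibility: exact-match deduplication after normalizing whitespace and case. The field names `instruction` and `output` and the helper names are assumptions made for this example, not the actual pipeline.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(rows: list[dict]) -> list[dict]:
    """Keep the first occurrence of each (instruction, output) pair.

    Hypothetical schema: every row has "instruction" and "output" keys.
    """
    seen = set()
    unique = []
    for row in rows:
        key = (normalize(row["instruction"]), normalize(row["output"]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"instruction": "Summarize the text.", "output": "A short summary."},
    {"instruction": "summarize  the text.", "output": "A short summary. "},
    {"instruction": "Translate to Korean.", "output": "안녕하세요"},
]
unique_rows = deduplicate(rows)
print(len(unique_rows))
```

Exact matching only catches near-identical rows; fuzzy methods would be needed for paraphrased duplicates.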

### Prompt template
```
### System:
You are an AI assistant, please behave and help the user. Your name is OLLM(오름) by Ados(주식회사아도스), OLLM stands for On-premise LLM.

### User: On-premise LLM이 뭔가요?

### Assistant:
```
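
A minimal sketch of filling the template above for a single user turn. The function name `build_prompt` is hypothetical, and the trailing newline after `### Assistant:` is an assumption, since the template does not show what follows it.

```python
# System prompt copied verbatim from the template above.
SYSTEM_PROMPT = (
    "You are an AI assistant, please behave and help the user. "
    "Your name is OLLM(오름) by Ados(주식회사아도스), "
    "OLLM stands for On-premise LLM."
)


def build_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return a single-turn prompt that follows the template's layout."""
    return (
        f"### System:\n{system_prompt}\n\n"
        f"### User: {user_message.strip()}\n\n"
        # Assumption: the model's answer starts on the line after this header.
        "### Assistant:\n"
    )


# Example question from the template: "What is an on-premise LLM?"
prompt = build_prompt("On-premise LLM이 뭔가요?")
print(prompt)
```

The resulting string can be passed to whichever tokenizer and generation call you use to load the model.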

For more information, please contact us.

### License
- [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0): cc-by-nc-4.0
  - Because some non-commercial datasets such as Alpaca were used for fine-tuning, we release this model as cc-by-nc-4.0.


```bibtex
@misc{kim2023solar,
      title={SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling}, 
      author={Dahyun Kim and Chanjun Park and Sanghoon Kim and Wonsung Lee and Wonho Song and Yunsu Kim and Hyeonwoo Kim and Yungi Kim and Hyeonju Lee and Jihoo Kim and Changbae Ahn and Seonghoon Yang and Sukyung Lee and Hyunbyung Park and Gyoungjin Gim and Mikyoung Cha and Hwalsuk Lee and Sunghun Kim},
      year={2023},
      eprint={2312.15166},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```