Update README.md
README.md (CHANGED)
@@ -17,9 +17,8 @@ For a coffee chat or if you have any questions, please do not hesitate to contac
 
 I would like to thank Allganize Korea for their generosity in providing resources for this personal project. This project is not directly related to the company's goals or research.
 ## TODO
-- Complete training with korean_textbooks - 6B tokens down, 2B to go.
 - More training with publicly available Korean corpora
-- Instruct tuning
+- 🟡 Instruct tuning
 ## **What is Mamba?**
 Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
 ## **License**
@@ -33,7 +32,7 @@ Jisoo Kim(kuotient)
 ### KoBEST
 | Model | boolq | copa | hellaswag | sentineg |
 | --- | --- | --- | --- | --- |
-| kuotient/mamba-ko-2.8b
+| kuotient/mamba-ko-2.8b | 0.6213 | 0.6150 | 0.4014 | 0.3383 |
 | state_spaces/mamba-2.8b-slimpj | 0.3343 | 0.4867 | 0.3452 | 0.3547 |
 | kuotient/mamba-ko-2.8b-old (2B trained only) | 0.4236 | 0.5896 | 0.4012 | 0.4348 |
 | kuotient/mamba-ko-2.8b-old-instruct | 0.4041 | 0.6505 | 0.4906 | 0.3348 |
@@ -41,7 +40,6 @@ Jisoo Kim(kuotient)
 | maywell/TinyWand-SFT | 0.3455 | 0.6142 | 0.3944 | N/A |
 | microsoft/phi-2 | 0.3343 | 0.4792 | 0.3235 | N/A |
 | TinyLlama/TinyLlama-1.1B | 0.3343 | 0.4784 | 0.3396 | N/A |
-*>6B tokens trained. Further up to 8B tokens.
 ### Thanks
 Many thanks to [maywell](https://huggingface.co/maywell) for his many contributions to, and the motivation he provides for, the Korean LLM community.
 ## Usage
@@ -55,7 +53,7 @@ from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
 
 device = "cuda" if torch.cuda.is_available() else "cpu"
 
-model_name = "kuotient/mamba-2.8b"
+model_name = "kuotient/mamba-ko-2.8b"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 tokenizer.pad_token = tokenizer.eos_token
 
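The diff only shows the start of the usage snippet (imports, device selection, tokenizer setup, and the corrected `model_name`). Below is a minimal sketch of how loading and generation typically continue with this checkpoint, assuming the generation API of the upstream `mamba_ssm` package (`MambaLMHeadModel.from_pretrained` and its `generate` method); the prompt and sampling parameters are illustrative assumptions, not taken verbatim from the README.

```python
# Minimal sketch (assumption): completes the README's usage snippet using
# the mamba_ssm generation API; not the README's exact example.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "kuotient/mamba-ko-2.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load the Mamba LM onto the chosen device in half precision.
model = MambaLMHeadModel.from_pretrained(model_name, device=device, dtype=torch.float16)

prompt = "수학은"  # hypothetical example prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# mamba_ssm's generate() takes an absolute max_length (prompt + new tokens),
# not max_new_tokens; top_k > 1 enables sampling instead of greedy decoding.
out = model.generate(
    input_ids=input_ids,
    max_length=128,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    return_dict_in_generate=True,
)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))
```

On a CUDA device, `cg=True` can additionally be passed to `generate` to enable the CUDA-graph decoding path used in the upstream state-spaces/mamba benchmarks.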