![Mamba-ko-2.8B](./Seagull-mamba.png)
**Mamba-ko-2.8B** is a state space model, further pretrained (continually trained) on [**korean_textbooks**](https://huggingface.co/datasets/maywell/korean_textbooks), a synthetically generated dataset.
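If you want to inspect the training data yourself, here is a minimal sketch for pulling it from the Hub with 🤗 `datasets`. The dataset's subset names aren't listed here, so the snippet discovers them at runtime rather than assuming one:

```python
from datasets import get_dataset_config_names, load_dataset

# korean_textbooks is split into several subsets; list them first.
configs = get_dataset_config_names("maywell/korean_textbooks")
print(configs)

# Load one subset (the first one, purely as an example) and peek at a record.
ds = load_dataset("maywell/korean_textbooks", configs[0], split="train")
print(ds[0])
```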
> If you're interested in building large-scale language models that solve a wide variety of problems across many domains, consider joining [Allganize](https://allganize.career.greetinghr.com/o/65146).
For a coffee chat or if you have any questions, please do not hesitate to contact me as well! - [email protected]
I would like to thank Allganize Korea for their generosity in providing resources for this personal project. This project is not directly related to the company's goals or research.
## TODO
- Complete training with korean_textbooks - 6B tokens down, 2B to go.
- More training with publicly available Korean corpora
### KoBEST
| Model | boolq | copa | hellaswag | sentineg |
| --- | --- | --- | --- | --- |
| kuotient/mamba-ko-2.8b* | 0.5825 | 0.6166 | 0.4051 | 0.3383 |
| state_spaces/mamba-2.8b-slimpj | 0.3343 | 0.4867 | 0.3452 | 0.3547 |
| kuotient/mamba-ko-2.8b-old (2B trained only) | 0.4236 | 0.5896 | 0.4012 | 0.4348 |
| kuotient/mamba-ko-2.8b-old-instruct | 0.4041 | 0.6505 | 0.4906 | 0.3348 |
| maywell/TinyWand-SFT | 0.3455 | 0.6142 | 0.3944 | N/A |
| microsoft/phi-2 | 0.3343 | 0.4792 | 0.3235 | N/A |
| TinyLlama/TinyLlama-1.1B | 0.3343 | 0.4784 | 0.3396 | N/A |
\* Trained on >6B tokens so far; training will continue up to 8B tokens.
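For reference, a minimal sketch of how KoBEST scores like these could be reproduced with EleutherAI's lm-evaluation-harness. The harness version (>= 0.4), the `kobest_*` task names, and the ability of `transformers` to load this checkpoint as a causal LM are all assumptions; the actual evaluation setup isn't stated above:

```python
# Sketch only: assumes lm-eval >= 0.4 with the kobest_* tasks available,
# and a transformers build that can load this checkpoint as a causal LM.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=kuotient/mamba-ko-2.8b,dtype=bfloat16",
    tasks=["kobest_boolq", "kobest_copa", "kobest_hellaswag", "kobest_sentineg"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```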
### Thanks
Many thanks to [maywell](https://huggingface.co/maywell) for his many contributions to, and the motivation he brings to, the Korean LLM community.
## Usage
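A minimal sketch of loading the model, assuming the checkpoint follows the standard `mamba-ssm` layout used by the original `state-spaces/mamba-2.8b` releases; the tokenizer choice (EleutherAI/gpt-neox-20b) is likewise an assumption carried over from those releases, since this model continues pretraining from them:

```python
# Sketch only: assumes the checkpoint loads with the mamba-ssm package
# (pip install mamba-ssm causal-conv1d) on a CUDA device.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Assumed tokenizer: the base state-spaces Mamba models use gpt-neox-20b's.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(
    "kuotient/mamba-ko-2.8b", device="cuda", dtype=torch.bfloat16
)

input_ids = tokenizer("한국의 수도는", return_tensors="pt").input_ids.to("cuda")
out = model.generate(input_ids=input_ids, max_length=100, temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0]))
```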