loiccabannes
commited on
Create README.md
README.md
ADDED
@@ -0,0 +1,47 @@
---
license: apache-2.0
datasets:
- SkelterLabsInc/JaQuAD
language:
- ja
pipeline_tag: question-answering
---

# MambaSan-130m-instruct 🐍

**MambaSan-instruct is the first Japanese chat language model based on a state-space model architecture (Mamba) rather than a transformer.**

The model is based on Albert Gu and Tri Dao's work *Mamba: Linear-Time Sequence Modeling with Selective State Spaces* ([paper](https://arxiv.org/pdf/2312.00752.pdf)) as well as their [model implementation](https://github.com/state-spaces/mamba).
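
As a rough illustration of what "state-space model" means here, the sketch below runs a toy diagonal linear recurrence over a sequence in plain Python. It is not the Mamba implementation: Mamba makes its parameters input-dependent ("selective") and computes the scan with a hardware-aware kernel, while this toy uses fixed parameters. The function name `ssm_scan` and the parameters `a`, `b`, `c` are made up for this example.

```python
def ssm_scan(xs, a, b, c):
    """Toy diagonal state-space recurrence over a 1-D input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)
    """
    h = [0.0] * len(a)  # hidden state, one entry per state dimension
    ys = []
    for x in xs:
        # Update every state dimension from the previous state and the new input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a scalar output from the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# An impulse at t=0 decays geometrically through the state.
outputs = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0])
```

Each token costs a fixed amount of work regardless of how many tokens came before, which is where the linear scaling with sequence length comes from.
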
This work was also inspired by havenhq's English-language mamba-chat implementation:
```bibtex
@misc{haven2023mambachat,
  title = {Mamba-Chat},
  author = {Justus Mattern and Konstantin Hohr},
  year = {2023},
  howpublished = {GitHub},
  url = {https://github.com/havenhq/mamba-chat}
}
```
This repository provides training and fine-tuning code for the model, based on modifications of the Hugging Face Trainer class.
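
The exact Trainer modifications are not described here. Assuming they resemble the approach in mamba-chat, where the loss is a next-token cross-entropy computed on logits shifted by one position, the core computation could be sketched in plain Python as follows. The name `next_token_loss` is hypothetical, and a real implementation would use a tensor library rather than Python lists.

```python
import math


def next_token_loss(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t.

    logits: one list of vocabulary scores per input position.
    token_ids: the input token ids, same length as logits.
    """
    # Align predictions with targets: position t predicts token t+1.
    shifted_logits = logits[:-1]
    targets = token_ids[1:]
    total = 0.0
    for scores, target in zip(shifted_logits, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the correct next token.
        total += log_z - scores[target]
    return total / len(targets)


# With uniform scores over 4 tokens, the loss is log(4) whatever the targets are.
uniform_loss = next_token_loss([[0.0] * 4 for _ in range(3)], [1, 2, 3])
```
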

MambaSan-instruct is based on MambaSan-130m and was fine-tuned on 31.7k examples from the [SkelterLabsInc/JaQuAD](https://huggingface.co/datasets/SkelterLabsInc/JaQuAD) dataset. To learn more, you can:

- Take a look at the model on [Hugging Face](https://huggingface.co/loic_cabannes/MambaSan-instruct) 🤗
- Talk to MambaSan-instruct on [Google Colab](https://colab.research.google.com/drive/1LDP9Wik98SSpoTlxAjKOmhRrcuk_OpT_?usp=sharing)

The code used for pretraining and fine-tuning will soon be published on my GitHub: https://github.com/lcabannes
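
Until that code is available, here is a minimal sketch of how a question-answering record could be turned into an instruction-style training pair. It assumes SQuAD-style fields (`context`, `question`, `answers["text"]`). The prompt template and the function name `format_example` are hypothetical, not the format actually used to train MambaSan-instruct, and the sample record is made up for illustration rather than taken from JaQuAD.

```python
def format_example(record):
    """Turn one SQuAD-style QA record into a (prompt, completion) pair."""
    # Hypothetical template: context, then the question, then an answer cue.
    prompt = f"{record['context']}\n\nQ: {record['question']}\nA:"
    # Use the first reference answer as the training target.
    completion = " " + record["answers"]["text"][0]
    return prompt, completion


# Made-up record for illustration only (not from the dataset).
record = {
    "context": "東京は日本の首都です。",
    "question": "日本の首都はどこですか？",
    "answers": {"text": ["東京"], "answer_start": [0]},
}
prompt, completion = format_example(record)
```
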
<br>

## Citation

```bibtex
@misc{lcabannes2024MambaSan-130m-instruct,
  title = {MambaSan-130m-instruct},
  author = {Loïc Cabannes},
  year = {2024},
  howpublished = {HuggingFace},
  url = {https://huggingface.co/loiccabannes/MambaSan-130m-instruct/}
}
```