loiccabannes committed on
Commit
8d1df11
·
verified ·
1 Parent(s): eef4390

Create README.md

  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - SkelterLabsInc/JaQuAD
+ language:
+ - ja
+ pipeline_tag: question-answering
+ ---
+
+ # MambaSan-130m-instruct 🐍
+
+ **MambaSan-instruct is the first Japanese chat language model based on a state-space model architecture (Mamba) rather than a transformer.**
+
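To make the state-space claim above concrete, here is a minimal sketch of the kind of linear recurrence that state-space models are built on. It is a toy with fixed scalar parameters: the names `ssm_scan`, `a`, `b`, and `c` are illustrative assumptions, not part of this model. Mamba itself uses learned, input-dependent (selective) parameters and a hardware-aware scan, so this should not be read as the model's actual implementation.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a toy scalar state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h = 0.0  # hidden state, carried from one step to the next
    ys = []
    for x in xs:  # one constant-cost update per input element
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # An impulse input decays geometrically through the state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))  # [1.0, 0.5, 0.25]
```

Each step only touches the fixed-size state `h`, so the cost grows linearly with sequence length, in contrast to the pairwise attention of a transformer.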
+ The model is based on Albert Gu and Tri Dao's work *Mamba: Linear-Time Sequence Modeling with Selective State Spaces* ([paper](https://arxiv.org/pdf/2312.00752.pdf)) and on their [model implementation](https://github.com/state-spaces/mamba).
+ This work was also inspired by havenhq's English-language mamba-chat implementation:
+ ```bibtex
+ @misc{haven2023mambachat,
+ title = {Mamba-Chat},
+ author = {Justus Mattern and Konstantin Hohr},
+ year = {2023},
+ howpublished = {GitHub},
+ url = {https://github.com/havenhq/mamba-chat}
+ }
+ ```
+ The mamba-chat repository provides training and fine-tuning code based on a modified version of the Hugging Face Trainer class.
+
+ MambaSan-130m-instruct is based on MambaSan-130m and was fine-tuned on 31.7k samples of the [SkelterLabsInc/JaQuAD](https://huggingface.co/datasets/SkelterLabsInc/JaQuAD) dataset. To learn more, you can:
+
+ - Take a look at the model on [Hugging Face](https://huggingface.co/loic_cabannes/MambaSan-instruct) 🤗
+ - Talk to the model on [Google Colab](https://colab.research.google.com/drive/1LDP9Wik98SSpoTlxAjKOmhRrcuk_OpT_?usp=sharing)
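As an illustration of the fine-tuning data mentioned above, the sketch below turns one SQuAD-style question-answering record into a single prompt/answer string. The field names (`context`, `question`, `answers["text"]`) follow the SQuAD layout that JaQuAD is assumed to use here, and both the prompt template and the `format_example` helper are hypothetical. The template actually used to train the model is not documented in this README.

```python
from typing import Any, Dict


def format_example(record: Dict[str, Any]) -> str:
    """Build one training string from a SQuAD-style record (hypothetical template)."""
    answer = record["answers"]["text"][0]  # use the first reference answer
    return (
        f"文脈: {record['context']}\n"
        f"質問: {record['question']}\n"
        f"答え: {answer}"
    )


# Hand-written record for illustration only; it is not taken from JaQuAD.
sample = {
    "context": "東京は日本の首都です。",
    "question": "日本の首都はどこですか?",
    "answers": {"text": ["東京"], "answer_start": [0]},
}

if __name__ == "__main__":
    print(format_example(sample))
```

At inference time, the same template could be reused with the answer left empty so that the model completes it, assuming training followed a layout like this one.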
+
+
+ The code used for pretraining and fine-tuning will soon be published on my GitHub: https://github.com/lcabannes
+ <br>
+
+
+ ## Citation
+
+ ```bibtex
+ @misc{lcabannes2024MambaSan-130m-instruct,
+ title = {MambaSan-130m-instruct},
+ author = {Loïc Cabannes},
+ year = {2024},
+ howpublished = {HuggingFace},
+ url = {https://huggingface.co/loiccabannes/MambaSan-130m-instruct/}
+ }
+ ```