## Approach

This model, based on the [Mamba architecture](https://arxiv.org/abs/2312.00752), has been pre-trained on approximately 400B tokens of Chinese and English corpora.

## Usage
```python
import torch

from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from transformers import AutoTokenizer

repo_id = 'mamba-1.4b-aquila-400b'
device = "cuda:0"

# Load the pre-trained weights in bfloat16 on the target GPU.
model = MambaLMHeadModel.from_pretrained(repo_id, dtype=torch.bfloat16, device=device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Tokenize the prompt and add a batch dimension.
prompt = "The Spring Festival is"
tokens = tokenizer.encode_plus(prompt, truncation=False)["input_ids"]
tokens = torch.tensor(tokens)[None, ].to(device)

with torch.no_grad():
    input_length = len(tokens[0])
    # Sample up to 200 new tokens; cg=True enables CUDA-graph decoding.
    out_ids = model.generate(
        input_ids=tokens,
        max_length=input_length + 200,
        temperature=1.0,
        top_p=0.95,
        top_k=15,
        eos_token_id=tokenizer.eos_token_id,
        cg=True,
    )
    # Strip the prompt tokens and decode only the generated continuation.
    out_ids = out_ids[0][input_length:].cpu().numpy()
    out_text = tokenizer.decode(out_ids.tolist())
    print(out_text)
```
Example output (decoding is sampled, so the exact text will vary):

> the most important festival of the year for the Chinese people. It usually comes in January or February and it takes about 15 days to prepare for it.

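To query the model with several prompts, the same calls can be wrapped in a small helper. The sketch below is a minimal reorganization of the snippet above, assuming the same `model` and `tokenizer` objects; the `generate_text` function and its default arguments are illustrative and not part of this repository.

```python
import torch


def generate_text(model, tokenizer, prompt, device, max_new_tokens=200,
                  temperature=1.0, top_p=0.95, top_k=15):
    """Illustrative helper (not part of the released code): sample a continuation for a prompt."""
    tokens = tokenizer.encode_plus(prompt, truncation=False)["input_ids"]
    tokens = torch.tensor(tokens)[None, ].to(device)  # add a batch dimension
    with torch.no_grad():
        input_length = tokens.shape[1]
        out_ids = model.generate(
            input_ids=tokens,
            max_length=input_length + max_new_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            eos_token_id=tokenizer.eos_token_id,
            cg=True,  # CUDA-graph decoding, as in the snippet above
        )
    # Return only the newly generated tokens, decoded to text.
    return tokenizer.decode(out_ids[0][input_length:].tolist())


# Example, assuming `model` and `tokenizer` are loaded as shown above:
# print(generate_text(model, tokenizer, "The Spring Festival is", device))
```
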
## References

The Mamba architecture was introduced in [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752).

The official implementation is here: https://github.com/state-spaces/mamba/tree/main