# Usage

You need to install `transformers` from `main` until `transformers` v4.39.0 is released:

```bash
pip install git+https://github.com/huggingface/transformers@main
```

We also recommend installing both `causal-conv1d` and `mamba-ssm`:

```bash
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```

If either of the two is not installed, the "eager" implementation will be used; otherwise the optimised CUDA kernels will be used.
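
A quick way to check which path you will get; this minimal sketch only verifies that the two packages import (roughly what `transformers` checks internally):

```python
# Availability check: the PyPI packages above expose the module names
# `causal_conv1d` and `mamba_ssm`.
try:
    import causal_conv1d  # noqa: F401
    import mamba_ssm  # noqa: F401
    print("CUDA kernels available: the fast path will be used")
except ImportError:
    print("falling back to the eager (pure PyTorch) implementation")
```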

## Generation
You can use the classic `generate` API:

```python
from transformers import MambaForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf")
model = MambaForCausalLM.from_pretrained("state-spaces/mamba-2.8b-hf")
input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
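
`generate` also accepts the usual decoding knobs; the sampling values below are illustrative, not tuned recommendations:

```python
# Sampling instead of greedy decoding; parameter values are only examples.
out = model.generate(
    input_ids,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```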

## PEFT finetuning example
To finetune using the `peft` library, we recommend keeping the model in float32!

```python
from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf", pad_token="<s>")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-2.8b-hf")
dataset = load_dataset("Abirate/english_quotes", split="train")

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir="./logs",
    logging_steps=10,
    learning_rate=2e-3,
)
lora_config = LoraConfig(
    r=8,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    bias="none",
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset,
    dataset_text_field="quote",
)
trainer.train()
```
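
After training, the LoRA adapter can be reloaded on top of the base model. A minimal sketch, assuming you saved the adapter yourself with `trainer.save_model("./results")` (the path is illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# "./results" is a hypothetical adapter path; point it at wherever you saved.
base = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-2.8b-hf")
model = PeftModel.from_pretrained(base, "./results")
```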