Update README.md
README.md CHANGED
@@ -45,6 +45,14 @@ outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

+To load a different checkpoint, e.g. for iteration 2500, use:
+
+```python
+model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1", device_map="auto", torch_dtype=torch.bfloat16, revision="iter2500")
+```
+
+The default revision is the fully trained model, corresponding to iteration 25156. This is the number of training iterations performed starting from Zamba phase 1 ([Zyphra/Zamba-7B-v1-phase1](https://huggingface.co/Zyphra/Zamba-7B-v1-phase1)). See [arXiv:2405.16712](https://arxiv.org/abs/2405.16712) for more details on training.
+
## Model Details

Zamba utilizes a unique hybrid SSM architecture, consisting of a backbone of Mamba layers interspersed with a shared attention layer. This attention layer has shared weights to minimize the parameter cost of the model. We find that concatenating the original model embeddings to the input of this attention block improves performance, likely due to better maintenance of information across depth.
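Below is a minimal, illustrative PyTorch sketch of the shared-attention pattern described in the Model Details paragraph above: a single attention block whose weights are reused at several points along a Mamba-style backbone, with the original token embeddings concatenated to its input before a down-projection. This is not Zamba's actual implementation; the `MambaBlockStub`, `SharedAttentionBlock`, and `TinyZambaLikeBackbone` names, the layer counts, hidden size, and placement interval are all assumptions made for illustration.

```python
# Illustrative sketch only: a backbone of Mamba-style blocks with ONE attention
# block whose weights are shared every time it is applied, and whose input is
# the concatenation [hidden_states ; original_embeddings].
import torch
import torch.nn as nn


class MambaBlockStub(nn.Module):
    """Placeholder for a real Mamba (SSM) block; here just a gated MLP."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, 2 * d_model)
        self.proj_out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, gate = self.proj_in(self.norm(x)).chunk(2, dim=-1)
        return x + self.proj_out(h * torch.sigmoid(gate))


class SharedAttentionBlock(nn.Module):
    """One attention block reused at every application point (shared weights).

    Its input is the current hidden states concatenated with the original
    token embeddings, projected back down to the model width.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.concat_proj = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden: torch.Tensor, embeds: torch.Tensor) -> torch.Tensor:
        x = self.concat_proj(torch.cat([hidden, embeds], dim=-1))
        x = self.norm(x)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        return hidden + attn_out


class TinyZambaLikeBackbone(nn.Module):
    """Toy backbone: Mamba-style layers with the shared attention block
    applied every `attn_every` layers (sizes are arbitrary)."""

    def __init__(self, vocab_size=256, d_model=64, n_heads=4,
                 n_mamba_layers=6, attn_every=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.mamba_layers = nn.ModuleList(
            [MambaBlockStub(d_model) for _ in range(n_mamba_layers)]
        )
        # A single module instance, so its parameters are shared across every
        # point in the stack where attention is applied.
        self.shared_attn = SharedAttentionBlock(d_model, n_heads)
        self.attn_every = attn_every

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        embeds = self.embed(input_ids)  # kept around for later concatenation
        hidden = embeds
        for i, layer in enumerate(self.mamba_layers):
            hidden = layer(hidden)
            if (i + 1) % self.attn_every == 0:
                hidden = self.shared_attn(hidden, embeds)
        return hidden


if __name__ == "__main__":
    model = TinyZambaLikeBackbone()
    out = model(torch.randint(0, 256, (1, 16)))
    print(out.shape)  # torch.Size([1, 16, 64])
```

Because `shared_attn` is instantiated once and called at multiple depths, each application adds no new attention parameters, while the concatenated embeddings give the block direct access to the original token information at every depth.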