Commit eee85e1 (verified) · pglo · Parent: d9a6349

Update README.md

Files changed (1): README.md (+4 -2)
README.md CHANGED
@@ -12,11 +12,13 @@ Zamba2-1.2B is a hybrid model composed of state-space and transformer blocks. It
 
 3.) We utilize rotary position embeddings in the shared attention layer.
 
-Zamba2-1.2B differs from our [2.7B model](https://huggingface.co/Zyphra/Zamba2-2.7B) in two ways:
+Zamba2-1.2B differs from our [2.7B model](https://huggingface.co/Zyphra/Zamba2-2.7B) in three ways:
 
 1.) Rotary position embeddings
 
-2.) No alternating shared attention blocks
+2.) No alternating shared transformer blocks
+
+3.) Added LoRA projectors to attention layers
 
 We found that while hybrid SSM-transformer models are perfectly capable of performing well without position embeddings, adding rotary embeddings to the shared attention block slightly improved performance. Secondly, we utilize a single attention block instead of alternating because this enables a higher flop count for the model at a given parameter budget and at smaller scales this becomes more important than the slightly faster latency.
 
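For orientation, here is a minimal PyTorch sketch of the architectural ideas the updated README describes: a single shared attention block reused at every shared-attention position, rotary position embeddings applied to its queries and keys, and a small per-use LoRA projector on the shared QKV weights. This is an illustrative sketch under those assumptions, not the Zyphra/Zamba2 implementation; names such as `SharedAttentionBlock`, `n_uses`, and `lora_rank` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotary_cos_sin(seq_len, head_dim, device, base=10000.0):
    # Standard RoPE angles; returns cos/sin of shape (seq_len, head_dim // 2).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device, dtype=torch.float32) / head_dim))
    angles = torch.arange(seq_len, device=device, dtype=torch.float32)[:, None] * inv_freq[None, :]
    return angles.cos(), angles.sin()


def apply_rotary(x, cos, sin):
    # x: (batch, heads, seq, head_dim); rotate consecutive channel pairs.
    x1, x2 = x[..., ::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)


class SharedAttentionBlock(nn.Module):
    """One attention block whose projection weights are shared across every
    point it is invoked at; each invocation gets its own low-rank (LoRA-style)
    adapter on the QKV projection so the reuses can specialize cheaply.
    Hypothetical sketch, not the Zamba2 code."""

    def __init__(self, dim, n_heads, n_uses, lora_rank=8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)   # shared across all uses
        self.proj = nn.Linear(dim, dim, bias=False)      # shared across all uses
        self.lora_down = nn.ModuleList([nn.Linear(dim, lora_rank, bias=False) for _ in range(n_uses)])
        self.lora_up = nn.ModuleList([nn.Linear(lora_rank, 3 * dim, bias=False) for _ in range(n_uses)])
        for up in self.lora_up:
            nn.init.zeros_(up.weight)  # adapters start as a no-op

    def forward(self, x, use_idx):
        b, s, _ = x.shape
        # Shared QKV projection plus the per-use low-rank correction.
        qkv = self.qkv(x) + self.lora_up[use_idx](self.lora_down[use_idx](x))
        q, k, v = (t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2) for t in qkv.chunk(3, dim=-1))
        # Rotary position embeddings on queries and keys only.
        cos, sin = rotary_cos_sin(s, self.head_dim, x.device)
        q, k = apply_rotary(q, cos, sin), apply_rotary(k, cos, sin)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, s, -1))


# Usage: the same block (same large weight matrices) is applied at several
# depths, distinguished only by which LoRA adapter is selected.
block = SharedAttentionBlock(dim=512, n_heads=8, n_uses=6)
x = torch.randn(2, 128, 512)
y = block(x, use_idx=0)
```

In a layout like this, the large Q/K/V and output matrices are stored once but executed at every invocation point, while each reuse adds only the small `lora_rank`-sized projectors, which is the "more flops per parameter" trade-off the paragraph in the diff points at.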