Update README.md
Browse files
README.md
CHANGED
@@ -13,6 +13,8 @@ A modified tensor product attention with rope is used instead of regular MHA [Te
|
|
13 |
|
14 |
xATGLU Layers are used in some places [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768)
|
15 |
|
|
|
|
|
16 |
```python train.py``` will train a new image network on the provided dataset (Currently the dataset is being fully rammed into GPU and is defined in the preload_dataset function)
|
17 |
|
18 |
```python test_sample.py step_799.safetensors``` Where step_799.safetensors is the desired model to test inference on. This will always generate a sample grid of 16x16 images.
|
|
|
13 |
|
14 |
xATGLU Layers are used in some places [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768)
|
15 |
|
16 |
+
This network was optimized via [Distributed Shampoo Github](https://github.com/facebookresearch/optimizers/blob/main/distributed_shampoo/README.md) || [Distributed Shampoo Paper](https://arxiv.org/abs/2309.06497)
|
17 |
+
|
18 |
```python train.py``` will train a new image network on the provided dataset (Currently the dataset is being fully rammed into GPU and is defined in the preload_dataset function)
|
19 |
|
20 |
```python test_sample.py step_799.safetensors``` Where step_799.safetensors is the desired model to test inference on. This will always generate a sample grid of 16x16 images.
|