Blackroot committed on
Commit 3308cfe · verified · 1 Parent(s): ad4da18

Update README.md

Files changed (1): README.md +2 -0
@@ -13,6 +13,8 @@ A modified tensor product attention with rope is used instead of regular MHA [Te
 
 xATGLU Layers are used in some places [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768)
 
+This network was optimized via [Distributed Shampoo Github](https://github.com/facebookresearch/optimizers/blob/main/distributed_shampoo/README.md) || [Distributed Shampoo Paper](https://arxiv.org/abs/2309.06497)
+
 ```python train.py``` will train a new image network on the provided dataset (Currently the dataset is being fully rammed into GPU and is defined in the preload_dataset function)
 
 ```python test_sample.py step_799.safetensors``` Where step_799.safetensors is the desired model to test inference on. This will always generate a sample grid of 16x16 images.
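The xATGLU line in this diff cites the expanded-gating paper. As a rough illustration of that idea, the sketch below uses an arctan gate, arctan(z)/π + 1/2, which maps any input into (0, 1), and stretches it linearly to (-α, 1 + α) with a scalar α (treated as learnable in the paper as we understand it, and passed as a plain argument here). The names `arctan_gate`, `expanded_gate`, and `xatglu` are hypothetical. Pairing the expanded gate with a separate value projection is an assumption about the GLU form; this repo's actual layer may differ in details such as biases, a fused projection, or how α is constrained.

```python
import math


def arctan_gate(z: float) -> float:
    """Arctan-based gate: maps any real z into (0, 1), with 0 -> 0.5."""
    return math.atan(z) / math.pi + 0.5


def expanded_gate(z: float, alpha: float) -> float:
    """Linearly stretch the (0, 1) gate to the range (-alpha, 1 + alpha)."""
    return arctan_gate(z) * (1.0 + 2.0 * alpha) - alpha


def xatglu(x, w_gate, w_value, alpha):
    """Toy gated unit (assumed form): expanded_gate(W_g x) * (W_v x), elementwise.

    x is a list of floats; w_gate and w_value are lists of rows (hypothetical
    weights, no bias terms).
    """
    gate_pre = [sum(w * xi for w, xi in zip(row, x)) for row in w_gate]
    value = [sum(w * xi for w, xi in zip(row, x)) for row in w_value]
    return [expanded_gate(g, alpha) * v for g, v in zip(gate_pre, value)]
```

With α = 0 this reduces to a plain arctan-gated GLU; α > 0 lets the gate dip slightly below 0 or rise slightly above 1, which is the "expanded range" the paper title refers to.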
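The line added by this commit points to Meta's Distributed Shampoo. That implementation layers features such as grafting, infrequent preconditioner updates, and distributed computation on top of the basic Shampoo rule, so the following is only a single-device sketch of that core rule for one matrix parameter W with gradient G: accumulate L ← L + G Gᵀ and R ← R + Gᵀ G, then step W ← W − η L^(-1/4) G R^(-1/4). `ShampooMatrix` and `inverse_root` are hypothetical names, not the `distributed_shampoo` package API, and the epsilon and learning-rate values are arbitrary choices for illustration.

```python
import numpy as np


def inverse_root(mat: np.ndarray, p: int, eps: float = 1e-6) -> np.ndarray:
    """Compute mat^(-1/p) for a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    # Clamp tiny negative eigenvalues from round-off, then regularize.
    vals = np.maximum(vals, 0.0) + eps
    # V diag(vals^(-1/p)) V^T, using broadcasting to scale V's columns.
    return (vecs * vals ** (-1.0 / p)) @ vecs.T


class ShampooMatrix:
    """Basic (non-distributed) Shampoo state for one m x n weight matrix."""

    def __init__(self, shape, lr: float = 0.1, eps: float = 1e-4):
        m, n = shape
        self.left = eps * np.eye(m)   # L statistics, m x m
        self.right = eps * np.eye(n)  # R statistics, n x n
        self.lr = lr

    def step(self, weight: np.ndarray, grad: np.ndarray) -> np.ndarray:
        # Accumulate second-moment statistics on each side of the gradient.
        self.left += grad @ grad.T
        self.right += grad.T @ grad
        # Precondition the gradient from both sides with inverse 4th roots.
        precond = inverse_root(self.left, 4) @ grad @ inverse_root(self.right, 4)
        return weight - self.lr * precond


if __name__ == "__main__":
    # Toy problem: minimize 0.5 * ||W - T||_F^2, whose gradient is W - T.
    rng = np.random.default_rng(0)
    target = rng.standard_normal((4, 3))
    opt = ShampooMatrix(target.shape, lr=0.1)
    w = np.zeros_like(target)
    for _ in range(50):
        w = opt.step(w, w - target)
    print("final loss:", 0.5 * np.sum((w - target) ** 2))
```

Recomputing the inverse fourth roots by eigendecomposition at every step is the expensive part of this sketch, which is one reason production implementations refresh their preconditioners only periodically.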