Blackroot/SimpleDiffusion-TensorProductAttentionRope

Blackroot committed 13 days ago

Commit 3308cfe · verified · 1 Parent(s): ad4da18

Update README.md

Files changed (1)

README.md +2 -0

@@ -13,6 +13,8 @@ A modified tensor product attention with rope is used instead of regular MHA [Te
 xATGLU Layers are used in some places [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768)
 ```python train.py``` will train a new image network on the provided dataset (Currently the dataset is being fully rammed into GPU and is defined in the preload_dataset function)
 ```python test_sample.py step_799.safetensors``` Where step_799.safetensors is the desired model to test inference on. This will always generate a sample grid of 16x16 images.

 xATGLU Layers are used in some places [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768)
+This network was optimized via [Distributed Shampoo Github](https://github.com/facebookresearch/optimizers/blob/main/distributed_shampoo/README.md) || [Distributed Shampoo Paper](https://arxiv.org/abs/2309.06497)
 ```python train.py``` will train a new image network on the provided dataset (Currently the dataset is being fully rammed into GPU and is defined in the preload_dataset function)
 ```python test_sample.py step_799.safetensors``` Where step_799.safetensors is the desired model to test inference on. This will always generate a sample grid of 16x16 images.
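
For readers unfamiliar with the xATGLU layers the README mentions, here is a minimal scalar sketch of the expanded arctan gate as described in the linked paper, not code taken from this repository. It assumes the gate is an arctan squashed into (0, 1) and then stretched to the range (-alpha, 1 + alpha). The function names are hypothetical, and `alpha` is a plain float here, whereas a real layer would likely make it a learnable parameter and apply the gate to tensors.

```python
import math


def expanded_arctan_gate(x: float, alpha: float = 0.0) -> float:
    """Arctan gate with its range expanded from (0, 1) to (-alpha, 1 + alpha)."""
    # Base gate: arctan maps the reals to (-pi/2, pi/2); shift and scale to (0, 1).
    base = (math.atan(x) + math.pi / 2) / math.pi
    # Expansion: stretch (0, 1) to (-alpha, 1 + alpha). alpha = 0 keeps the base gate.
    return base * (1 + 2 * alpha) - alpha


def xatglu(gate_input: float, value: float, alpha: float = 0.0) -> float:
    """GLU-style product: the expanded gate scales a separate value path.

    In a real layer, gate_input and value would typically be the two halves
    of one linear projection (an assumption, not confirmed by the README).
    """
    return expanded_arctan_gate(gate_input, alpha) * value


if __name__ == "__main__":
    for x in (-5.0, 0.0, 5.0):
        print(x, expanded_arctan_gate(x, alpha=0.25))
```

With a positive `alpha` the gate can go slightly negative or exceed 1, which is the expanded range the paper's title refers to.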