---
license: agpl-3.0
---

# SD1 Style Components (experimental)

Style control for Stable Diffusion 1.x anime models

## What is this?

It is IP-Adapter, but for (anime) styles. Instead of CLIP image embeddings, image generation is conditioned on 30-dimensional style embeddings, which can either be extracted from one or more images or created manually.
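A minimal sketch of what such conditioning can look like, assuming an IP-Adapter-style design in which a small projection turns the 30-d style vector into a few extra cross-attention tokens (the class name, token count, and layer sizes below are illustrative assumptions, not the repository's actual code):

```python
import torch
import torch.nn as nn

class StyleProjection(nn.Module):
    """Hypothetical module: maps a 30-d style embedding to extra cross-attention tokens."""
    def __init__(self, style_dim=30, num_tokens=4, cross_attention_dim=768):
        super().__init__()
        self.num_tokens = num_tokens
        self.cross_attention_dim = cross_attention_dim
        self.proj = nn.Linear(style_dim, num_tokens * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, style_embedding):
        # style_embedding: (batch, 30)
        tokens = self.proj(style_embedding)
        tokens = tokens.reshape(-1, self.num_tokens, self.cross_attention_dim)
        return self.norm(tokens)  # would be concatenated with the text tokens downstream

style = torch.randn(1, 30)               # extracted from images or written by hand
style_tokens = StyleProjection()(style)   # shape: (1, 4, 768)
```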

## Why?

Currently, the main means of style control is artist tags, which reasonably raises concerns of style plagiarism.
By breaking styles down into interpretable components that are present across all artists, direct copying of styles can be avoided.
Furthermore, new styles can easily be created by adjusting the magnitudes of the style components, offering more controllability than stacking artist tags or LoRAs.
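As an illustration (the vectors and weights here are placeholders, not real component values), mixing or editing styles then amounts to simple arithmetic on 30 numbers:

```python
import numpy as np

# Placeholder style vectors; in practice these would come from the
# style embedding model or be written by hand.
style_a = np.random.randn(30)
style_b = np.random.randn(30)

# Blend two extracted styles.
blended = 0.7 * style_a + 0.3 * style_b

# Or derive a new style by scaling individual components.
custom = style_a.copy()
custom[3] *= 2.0   # exaggerate one component
custom[11] = 0.0   # suppress another entirely
```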

Additionally, this could potentially be useful for general-purpose training, as training with a style condition may weaken style leakage into concepts.
This also serves as a demonstration that image models can be conditioned on arbitrary tensors other than text or images.
Hopefully, this helps more people understand that it is not necessary to force conditions that are inherently numerical (aesthetic scores, dates, ...) into text-form tags.

## How do I use it?

Currently, a [Colab notebook](https://colab.research.google.com/drive/1AKXiHHBAnzbtKyToN6WdzOov-niJudcL?usp=sharing) with a Gradio interface is available.
As this is only an experimental preview, proper support for popular web UIs will not be added until the models reach a more stable state.

## Technical details

First, a style embedding model is trained with supervised contrastive learning on an [artists dataset](https://huggingface.co/datasets/gustproof/artists/blob/main/artists.zip).
Then, the first 30 principal components of the learned embeddings are extracted with PCA. Finally, a modified IP-Adapter is trained on anime-final-pruned using the same dataset with WD1.4 tags and the projected 30-d embeddings. The training resolution is 576×576 with variable aspect ratios.
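A rough sketch of that projection step, assuming the contrastive model yields one embedding per image (the file name and array shapes are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical embeddings produced by the style embedding model:
# one row per image, embed_dim columns.
embeddings = np.load("artist_embeddings.npy")

pca = PCA(n_components=30)
style_embeddings = pca.fit_transform(embeddings)   # (num_images, 30)

# These 30-d vectors are the conditioning signal for the modified IP-Adapter.
```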
## Acknowledgements
This is largely inspired by [Inserting Anybody in Diffusion Models via Celeb Basis](http://arxiv.org/abs/2306.00926) and [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter). Training and inference code is modified from IP-Adapter ([license](https://github.com/tencent-ailab/IP-Adapter/blob/main/LICENSE)).