Yuanshi committed
Commit 8c936a5 · 1 Parent(s): 6ed1db6

add app.py

Files changed (2)
  1. README.md +10 -127
  2. app.py +93 -0
README.md CHANGED
@@ -1,127 +1,10 @@
- # OminiControl
-
-
- <img src='./assets/demo/demo_this_is_omini_control.jpg' width='100%' />
- <br>
-
- <a href="https://arxiv.org/abs/2411.15098"><img src="https://img.shields.io/badge/ariXv-2411.15098-A42C25.svg" alt="arXiv"></a>
- <a href="https://huggingface.co/Yuanshi/OminiControl"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
- <a href="https://github.com/Yuanshi9815/Subjects200K"><img src="https://img.shields.io/badge/GitHub-Subjects200K dataset-blue.svg?logo=github&" alt="GitHub"></a>
-
- > **OminiControl: Minimal and Universal Control for Diffuison Transformer**
- > <br>
- > Zhenxiong Tan,
- > [Songhua Liu](http://121.37.94.87/),
- > [Xingyi Yang](https://adamdad.github.io/),
- > Qiaochu Xue,
- > and
- > [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
- > <br>
- > [Learning and Vision Lab](http://lv-nus.org/), National University of Singapore
- > <br>
-
-
- ## Features
-
- OmniControl is a minimal yet powerful universal control framework for Diffusion Transformer models like [FLUX](https://github.com/black-forest-labs/flux).
-
- * **Universal Control 🌐**: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
-
- * **Minimal Design 🚀**: Injects control signals while preserving original model structure. Only introduces 0.1% additional parameters to the base model.
-
- ## Quick Start
- ### Setup (Optional)
- 1. **Environment setup**
- ```bash
- conda create -n omini python=3.10
- conda activate omini
- ```
- 2. **Requirements installation**
- ```bash
- pip install -r requirements.txt
- ```
- ### Usage example
- 1. Subject-driven generation: `examples/subject.ipynb`
- 2. In-painting: `examples/inpainting.ipynb`
- 3. Canny edge to image, depth to image, colorization, deblurring: `examples/spatial.ipynb`
-
- ## Generated samples
- ### Subject-driven generation
- **Demos** (Left: condition image; Right: generated image)
-
- <div float="left">
- <img src='./assets/demo/oranges_omini.jpg' width='48%'/>
- <img src='./assets/demo/rc_car_omini.jpg' width='48%' />
- <img src='./assets/demo/clock_omini.jpg' width='48%' />
- <img src='./assets/demo/shirt_omini.jpg' width='48%' />
- </div>
-
- <details>
- <summary>Text Prompts</summary>
-
- - Prompt1: *A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!.'*
- - Prompt2: *A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.*
- - Prompt3: *In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.*
- - Prompt4: *In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.*
- </details>
- <details>
- <summary>More results</summary>
-
- * Try on:
- <img src='./assets/demo/try_on.jpg'/>
- * Scene variations:
- <img src='./assets/demo/scene_variation.jpg'/>
- * Dreambooth dataset:
- <img src='./assets/demo/dreambooth_res.jpg'/>
- </details>
-
- ### Spaitally aligned control
- 1. **Image Inpainting** (Left: original image; Center: masked image; Right: filled image)
- - Prompt: *The Mona Lisa is wearing a white VR headset with 'Omini' written on it.*
- </br>
- <img src='./assets/demo/monalisa_omini.jpg' width='700px' />
- - Prompt: *A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.*
- </br>
- <img src='./assets/demo/book_omini.jpg' width='700px' />
- 2. **Other spatially aligned tasks** (Canny edge to image, depth to image, colorization, deblurring)
- </br>
- <details>
- <summary>Click to show</summary>
- <div float="left">
- <img src='./assets/demo/room_corner_canny.jpg' width='48%'/>
- <img src='./assets/demo/room_corner_depth.jpg' width='48%' />
- <img src='./assets/demo/room_corner_coloring.jpg' width='48%' />
- <img src='./assets/demo/room_corner_deblurring.jpg' width='48%' />
- </div>
-
- Prompt: *A light gray sofa stands against a white wall, featuring a black and white geometric patterned pillow. A white side table sits next to the sofa, topped with a white adjustable desk lamp and some books. Dark hardwood flooring contrasts with the pale walls and furniture.*
- </details>
-
-
-
-
- ## Models
-
- **Subject-driven control:**
- | Model | Base model | Description | Resolution |
- | ------------------------------------------------------------------------------------------------ | -------------- | -------------------------------------------------------------------------------------------------------- | ------------ |
- | [`experimental`](https://huggingface.co/Yuanshi/OminiControl/tree/main/experimental) / `subject` | FLUX.1-schnell | The model used in the paper. | (512, 512) |
- | [`omini`](https://huggingface.co/Yuanshi/OminiControl/tree/main/omini) / `subject_512` | FLUX.1-schnell | The model has been fine-tuned on a larger dataset. | (512, 512) |
- | [`omini`](https://huggingface.co/Yuanshi/OminiControl/tree/main/omini) / `subject_1024` | FLUX.1-schnell | The model has been fine-tuned on a larger dataset and accommodates higher resolution. (To be released) | (1024, 1024) |
-
- **Spatial aligned control:**
- | Model | Base model | Description | Resolution |
- | --------------------------------------------------------------------------------------------------------- | ---------- | -------------------------------------------------------------------------- | ------------ |
- | [`experimental`](https://huggingface.co/Yuanshi/OminiControl/tree/main/experimental) / `<task_name>` | FLUX.1 | Canny edge to image, depth to image, colorization, deblurring, in-painting | (512, 512) |
- | [`experimental`](https://huggingface.co/Yuanshi/OminiControl/tree/main/experimental) / `<task_name>_1024` | FLUX.1 | Supports higher resolution.(To be released) | (1024, 1024) |
-
- ## Citation
- ```
- @article{
- tan2024omini,
- title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
- author={Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang},
- journal={arXiv preprint arXiv:2411.15098},
- year={2024}
- }
- ```
 
+ ---
+ title: OminiControl
+ emoji: 🌍
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.6.0
+ app_file: app.py
+ pinned: false
+ ---
app.py ADDED
@@ -0,0 +1,93 @@
+ import gradio as gr
+ import torch
+ from PIL import Image, ImageDraw, ImageFont
+ from src.condition import Condition
+ from diffusers.pipelines import FluxPipeline
+ import numpy as np
+
+ from src.generate import seed_everything, generate
+
+ pipe = None
+
+
+ def init_pipeline():
+     global pipe
+     pipe = FluxPipeline.from_pretrained(
+         "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
+     )
+     pipe = pipe.to("cuda")
+     pipe.load_lora_weights(
+         "Yuanshi/OminiControl",
+         weight_name=f"omini/subject_512.safetensors",
+         adapter_name="subject",
+     )
+
+
+ def process_image_and_text(image, text):
+     # center crop image
+     w, h, min_size = image.size[0], image.size[1], min(image.size)
+     image = image.crop(
+         (
+             (w - min_size) // 2,
+             (h - min_size) // 2,
+             (w + min_size) // 2,
+             (h + min_size) // 2,
+         )
+     )
+     image = image.resize((512, 512))
+
+     condition = Condition("subject", image)
+
+     if pipe is None:
+         init_pipeline()
+
+     result_img = generate(
+         pipe,
+         prompt=text.strip(),
+         conditions=[condition],
+         num_inference_steps=8,
+         height=512,
+         width=512,
+     ).images[0]
+
+     return result_img
+
+
+ def get_samples():
+     sample_list = [
+         {
+             "image": "assets/oranges.jpg",
+             "text": "A very close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!'",
+         },
+         {
+             "image": "assets/penguin.jpg",
+             "text": "On Christmas evening, on a crowded sidewalk, this item sits on the road, covered in snow and wearing a Christmas hat, holding a sign that reads 'Omini Control!'",
+         },
+         {
+             "image": "assets/rc_car.jpg",
+             "text": "A film style shot. On the moon, this item drives across the moon surface. The background is that Earth looms large in the foreground.",
+         },
+         {
+             "image": "assets/clock.jpg",
+             "text": "In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.",
+         },
+     ]
+     return [[Image.open(sample["image"]), sample["text"]] for sample in sample_list]
+
+
+ demo = gr.Interface(
+     fn=process_image_and_text,
+     inputs=[
+         gr.Image(type="pil"),
+         gr.Textbox(lines=2),
+     ],
+     outputs=gr.Image(type="pil"),
+     title="OminiControl / Subject driven generation",
+     examples=get_samples(),
+ )
+
+ if __name__ == "__main__":
+     init_pipeline()
+     demo.launch(
+         debug=True,
+     )
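
The center-crop step in `process_image_and_text` can be checked on its own, without loading an image or the pipeline. The following is a minimal sketch of that arithmetic only; `center_crop_box` is a hypothetical helper name that does not appear in the commit, and it returns the `(left, upper, right, lower)` box in the order that `PIL.Image.crop` expects.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) box of the largest centered square.

    Hypothetical helper: it mirrors the arithmetic used in app.py but is not
    part of the commit itself.
    """
    min_size = min(width, height)
    return (
        (width - min_size) // 2,   # left edge
        (height - min_size) // 2,  # upper edge
        (width + min_size) // 2,   # right edge
        (height + min_size) // 2,  # lower edge
    )


if __name__ == "__main__":
    # A landscape 800x600 input loses 100 pixels on each side.
    print(center_crop_box(800, 600))
```

Because `width - min_size` and `width + min_size` differ by an even amount, both floor divisions round the same way, so the box is always exactly `min_size` pixels on each side, even when the difference between width and height is odd.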