JCTN committed on
Commit
60f096c
·
1 Parent(s): 57b4a07

Update README.md

  1. README.md +101 -0
README.md CHANGED
@@ -1,3 +1,104 @@
1
  ---
2
  license: creativeml-openrail-m
3
  ---
4
+ base_model: runwayml/stable-diffusion-v1-5
5
+ tags:
6
+ - stable-diffusion
7
+ - stable-diffusion-diffusers
8
+ - diffusers
9
+ - controlnet
10
+ - jax-diffusers-event
11
+ - image-to-image
12
+ inference: true
13
+ datasets:
14
+ - mfidabel/sam-coyo-2k
15
+ - mfidabel/sam-coyo-2.5k
16
+ - mfidabel/sam-coyo-3k
17
+ language:
18
+ - en
19
+ library_name: diffusers
20
+ ---
21
+
22
+ # ControlNet - mfidabel/controlnet-segment-anything
23
+
24
+ These are ControlNet weights trained on runwayml/stable-diffusion-v1-5 with a new type of conditioning: segmentation maps. Some example images follow.
25
+
26
+ **prompt**: contemporary living room of a house
27
+
28
+ **negative prompt**: low quality
29
+ ![images_0](./images_0.png)
30
+
31
+ **prompt**: new york buildings, Vincent Van Gogh starry night
32
+
33
+ **negative prompt**: low quality, monochrome
34
+ ![images_1](./images_1.png)
35
+
36
+ **prompt**: contemporary living room, high quality, 4k, realistic
37
+
38
+ **negative prompt**: low quality, monochrome, low res
39
+ ![images_2](./images_2.png)
40
+
41
+
42
+ ## Model Details
43
+
44
+ - **Model type**: Diffusion-based text-to-image generation model with ControlNet conditioning
45
+
46
+ - **Language(s)**: English
47
+
48
+ - **License**: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license, on which this license is based.
49
+
50
+ - **Model Description**: This model generates images from a text prompt, using a segmentation map as a template for the generated image.
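
The model description above can be illustrated with a minimal usage sketch. It assumes the weights load through the PyTorch `ControlNetModel` class in `diffusers`; if the repository only ships Flax weights, `from_flax=True` may be needed. The `generate` helper, the `Example` class, and the default of 30 inference steps are hypothetical choices for illustration, not part of the model card.

```python
from dataclasses import dataclass

# Repository names taken from the model card.
BASE_MODEL = "runwayml/stable-diffusion-v1-5"
CONTROLNET = "mfidabel/controlnet-segment-anything"


@dataclass(frozen=True)
class Example:
    """A prompt / negative prompt pair (hypothetical container)."""
    prompt: str
    negative_prompt: str


# First example pair listed in the model card.
LIVING_ROOM = Example("contemporary living room of a house", "low quality")


def generate(example, segmentation_map_path, from_flax=False, steps=30):
    """Render one image from a prompt and a segmentation map (hypothetical helper)."""
    # Heavy imports are deferred so this module can be imported without diffusers installed.
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # Assumption: set from_flax=True if only Flax weights are published.
    controlnet = ControlNetModel.from_pretrained(CONTROLNET, from_flax=from_flax)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL, controlnet=controlnet
    )
    # Resize to 512 x 512, the training resolution stated in the card.
    seg_map = load_image(segmentation_map_path).resize((512, 512))
    result = pipe(
        example.prompt,
        image=seg_map,
        negative_prompt=example.negative_prompt,
        num_inference_steps=steps,
    )
    return result.images[0]
```

Calling `generate(LIVING_ROOM, "segmentation_map.png")` would download both checkpoints on first use; the file name here is a placeholder.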
51
+
52
+
53
+ ## Limitations and Bias
54
+
55
+ - The model can't render text
56
+ - Landscapes with fewer segments tend to render better
57
+ - Some segmentation maps tend to render in monochrome (use a negative_prompt to get around it)
58
+ - Some generated images can be oversaturated
59
+ - Shorter prompts usually work better, as long as they make sense with the input segmentation map
60
+ - The model is biased toward producing paintings rather than realistic images, because the training dataset contains many paintings
61
+
62
+ ## Training
63
+
64
+ **Training Data** This model was trained on a segmented dataset based on the [COYO-700M Dataset](https://huggingface.co/datasets/kakaobrain/coyo-700m).
65
+ The [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint was used as the base model for the ControlNet.
66
+
67
+ You can obtain the segmentation map of any image through this Colab: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mfidabel/JAX_SPRINT_2023/blob/main/Segment_Anything_JAX_SPRINT.ipynb)
68
+
69
+ The model was trained as follows:
70
+
71
+ - 25k steps with the [SAM-COYO-2k](https://huggingface.co/datasets/mfidabel/sam-coyo-2k) dataset
72
+ - 28k steps with the [SAM-COYO-2.5k](https://huggingface.co/datasets/mfidabel/sam-coyo-2.5k) dataset
73
+ - 38k steps with the [SAM-COYO-3k](https://huggingface.co/datasets/mfidabel/sam-coyo-3k) dataset
74
+
75
+ The stages were run in the order listed above.
76
+
77
+ **Training Details**
78
+
79
+ - **Hardware**: Google Cloud TPUv4-8 VM
80
+
81
+ - **Optimizer**: AdamW
82
+
83
+ - **Train Batch Size**: 2 x 4 = 8
84
+
85
+ - **Learning rate**: 0.00001 (constant)
86
+
87
+ - **Gradient Accumulation Steps**: 1
88
+
89
+ - **Resolution**: 512
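
As a rough sketch, the training schedule and hyperparameters listed above can be collected into one small structure to derive totals. Reading "2 x 4" as a per-device batch of 2 across 4 devices is an assumption, and the `Stage` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage: dataset name and number of optimizer steps."""
    dataset: str
    steps: int


# Stages in the order given in the model card.
STAGES = (
    Stage("mfidabel/sam-coyo-2k", 25_000),
    Stage("mfidabel/sam-coyo-2.5k", 28_000),
    Stage("mfidabel/sam-coyo-3k", 38_000),
)

PER_DEVICE_BATCH = 2   # assumption: first factor of "2 x 4"
NUM_DEVICES = 4        # assumption: second factor of "2 x 4"
GRAD_ACCUM_STEPS = 1   # from the model card
LEARNING_RATE = 1e-5   # constant, from the model card


def effective_batch_size():
    """Samples consumed per optimizer step."""
    return PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def total_steps():
    """Optimizer steps summed over all stages."""
    return sum(stage.steps for stage in STAGES)


def samples_seen():
    """Training samples processed over the whole run."""
    return total_steps() * effective_batch_size()


print(total_steps(), effective_batch_size(), samples_seen())
```

Under these assumptions the run covers 91,000 steps at batch size 8, or 728,000 samples in total.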
90
+
91
+ **Environmental Impact**
92
+
93
+ Based on the [Machine Learning Emissions Calculator](https://mlco2.github.io/impact#compute) with the following characteristics:
94
+
95
+ - **Hardware Type**: TPUv3 Chip (TPUv4 was not yet available at the time of calculation)
96
+ - **Training Hours**: 8 hours
97
+ - **Cloud Provider**: Google Cloud Platform
98
+ - **Compute Region**: us-central1
99
+ - **Carbon Emitted (Power consumption x Time x Carbon Produced Based on the Local Power Grid)**:
100
+ 283 W x 8 h = 2.26 kWh; 2.26 kWh x 0.57 kg eq. CO2/kWh = 1.29 kg eq. CO2
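
The emissions arithmetic above can be reproduced with a short sketch. The function names are hypothetical; the power draw, duration, and grid carbon intensity are the values stated in the card.

```python
def energy_kwh(power_watts, hours):
    """Energy in kWh for a constant power draw over a duration."""
    return power_watts * hours / 1000.0


def emissions_kg(energy, kg_co2_per_kwh):
    """CO2-equivalent emissions in kg for a given energy and grid intensity."""
    return energy * kg_co2_per_kwh


energy = energy_kwh(283, 8)        # 283 W for 8 hours
co2 = emissions_kg(energy, 0.57)   # 0.57 kg eq. CO2 per kWh
print(f"{energy:.2f} kWh -> {co2:.2f} kg eq. CO2")  # prints "2.26 kWh -> 1.29 kg eq. CO2"
```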
101
+
102
+ ---
103
+ https://huggingface.co/mfidabel/controlnet-segment-anything
104
+