---
license: mit
pipeline_tag: mask-generation
library_name: refiners
tags:
- vision
- image-segmentation
- matting
- remove background
- background
- background-removal
- salient-object-detection
- PyTorch
- refiners
---
# Release note for Finegrain Box Segmenter v0.1
## Demo
If you want to give the Finegrain Box Segmenter a try, the best way is to take a look at the [Finegrain Object Cutter Space](https://huggingface.co/spaces/finegrain/finegrain-object-cutter) we shipped on Hugging Face: it's a fun "prompt to cut out" experience that lets you create pixel-quality, high-resolution cutouts of any object in a photo just by naming it.
## Motivation
While building Finegrain, we needed a way to create pixel perfect and high resolution cutouts for objects in images. We looked at off-the-shelf solutions, but they simply didn't work for us:
- On the one hand, traditional background removal models are great at producing HD cutouts, but different people will define background and foreground differently for a given image, and these models offer no way to be prompted.
- On the other hand, new promptable approaches like SAM or SAM2 don't meet the quality bar for the use cases we are pursuing: internally they generate a low-resolution 256x256 mask, and their built-in upscaling mechanisms create artefacts and struggle with complex masks (à la Eiffel Tower).
The Finegrain Box Segmenter avoids these pitfalls by training [MVANet](https://arxiv.org/abs/2404.07445) to be a box-promptable High Definition (1024x1024) object-cutout model that makes no assumption about what is background and what is foreground: users are fully in control.
## License
The Finegrain Box Segmenter is published under the MIT license. Have fun using it in your projects! If you want an optimized version (speed and accuracy wise), we offer an API - just [ping us](mailto:[email protected])!
## Features
The Finegrain Box Segmenter:
- produces HD and pixel quality masks,
- gives control to users via box prompting,
- outputs alpha masks: you can use it as an end-to-end Matting Segmenter without any post-processing or trimap.
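To make the alpha mask point concrete, here is a minimal sketch of per-pixel alpha compositing in plain Python. The names `composite_pixel` and `composite` and the 0..255 value range are assumptions made for this illustration, not part of the Box Segmenter API. In practice you would usually let Pillow handle a whole image, for example with `Image.putalpha` or `Image.composite`.

```python
def composite_pixel(fg, bg, alpha):
    """Blend one RGB foreground pixel over one RGB background pixel.

    `alpha` is the mask value for this pixel, assumed to be in 0..255
    (0 = fully background, 255 = fully object).
    """
    a = alpha / 255.0
    return tuple(round(a * f + (1.0 - a) * b) for f, b in zip(fg, bg))


def composite(fg_rows, bg_rows, mask_rows):
    """Blend whole images given as nested lists of RGB tuples."""
    return [
        [composite_pixel(f, b, m) for f, b, m in zip(fg_row, bg_row, mask_row)]
        for fg_row, bg_row, mask_row in zip(fg_rows, bg_rows, mask_rows)
    ]
```

Because the mask is continuous rather than binary, soft edges blend with the new background instead of showing a hard outline.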
## Use cases
You should think of the Finegrain Box Segmenter as a way to select an object in an image, with pixel-level accuracy, and in high resolution.
It's a prerequisite for a number of object manipulation tasks like:
- Remove the background around an object
- Change the background around an object
- Erase an object from an image
- Recolor an object in an image
- Replace an object in an image
- ...
Out of the box, the Finegrain Box Segmenter requires a bounding box as input, but you can easily augment it to enable "prompt to select object" scenarios - see the Finegrain Object Cutter Hugging Face space for an example implementation.
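A "prompt to select object" flow could be wired up roughly as follows. This is only a sketch under stated assumptions: `detect` stands for a hypothetical open-vocabulary detector returning `(box, score)` pairs, `segment` stands for any callable that accepts a `box_prompt` keyword, and the `0.3` score threshold is arbitrary. It is not the implementation used in the Space.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]    # (box, confidence score)


def best_box(detections: Sequence[Detection], min_score: float = 0.3) -> Optional[Box]:
    """Return the highest-scoring box, or None if nothing passes the threshold."""
    kept = [d for d in detections if d[1] >= min_score]
    if not kept:
        return None
    return max(kept, key=lambda d: d[1])[0]


def prompt_to_mask(image, prompt: str, detect: Callable, segment: Callable,
                   min_score: float = 0.3):
    """Hypothetical glue code: text prompt -> detector -> box -> segmenter."""
    box = best_box(detect(image, prompt), min_score)
    if box is None:
        return None  # nothing matching the prompt was found
    return segment(image, box_prompt=box)
```

Keeping the detector and the segmenter as separate callables means either can be swapped without touching the other.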
## Training
Our focus at Finegrain is e-commerce. We therefore trained our model with product datasets coming from 2 sources:
- **Nfinite**:
  - 7769 images
  - Synthetic data (3D)
  - 14818 pixel quality masks
  - Interior design items
  - Open source
- **Finegrain**:
  - 1184 images sourced via hard negative mining
  - Natural data (both studio and UGC photos)
  - 1479 pixel quality masks
  - Common objects
  - Closed source
We moved away from the usual random crop approach. Instead, we designed a custom cropping strategy to make sure the model understands which object to select in a given bounding box. We used a batch size of 5 to improve training stability.
## Evaluation
Given our focus on e-commerce, we crafted a specific test set, and in order to ease benchmarking with other models and solutions, we decided to open source part of it as the [Finegrain Product Masks Lite](https://huggingface.co/datasets/finegrain/finegrain-product-masks-lite), containing 120 pixel quality masks of common objects (both UGC and studio photos).
We're using the usual metrics, namely MAE, Smeasure, Emeasure and Dice, computed with [PySODMetrics](https://github.com/lartpang/PySODMetrics). We'll add more later to account for matting aspects (transparent objects) - still a work-in-progress on our end.
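For readers unfamiliar with these metrics, the sketch below computes simplified versions of MAE and Dice on tiny masks given as nested lists with values in [0, 1]. It is a plain-Python illustration only: the function names and the fixed 0.5 threshold are assumptions, and PySODMetrics may differ in details such as thresholding.

```python
def mae(pred, target):
    """Mean absolute error between two masks with values in [0, 1]."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    return sum(abs(p - t) for p, t in zip(flat_p, flat_t)) / len(flat_p)


def dice(pred, target, threshold=0.5):
    """Dice coefficient on binarized masks: 2 * |A and B| / (|A| + |B|)."""
    flat_p = [v >= threshold for row in pred for v in row]
    flat_t = [v >= threshold for row in target for v in row]
    intersection = sum(p and t for p, t in zip(flat_p, flat_t))
    total = sum(flat_p) + sum(flat_t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[0.9, 0.8], [0.1, 0.6]]
target = [[1.0, 1.0], [0.0, 0.0]]
print(mae(pred, target), dice(pred, target))
```

On this toy example the MAE is about 0.25 and the Dice score is 0.8: lower is better for MAE, higher is better for Dice.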
| Model | **MAE** ↓ | **Smeasure** ↑ | **Emeasure** ↑ | **Dice** ↑ |
|--|--|--|--|--|
| `briaai/RMBG-1.4` (x) | 0.0226 | 90.7% | 94.3% | 88.5% |
| `ZhengPeng7/BiRefNet` (xx) | 0.0194 | 93.1% | 95.1% | 91.5% |
| `finegrain/finegrain-box-segmenter` | 0.0078 | 97.4% | 98.5% | 96.7% |
(x) Using Cropping with 5% margin
(xx) Using "Segmentation With Box Guidance" from [BiRefNet](https://github.com/ZhengPeng7/BiRefNet?tab=readme-ov-file#model-zoo)
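The "cropping with 5% margin" baseline setup can be sketched as below. This assumes the margin is taken relative to the box width and height on each side and clamped to the image, which the note above does not specify; `expand_box` is a hypothetical helper name.

```python
def expand_box(box, image_size, margin=0.05):
    """Grow (x_min, y_min, x_max, y_max) by `margin` of its size on each side.

    The result is clamped to the image, given as (width, height).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    dx = round((x_max - x_min) * margin)
    dy = round((y_max - y_min) * margin)
    return (
        max(0, x_min - dx),
        max(0, y_min - dy),
        min(width, x_max + dx),
        min(height, y_max + dy),
    )
```

The returned tuple has the `(left, upper, right, lower)` layout that Pillow's `Image.crop` expects, so the crop can then be passed to a background removal model that has no box prompt.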
## Limitations
The Finegrain Box Segmenter 0.1 has a number of limitations that will be tackled in future versions:
- prompting is not baked in yet,
- it struggles when the object is touching the side of the image,
- it doesn't yet support matting of **STM** (Salient Transparent/Meticulous Objects) or **NS** (non-salient) masks (see [Deep Automatic Natural Image Matting](https://www.ijcai.org/proceedings/2021/0111.pdf) for the definition of SO/STM/NS),
- it doesn't yet fully nail hard cases like hard shadows, strong reflections or hand-held configurations:
<table>
<thead>
<tr>
<th></th>
<th><strong>Strong reflection</strong></th>
<th><strong>Hard shadow</strong></th>
<th><strong>Hand-held</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left; vertical-align: middle;">Image</td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/a1AHBO7fP8GGNMPZX96IX.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/DTv6BHIIBEfYsjPIGBzIP.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/RAacildaNFr4i3idpyOhI.png"></td>
</tr>
<tr>
<td style="text-align: left; vertical-align: middle;"><code>briaai/RMBG-1.4</code></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/ioF3cKfPmmNEcnffM7tTa.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/jjmtm8twVl9ZNx3AawcGg.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/f3MitAdChtoCeOAIYyBfj.png"></td>
</tr>
<tr>
<td style="text-align: left; vertical-align: middle;"><code>ZhengPeng7/BiRefNet</code></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/uGomwlJoRyeIKDXsffhgs.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/8EgCGSPBYDNUwZsOGCX_J.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/5elEirUetsWgbFXik6I9A.png"></td>
</tr>
<tr>
<td style="text-align: left; vertical-align: middle;"><code>finegrain/finegrain-box-segmenter</code></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/nKTnduZsnO9UKO7gY1BHe.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/iTN_OtOnWjcpNaod4MMzz.png"></td>
<td><img alt="image/png" src="https://cdn-uploads.huggingface.co/production/uploads/632334c533a1e1cf9deebf37/b2RHTJrOQxdUf90GX2JRd.png"></td>
</tr>
</tbody>
</table>
## Bias and Fairness
Given our focus on e-commerce, we haven't yet conducted a thorough bias and fairness review. It will be tackled in future releases.
## Usage
### With [Refiners](https://github.com/finegrain-ai/refiners)
```python
from PIL import Image
from refiners.solutions import BoxSegmenter
input_image = Image.open("input.png")
# Downloads the weights from finegrain/finegrain-box-segmenter
segmenter = BoxSegmenter()
# box_prompt is (x_min, y_min, x_max, y_max)
mask = segmenter(input_image, box_prompt=(24, 133, 588, 531))
# Or without box_prompt as a background remover
# mask = segmenter(input_image.convert("RGB"))
mask.save("output.png")
```
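Boxes from other tools often come as `(x, y, width, height)` or as normalized coordinates rather than the `(x_min, y_min, x_max, y_max)` pixel layout used above. The helpers below are a small sketch of those conversions; their names are hypothetical and they are not part of Refiners.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    if w <= 0 or h <= 0:
        raise ValueError("box width and height must be positive")
    return (x, y, x + w, y + h)


def normalized_to_pixels(box, image_size):
    """Scale a box with coordinates in [0, 1] to integer pixel coordinates.

    `image_size` is (width, height), as returned by Pillow's `Image.size`.
    """
    width, height = image_size
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )
```

For instance, `xywh_to_xyxy((24, 133, 564, 398))` yields the same box as the one passed as `box_prompt` in the example above.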
### With Comfy UI
Coming soon