SBB
/

Image-to-Image
TF-Keras
pixelwise-segmentation
Jrglmn committed on
Commit
2cb3dc1
·
1 Parent(s): 75c6854

Complete Update of the Model Card

Model card according to HF template, not fully according to Mitchell et al, co-authored by Vahid and Joerg.
Known issues: no information on a) Preprocessing, b) Training Results (see evaluation section), c) Factors, d) additional SBB datasets have not yet been published.

Files changed (1)
  1. README.md +261 -20
README.md CHANGED
@@ -1,30 +1,271 @@
1
  ---
2
- tags:
 
3
  - image-to-image
 
 
 
 
4
  license: apache-2.0
5
  ---
6
- # About `sbb_binarization`
7
 
8
- This is a CNN model for document image binarization. It can be
9
- used to convert all pixels in a color or grayscale document image
10
- to only black or white pixels. The main aim is to improve the
11
- contrast between foreground (text) and background (paper) for
12
- purposes of OCR. The model is based on a `ResNet50-Unet` model.
13
 
14
- For further details, have a look at [sbb_binarization](https://github.com/qurator-spk/sbb_binarization) on GitHub.
15
 
16
- # Results
17
- In the *DocEng’2021 Time-Quality Binarization Competition*
18
- ([paper](https://dib.cin.ufpe.br/docs/DocEng21_bin_competition_report.pdf)),
19
- the model ranked 12 times under the top 8 of 63 methods, winning 2 tasks.
20
 
21
- In the *ICDAR 2021 Competition on Time-Quality Document Image
22
- Binarization* ([paper](https://dib.cin.ufpe.br/docs/papers/ICDAR2021-TQDIB_final_published.pdf)),
23
- the model ranked 2 times under the top 20 of 61 methods, winning 1 task.
24
 
25
- # Weights
26
- We provide a `saved model` for Tensorflow2.
 
 
27
 
28
- | Model | Downloads
29
- | -------------| ------------------------
30
- | `2021_03_09` | [`saved_model`](https://huggingface.co/SBB/sbb_binarization/tree/main/saved_model)
 
1
  ---
2
+ tags:
3
+ - keras
4
  - image-to-image
5
+ - pixelwise-segmentation
6
+ datasets:
7
+ - DIBCO
8
+ - H-DIBCO
9
  license: apache-2.0
10
  ---
 
11
 
 
 
 
 
 
12
 
 
13
 
14
+
15
+
16
+
17
+ # Model Card for sbb_binarization
18
+
19
+ <!-- Provide a quick summary of what the model is/does. [Optional] -->
20
+ This is a pixelwise segmentation model for document image binarization. The model is a CNN encoder-decoder model (ResNet50-Unet). It can be used to convert all pixels in a color or grayscale document image to only black or white pixels. The main aim is to improve the contrast between foreground (text) and background (paper) for purposes of OCR.
21
+
22
+
23
+
24
+
25
+ # Table of Contents
26
+
27
+ - [Model Card for sbb_binarization](#model-card-for-sbb_binarization)
28
+ - [Table of Contents](#table-of-contents)
29
+ - [Model Details](#model-details)
30
+ - [Model Description](#model-description)
31
+ - [Uses](#uses)
32
+ - [Direct Use](#direct-use)
33
+ - [Downstream Use [Optional]](#downstream-use-optional)
34
+ - [Out-of-Scope Use](#out-of-scope-use)
35
+ - [Bias, Risks, and Limitations](#bias-risks-and-limitations)
36
+ - [Recommendations](#recommendations)
37
+ - [Training Details](#training-details)
38
+ - [Training Data](#training-data)
39
+ - [Training Procedure](#training-procedure)
40
+ - [Preprocessing](#preprocessing)
41
+ - [Speeds, Sizes, Times](#speeds-sizes-times)
42
+ - [Evaluation](#evaluation)
43
+ - [Testing Data, Factors & Metrics](#testing-data-factors--metrics)
44
+ - [Testing Data](#testing-data)
45
+ - [Factors](#factors)
46
+ - [Metrics](#metrics)
47
+ - [Results](#results)
48
+ - [Model Examination](#model-examination)
49
+ - [Environmental Impact](#environmental-impact)
50
+ - [Technical Specifications [optional]](#technical-specifications-optional)
51
+ - [Model Architecture and Objective](#model-architecture-and-objective)
52
+ - [Compute Infrastructure](#compute-infrastructure)
53
+ - [Hardware](#hardware)
54
+ - [Software](#software)
55
+ - [Citation](#citation)
56
+ - [Glossary [optional]](#glossary-optional)
57
+ - [More Information [optional]](#more-information-optional)
58
+ - [Model Card Authors [optional]](#model-card-authors-optional)
59
+ - [Model Card Contact](#model-card-contact)
60
+ - [How to Get Started with the Model](#how-to-get-started-with-the-model)
61
+
62
+
63
+ # Model Details
64
+
65
+ ## Model Description
66
+
67
+ <!-- Provide a longer summary of what this model is/does. -->
68
+ Document image binarization is one of the main pre-processing steps for text recognition in document image analysis. Noise, faint characters, bad scanning conditions, uneven light exposure or paper aging can cause artifacts that negatively impact text recognition algorithms. The task of binarization is to segment the foreground (text) from these degradations in order to improve optical character recognition (OCR) results. Convolutional neural networks (CNNs) are one popular method for binarization, and the sbb_binarization model is one of them. It uses a CNN encoder-decoder architecture.
69
+
70
+ - **Developed by:** [Vahid Rezanezhad](https://huggingface.co/vahid-nejad)
71
+ - **Shared by [Optional]:** [Staatsbibliothek zu Berlin / Berlin State Library](https://huggingface.co/SBB)
72
+ - **Model type:** Neural Network
73
+ - **Language(s) (NLP):** Irrelevant; works on all languages
74
+ - **License:** apache-2.0
75
+ - **Parent Model:** [ResNet-50, see the paper by He et al.](https://arxiv.org/abs/1512.03385)
76
+ - **Resources for more information:**
77
+ - [GitHub Repo](https://github.com/qurator-spk/sbb_binarization)
78
+ - Associated Paper 1: [Time-Quality Binarization Competition](https://dib.cin.ufpe.br/docs/DocEng21_bin_competition_report.pdf)
79
+ - Associated Paper 2: [Time-Quality Document Image Binarization](https://dib.cin.ufpe.br/docs/papers/ICDAR2021-TQDIB_final_published.pdf)
80
+
81
+ # Uses
82
+
83
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
84
+
85
+ Document image binarization is the main use case of this model. The architecture of this model, together with training techniques such as model weight ensembling, can reach or outperform state-of-the-art results on the standard Document Image Binarization Contest (DIBCO) datasets for both machine-printed and handwritten documents.
86
+
87
+
88
+
89
+ ## Direct Use
90
+
91
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
92
+ <!-- If the user enters content, print that. If not, but they enter a task in the list, use that. If neither, say "more info needed." -->
93
+
94
+ The intended use is limited to the binarization of images of historical documents, understood as one of the main pre-processing steps necessary for text recognition.
95
+
96
+
97
+ ## Downstream Use [Optional]
98
+
99
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
100
+ <!-- If the user enters content, print that. If not, but they enter a task in the list, use that. If neither, say "more info needed." -->
101
 
102
+ A possible downstream use of this model might lie in the binarization of illustrative elements contained in document images such as digitized newspapers, magazines or books. In such cases, binarization might support analysis of creator attribution, artistic style (e.g., in line drawings), or analysis of image similarity. Furthermore, the model can be used, or further trained, for other image enhancement use cases.
103
+
104
+
105
+ ## Out-of-Scope Use
106
+
107
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
108
+ <!-- If the user enters content, print that. If not, but they enter a task in the list, use that. If neither, say "more info needed." -->
109
+
110
+ This model does NOT perform any optical character recognition (OCR).
111
+
112
+
113
+ # Bias, Risks, and Limitations
114
+
115
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
116
+
117
+ The aim of the development of this model was to improve document image binarization as a necessary pre-processing step. Since the content of the document images is not touched, we could not identify any ethical challenges. The model was not developed for profit; although a product based on this model might be developed in the future, it will be openly accessible without any commercial interest.
118
+ This algorithm performs a pixelwise segmentation, which is done in patches. One limitation of this model is therefore that it cannot capture long-range dependencies beyond a single patch.
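
As a rough illustration of patch-wise processing, the sketch below computes the top-left corners of fixed-size patches that cover an image. The function name, the patch size, and the edge handling (shifting the last patch back so that it stays inside the image) are assumptions made for illustration; they are not taken from the `sbb_binarization` code.

```python
def patch_origins(height, width, patch_h, patch_w):
    """Return (row, col) top-left corners of patches that cover the image.

    Hypothetical helper: the last patch in each direction is shifted back
    so that it ends exactly at the image border (patches may overlap there).
    Assumes the image is at least as large as one patch.
    """
    if height < patch_h or width < patch_w:
        raise ValueError("image is smaller than a single patch")

    def starts(size, patch):
        # Regular grid of start positions, stepping by one patch length.
        positions = list(range(0, size - patch + 1, patch))
        # Add a final, shifted-back patch if the grid leaves a remainder.
        if positions[-1] + patch < size:
            positions.append(size - patch)
        return positions

    return [(r, c) for r in starts(height, patch_h) for c in starts(width, patch_w)]
```

Each pixel is predicted only from the content of the patch it falls into, which is why context outside that patch is not available to the model.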
119
+
120
+
121
+ ## Recommendations
122
+
123
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
124
+
125
+ The application of machine learning models to convert a document image into a binary output is a process which can still be improved. New model structures such as Transformers or hybrid CNN-Transformers may be applied. Transformers would help the model capture long-range dependencies in image patches. Combined with a CNN that enriches the input features, this could improve image enhancement performance. In addition, we have used many pseudo-labeled images to train our model, so any improvement or extension of the ground truth would probably lead to better results.
126
+
127
+
128
+ # Training Details
129
+
130
+ ## Training Data
131
+
132
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
133
+ The dataset used for training is a combination of training sets from previous [DIBCO](https://dib.cin.ufpe.br/#!/datasets) binarization competitions, the [Palm Leaf dataset](https://ieeexplore.ieee.org/abstract/document/7814130) and the Persian Heritage Image Binarization Competition ([PHIBC](https://arxiv.org/abs/1306.6263)) dataset, with additional pseudo-labeled images from the Berlin State Library (SBB; datasets to be published). Furthermore, a dataset of very dark or very bright images has been produced for training.
134
+
135
+
136
+ ## Training Procedure
137
+
138
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
139
+
140
+ We have used a batch size of 8 with a learning rate of 1e-4 for 20 epochs. A soft Dice loss is used as the loss function. During training we have taken advantage of dataset augmentation, which includes flipping, scaling and blurring. The best model weights are chosen based on some problematic documents from the SBB dataset. The final model is an ensemble of the best weights.
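
For readers unfamiliar with the soft Dice loss, here is a minimal sketch of one common formulation on flattened masks. The exact variant used for training is not stated in this model card; the function name and the `smooth` constant are assumptions.

```python
def soft_dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss for flat sequences of values in [0, 1].

    y_true: ground-truth foreground mask (0 or 1 per pixel)
    y_pred: predicted foreground probability per pixel
    smooth: assumed smoothing constant that avoids division by zero
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Soft overlap: probabilities are multiplied instead of thresholded.
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    denominator = sum(y_true) + sum(y_pred)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice
```

The loss is 0 for a perfect prediction and grows as the predicted foreground overlaps less with the ground truth, which makes it less sensitive to the large share of background pixels than a plain per-pixel loss.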
141
+
142
+
143
+ ### Preprocessing
144
+ No preprocessing of the input image is needed in order to use this model for binarization.
145
+ ### Speeds, Sizes, Times
146
+
147
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
148
+
149
+ More information needed
150
+
151
+ ### Training hyperparameters
152
+
153
+ The training hyperparameters were patch size, learning rate, number of epochs, and depth of the encoder.
154
+
155
+ ### Training results
156
+
157
+ See the two papers listed below in the evaluation section.
158
+
159
+
160
+
161
+ # Evaluation
162
+ In the DocEng’2021 [Time-Quality Binarization Competition](https://dib.cin.ufpe.br/docs/DocEng21_bin_competition_report.pdf), the model ranked among the top 8 of 63 methods twelve times, winning 2 tasks.
163
+
164
+ In the ICDAR 2021 Competition on [Time-Quality Document Image Binarization](https://dib.cin.ufpe.br/docs/papers/ICDAR2021-TQDIB_final_published.pdf), the model ranked among the top 20 of 61 methods twice, winning 1 task.
165
+
166
+
167
+ <!-- This section describes the evaluation protocols and provides the results. -->
168
+
169
+ ## Testing Data, Factors & Metrics
170
+
171
+ ### Testing Data
172
+
173
+ <!-- This should link to a Data Card if possible. -->
174
+
175
+ The testing data are the ones used in the [Time-Quality Binarization Competition](https://dib.cin.ufpe.br/docs/DocEng21_bin_competition_report.pdf) and listed in the paper on [Time-Quality Document Image Binarization](https://dib.cin.ufpe.br/docs/papers/ICDAR2021-TQDIB_final_published.pdf).
176
+
177
+
178
+ ### Factors
179
+
180
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
181
+
182
+ More information needed
183
+
184
+ ### Metrics
185
+
186
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
187
+
188
+ The model has been evaluated based on both OCR and pixelwise segmentation results. For the visual (pixelwise) evaluation, the metrics used are pixel proportion error and Cohen's kappa; for OCR, the metric is the Levenshtein distance error.
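
Two of these metrics have standard textbook definitions, sketched below in plain Python: Cohen's kappa between two binary masks and the Levenshtein (edit) distance between two strings. The function names are illustrative, and the competitions' own evaluation code may differ in detail. The pixel proportion error is left out because its definition is given in the competition papers rather than in this model card.

```python
def cohen_kappa(mask_a, mask_b):
    """Cohen's kappa between two flat binary masks (values 0 or 1)."""
    n = len(mask_a)
    if n == 0 or n != len(mask_b):
        raise ValueError("masks must be non-empty and of equal length")
    # Observed agreement: share of pixels with the same label.
    observed = sum(a == b for a, b in zip(mask_a, mask_b)) / n
    # Chance agreement from the marginal foreground proportions.
    p_a1 = sum(mask_a) / n
    p_b1 = sum(mask_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1.0:
        return 1.0  # both masks are constant and identical
    return (observed - expected) / (1 - expected)


def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

In an OCR-based evaluation, the edit distance would typically be computed between the OCR output on the binarized image and a reference transcription.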
189
+
190
+ ## Results
191
+
192
+ See the two papers listed above in the evaluation section.
193
+
194
+ # Model Examination
195
+
196
+ More information needed
197
+
198
+ # Environmental Impact
199
+
200
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
201
+
202
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
203
+
204
+ - **Hardware Type:** More information needed
205
+ - **Hours used:** More information needed
206
+ - **Cloud Provider:** More information needed
207
+ - **Compute Region:** More information needed
208
+ - **Carbon Emitted:** More information needed
209
+
210
+ # Technical Specifications [optional]
211
+
212
+ ## Model Architecture and Objective
213
+
214
+ The proposed model is a CNN encoder-decoder model. The encoder part consists of a ResNet-50 model. The ResNet-50 is built from convolutional layers and is responsible for extracting as many features as possible from the input image. After the input image has passed through this encoder, its output goes through upsampling convolutional layers until an output of the same size as the input image is rebuilt.
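
To make the "downsample, then upsample back to the input size" idea concrete, the sketch below traces one spatial dimension through an encoder-decoder. It assumes five 2x downsampling stages, which matches the overall stride of 32 of a standard ResNet-50, and a matching number of 2x upsampling steps. The number of decoder steps in this model is an assumption, and the function is purely illustrative.

```python
def feature_map_sizes(input_size, steps=5):
    """Trace one spatial dimension through an encoder-decoder.

    steps: assumed number of 2x downsampling stages (5 gives the overall
    stride of 32 of a standard ResNet-50) and of matching 2x upsamplings.
    """
    if input_size % (2 ** steps) != 0:
        raise ValueError("input_size must be divisible by 2**steps")
    # Encoder: the size is halved at every stage.
    encoder = [input_size // (2 ** k) for k in range(1, steps + 1)]
    # Decoder: the size is doubled until the input size is reached again.
    decoder = [encoder[-1] * (2 ** k) for k in range(1, steps + 1)]
    return encoder, decoder
```

For a 224-pixel input side this gives encoder sizes 112, 56, 28, 14, 7 and decoder sizes 14, 28, 56, 112, 224, so the output mask has the same size as the input patch.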
215
+
216
+ ## Compute Infrastructure
217
+
218
+ More information needed
219
+
220
+ ### Hardware
221
+
222
+ More information needed
223
+
224
+ ### Software
225
+
226
+ More information needed
227
+
228
+ # Citation
229
+
230
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
231
+
232
+ **BibTeX:**
233
+
234
+ More information needed
235
+
236
+ **APA:**
237
+
238
+ More information needed
239
+
240
+ # Glossary [optional]
241
+
242
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
243
+
244
+ More information needed
245
+
246
+ # More Information [optional]
247
+
248
+ More information needed
249
+
250
+ # Model Card Authors [optional]
251
+
252
+ <!-- This section provides another layer of transparency and accountability. Whose views is this model card representing? How many voices were included in its construction? Etc. -->
253
+
254
+ [Vahid Rezanezhad](https://huggingface.co/vahid-nejad), [Clemens Neudecker](https://huggingface.co/cneud), [Konstantin Baierer]([email protected])
255
+
256
+ # Model Card Contact
257
+
258
+ Questions and comments about the model can be directed to Clemens Neudecker at [email protected]. Questions and comments about the model card can be directed to Jörg Lehmann at [email protected].
259
+
260
+ # How to Get Started with the Model
261
+
262
+ Use the code below to get started with the model.
263
 
264
+ sbb_binarize \
265
+ -m <from_pretrained_keras("sbb_binarization")> \
266
+ <input image> \
267
+ <output image>
268
 
269
+ <details>
270
+ How to get started with this model is explained in the README file of the [GitHub repository](https://github.com/qurator-spk/sbb_binarization).
271
+ </details>
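
The command shown above can also be called from Python. The following is a minimal sketch, assuming that the `sbb_binarize` command-line tool is installed and on `PATH`, and that `-m` accepts a path to a locally available copy of the model. The function names and the file paths in the usage note are hypothetical.

```python
import subprocess


def build_sbb_binarize_command(model_dir, input_image, output_image):
    """Build the argument list for the `sbb_binarize` command shown above."""
    return ["sbb_binarize", "-m", str(model_dir), str(input_image), str(output_image)]


def binarize(model_dir, input_image, output_image):
    """Run `sbb_binarize`; requires the tool to be installed and on PATH."""
    command = build_sbb_binarize_command(model_dir, input_image, output_image)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(command, check=True)
```

A call such as `binarize("path/to/model", "page.png", "page_bin.png")` would then run the tool once for a single image.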