---
license: openrail
tags:
- document-image-binarization
- image-segmentation
- generated_from_trainer
model-index:
- name: binarization-segformer-b3
  results: []
pipeline_tag: image-segmentation
---

# binarization-segformer-b3

This model is a fine-tuned version of [nvidia/segformer-b3](https://huggingface.co/nvidia/segformer-b3-finetuned-cityscapes-1024-1024) on the same ensemble of 13 datasets used in the [SauvolaNet](https://arxiv.org/pdf/2105.05521.pdf) work, which are publicly available in its GitHub [repository](https://github.com/Leedeng/SauvolaNet#datasets).

It achieves the following results on the evaluation set, measured with DIBCO metrics:
- loss: 0.1017
- F-measure: 0.9776
- pseudo F-measure: 0.9531
- PSNR: 14.5040
- DRD: 5.3749

where PSNR is the peak signal-to-noise ratio and DRD is the distance reciprocal distortion.

For more information on the above DIBCO metrics, see the 2017 introductory [paper](https://ieeexplore.ieee.org/document/8270159).

**Warning:** This model only accepts images with a resolution of 640 because of GPU compute constraints on the Colab free tier during training.

## Model description

This model is part of ongoing research on formulating document image binarization (the DIBCO task) as a pure semantic segmentation problem. This contrasts with the recent trend of adapting classic binarization algorithms with neural networks, such as [DeepOtsu](https://arxiv.org/abs/1901.06081) or the aforementioned SauvolaNet work, which extend the classical Otsu's method and Sauvola thresholding algorithm, respectively.

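To make the reported F-measure and PSNR figures more concrete, here is a minimal sketch of how these two metrics are commonly computed for a pair of binary images. It is not the evaluation code used for this model, which the card does not include. The function names, the convention that text pixels are encoded as 1 and background as 0, and the use of nested lists instead of image arrays are all assumptions made for illustration. Pseudo F-measure and DRD need additional weighting maps and are left out.

```python
import math


def f_measure(pred, truth):
    """Harmonic mean of precision and recall over text pixels.

    Assumption: both images are nested lists of 0/1 with text encoded as 1.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    if tp == 0:
        # No correctly predicted text pixel: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def psnr(pred, truth):
    """Peak signal-to-noise ratio in dB for images with values in [0, 1]."""
    n_pixels = sum(len(row) for row in truth)
    squared_error = sum(
        (p - t) ** 2
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    mse = squared_error / n_pixels
    if mse == 0:
        return math.inf  # identical images
    # The peak difference between text and background is 1 for 0/1 images.
    return 10 * math.log10(1.0 / mse)


if __name__ == "__main__":
    # Tiny hypothetical example: 2x4 ground truth and prediction.
    truth = [[1, 1, 0, 0], [0, 1, 0, 0]]
    pred = [[1, 0, 0, 0], [0, 1, 1, 0]]
    print(f"F-measure: {f_measure(pred, truth):.4f}")
    print(f"PSNR: {psnr(pred, truth):.4f} dB")
```

For real images, the same logic would normally be vectorized with an array library; the loops here only keep the sketch dependency-free.
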
## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 10
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 50

### Training results

| training loss | epoch | step | validation loss | F-measure | pseudo F-measure | PSNR | DRD |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:----------------:|:-------:|:--------:|
| 0.6667 | 1.03 | 10 | 0.6683 | 0.7127 | 0.6831 | 4.8248 | 107.2894 |
| 0.6371 | 2.05 | 20 | 0.6390 | 0.8173 | 0.7360 | 6.1079 | 69.7770 |
| 0.587 | 3.08 | 30 | 0.5652 | 0.8934 | 0.8187 | 7.9143 | 40.5464 |
| 0.5288 | 4.1 | 40 | 0.4926 | 0.9240 | 0.8554 | 9.2247 | 27.4220 |
| 0.4601 | 5.13 | 50 | 0.4244 | 0.9490 | 0.8944 | 10.8830 | 16.8051 |
| 0.3864 | 6.15 | 60 | 0.3446 | 0.9638 | 0.9218 | 12.3460 | 10.6997 |
| 0.3331 | 7.18 | 70 | 0.3055 | 0.9693 | 0.9317 | 13.0531 | 8.5298 |
| 0.2821 | 8.21 | 80 | 0.2512 | 0.9736 | 0.9427 | 13.6929 | 6.8343 |
| 0.2392 | 9.23 | 90 | 0.2112 | 0.9744 | 0.9462 | 13.8825 | 6.4094 |
| 0.2126 | 10.26 | 100 | 0.1948 | 0.9743 | 0.9433 | 13.8424 | 6.5637 |
| 0.1889 | 11.28 | 110 | 0.1710 | 0.9749 | 0.9499 | 13.9784 | 6.1757 |
| 0.1662 | 12.31 | 120 | 0.1604 | 0.9753 | 0.9495 | 14.0450 | 6.0929 |
| 0.1506 | 13.33 | 130 | 0.1451 | 0.9750 | 0.9550 | 14.0028 | 6.1031 |
| 0.1359 | 14.36 | 140 | 0.1362 | 0.9759 | 0.9501 | 14.1383 | 5.9699 |
| 0.1321 | 15.38 | 150 | 0.1351 | 0.9761 | 0.9485 | 14.1907 | 5.9045 |
| 0.1283 | 16.41 | 160 | 0.1266 | 0.9758 | 0.9541 | 14.1515 | 5.8287 |
| 0.1198 | 17.44 | 170 | 0.1232 | 0.9763 | 0.9535 | 14.2411 | 5.7300 |
| 0.1151 | 18.46 | 180 | 0.1232 | 0.9765 | 0.9482 | 14.2788 | 5.8266 |
| 0.1146 | 19.49 | 190 | 0.1183 | 0.9764 | 0.9530 | 14.2363 | 5.7922 |
| 0.1027 | 20.51 | 200 | 0.1162 | 0.9765 | 0.9535 | 14.2867 | 5.6246 |
| 0.1051 | 21.54 | 210 | 0.1146 | 0.9766 | 0.9551 | 14.2963 | 5.6159 |
| 0.1095 | 22.56 | 220 | 0.1159 | 0.9767 | 0.9497 | 14.3153 | 5.8966 |
| 0.1076 | 23.59 | 230 | 0.1106 | 0.9768 | 0.9533 | 14.3267 | 5.6436 |
| 0.1006 | 24.62 | 240 | 0.1113 | 0.9769 | 0.9483 | 14.3683 | 5.6679 |
| 0.1077 | 25.64 | 250 | 0.1086 | 0.9770 | 0.9544 | 14.3843 | 5.4949 |
| 0.0966 | 26.67 | 260 | 0.1077 | 0.9770 | 0.9553 | 14.3660 | 5.5337 |
| 0.0958 | 27.69 | 270 | 0.1071 | 0.9773 | 0.9529 | 14.4405 | 5.4582 |
| 0.0984 | 28.72 | 280 | 0.1055 | 0.9772 | 0.9536 | 14.4405 | 5.4365 |
| 0.0936 | 29.74 | 290 | 0.1056 | 0.9774 | 0.9528 | 14.4634 | 5.4066 |
| 0.0958 | 30.77 | 300 | 0.1049 | 0.9772 | 0.9544 | 14.4138 | 5.4854 |
| 0.0896 | 31.79 | 310 | 0.1043 | 0.9774 | 0.9533 | 14.4593 | 5.4351 |
| 0.0973 | 32.82 | 320 | 0.1035 | 0.9774 | 0.9528 | 14.4633 | 5.4430 |
| 0.0943 | 33.85 | 330 | 0.1033 | 0.9775 | 0.9527 | 14.4809 | 5.4193 |
| 0.0956 | 34.87 | 340 | 0.1026 | 0.9774 | 0.9543 | 14.4576 | 5.4070 |
| 0.0936 | 35.9 | 350 | 0.1031 | 0.9775 | 0.9531 | 14.4827 | 5.4137 |
| 0.0937 | 36.92 | 360 | 0.1028 | 0.9773 | 0.9551 | 14.4420 | 5.4084 |
| 0.0952 | 37.95 | 370 | 0.1023 | 0.9775 | 0.9541 | 14.4809 | 5.3769 |
| 0.0952 | 38.97 | 380 | 0.1023 | 0.9776 | 0.9525 | 14.5086 | 5.3839 |
| 0.0948 | 40.0 | 390 | 0.1020 | 0.9774 | 0.9546 | 14.4667 | 5.3800 |
| 0.0931 | 41.03 | 400 | 0.1020 | 0.9776 | 0.9534 | 14.5043 | 5.3728 |
| 0.0906 | 42.05 | 410 | 0.1023 | 0.9774 | 0.9544 | 14.4771 | 5.3773 |
| 0.0974 | 43.08 | 420 | 0.1019 | 0.9776 | 0.9536 | 14.5024 | 5.3718 |
| 0.0908 | 44.1 | 430 | 0.1025 | 0.9776 | 0.9536 | 14.4995 | 5.3730 |
| 0.0935 | 45.13 | 440 | 0.1024 | 0.9775 | 0.9537 | 14.4978 | 5.3715 |
| 0.0927 | 46.15 | 450 | 0.1017 | 0.9776 | 0.9531 | 14.5040 | 5.3749 |

### Framework versions

- Transformers 4.27.4
- Pytorch 2.0.0+cu118
- Datasets 2.11.0
- Tokenizers 0.13.3
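
The training hyperparameters above (learning rate 5e-05, cosine scheduler, 50 warmup steps, batch size 4 with 4 gradient accumulation steps) can be illustrated with a short sketch. It assumes a linear warmup from zero followed by a half-cosine decay to zero, which is the usual shape of a cosine schedule with warmup. It also assumes a single device and 450 total optimizer steps, the last step logged in the results table. None of these assumptions is confirmed by the card, and the function names are hypothetical.

```python
import math

PEAK_LR = 5e-05      # learning_rate from the card
WARMUP_STEPS = 50    # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 450    # assumption: last step logged in the results table


def effective_batch_size(per_device=4, accumulation=4, devices=1):
    """Samples contributing to one optimizer step (devices=1 is an assumption)."""
    return per_device * accumulation * devices


def lr_at(step):
    """Learning rate at an optimizer step under the assumed schedule shape."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(1.0, progress)
    # Half-cosine decay from the peak learning rate down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 25, 50, 250, 450):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 50, which falls between epochs 5 and 6 in the results table, and then decays for the remainder of training.
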