spell_corrector_bert2bert_cased_1010_v3
This model is a fine-tuned version of a base checkpoint on a dataset, neither of which the card specifies. It achieves the following results on the evaluation set:
- Loss: 0.0001
- Bleu: 72.1549
- Gen Len: 15.509
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
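To make the `lr_scheduler_type: linear` setting concrete, here is a minimal sketch of a linear decay schedule in plain Python. It assumes zero warmup steps (the card lists none) and takes the total of 9760 optimizer steps from the final row of the training results table. The function name `linear_lr` and its parameters are illustrative only and are not part of any library.

```python
def linear_lr(step: int,
              base_lr: float = 5e-05,
              total_steps: int = 9760,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay.

    Assumption: warmup_steps defaults to 0 because the card does not list a
    warmup setting. total_steps is 20 epochs x 488 steps per epoch.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    # Print the scheduled learning rate at a few epoch boundaries.
    for epoch in (0, 5, 10, 15, 20):
        step = epoch * 488
        print(f"epoch {epoch:2d} (step {step:4d}): lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate starts at 5e-05, is halved by the end of epoch 10, and reaches zero at the last step.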
Training results
Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
---|---|---|---|---|---|
No log | 1.0 | 488 | 0.0488 | 70.4997 | 15.5281 |
0.1586 | 2.0 | 976 | 0.0438 | 70.9811 | 15.4925 |
0.0868 | 3.0 | 1464 | 0.0342 | 71.2514 | 15.5172 |
0.0736 | 4.0 | 1952 | 0.0283 | 71.4007 | 15.4953 |
0.0532 | 5.0 | 2440 | 0.0241 | 71.5523 | 15.503 |
0.043 | 6.0 | 2928 | 0.0172 | 71.7499 | 15.5107 |
0.0351 | 7.0 | 3416 | 0.0144 | 71.7684 | 15.4988 |
0.029 | 8.0 | 3904 | 0.0117 | 71.8266 | 15.5092 |
0.0219 | 9.0 | 4392 | 0.0091 | 71.9555 | 15.508 |
0.0187 | 10.0 | 4880 | 0.0080 | 71.9537 | 15.5095 |
0.0147 | 11.0 | 5368 | 0.0057 | 72.0262 | 15.5088 |
0.0117 | 12.0 | 5856 | 0.0036 | 72.08 | 15.5087 |
0.0107 | 13.0 | 6344 | 0.0029 | 72.1015 | 15.5084 |
0.0073 | 14.0 | 6832 | 0.0023 | 72.1016 | 15.5078 |
0.0052 | 15.0 | 7320 | 0.0017 | 72.1217 | 15.5099 |
0.0049 | 16.0 | 7808 | 0.0011 | 72.1201 | 15.5083 |
0.0029 | 17.0 | 8296 | 0.0002 | 72.1486 | 15.509 |
0.0018 | 18.0 | 8784 | 0.0003 | 72.1529 | 15.509 |
0.0016 | 19.0 | 9272 | 0.0001 | 72.1549 | 15.509 |
0.0012 | 20.0 | 9760 | 0.0001 | 72.1549 | 15.509 |
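The step counts in the table also allow a rough estimate of the training set size, which the card does not state. The sketch below works through that arithmetic. It assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; none of these is confirmed by the card. The helper names are hypothetical.

```python
def total_steps(steps_per_epoch: int, num_epochs: int) -> int:
    """Total optimizer steps over the whole run."""
    return steps_per_epoch * num_epochs


def train_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Range of dataset sizes n for which ceil(n / batch_size) == steps_per_epoch.

    Assumption: one optimizer step per batch and the last partial batch is kept.
    """
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


if __name__ == "__main__":
    # Values taken from the hyperparameter list and the results table.
    print("total steps:", total_steps(488, 20))
    print("training examples (min, max):", train_size_bounds(488, 16))

    # BLEU improvement from the first to the last epoch in the table.
    bleu_gain = 72.1549 - 70.4997
    print("BLEU gain, epoch 1 to 20:", round(bleu_gain, 4))
```

With 488 steps per epoch and a batch size of 16, these assumptions put the training set at between 7793 and 7808 examples, and the 20 epochs account for all 9760 steps in the table.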
Framework versions
- Transformers 4.34.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1