---
language: ms
tags:
- roberta
- fine-tuned
- transformers
- bert
- masked-language-model
license: apache-2.0
model_type: roberta
---
# Fine-tuned RoBERTa on Malay Language
This model is a fine-tuned version of `mesolitica/roberta-base-bahasa-cased`, trained with a **Masked Language Modeling (MLM)** objective on a custom dataset of normalized Malay sentences.
## Model Description
This model is based on the **RoBERTa** architecture, a robustly optimized variant of BERT. The base checkpoint was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences. The fine-tuning objective was to predict masked tokens in these sentences, the standard masked language modeling task, as illustrated in the sketch below.
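To make the MLM objective concrete, here is a minimal sketch of how a masked Malay token can be predicted with the Hugging Face `transformers` API. The base checkpoint name is used only for illustration (the fine-tuned weights load the same way), and the example sentence is hypothetical.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Base checkpoint shown for illustration; the fine-tuned weights load the same way.
model_name = "mesolitica/roberta-base-bahasa-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# A Malay sentence with one token replaced by the mask token
# (the unmasked sentence would be "Saya suka makan nasi goreng.").
text = f"Saya suka makan {tokenizer.mask_token} goreng."
inputs = tokenizer(text, return_tensors="pt")

# The MLM head produces a score for every vocabulary token at every position.
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the masked position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```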
### Training Details
- **Pre-trained Model**: `mesolitica/roberta-base-bahasa-cased`
- **Task**: Masked Language Modeling (MLM)
- **Training Dataset**: Custom dataset of Malay sentences
- **Training Duration**: 3 epochs
- **Batch Size**: 16 per device
- **Learning Rate**: 1e-6
- **Optimizer**: AdamW
- **Evaluation**: Evaluated every 200 steps (a configuration sketch is shown below)
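
The training details above translate roughly into the following `Trainer` configuration. This is a minimal sketch, not the exact training script: the toy in-memory corpus, output directory, sequence length, and masking probability are assumptions standing in for the unreleased custom Malay dataset and its preprocessing.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "mesolitica/roberta-base-bahasa-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Toy in-memory corpus standing in for the custom dataset of normalized Malay sentences.
texts = ["Saya suka makan nasi goreng.", "Dia pergi ke sekolah setiap pagi."]
dataset = Dataset.from_dict({"text": texts}).train_test_split(test_size=0.5)

def tokenize(batch):
    # max_length is an assumption; the original preprocessing is not documented.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking of 15% of tokens (the standard MLM objective).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="roberta-malay-mlm",        # hypothetical output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-6,
    evaluation_strategy="steps",           # `eval_strategy` in newer transformers releases
    eval_steps=200,
    logging_steps=200,
)

# Trainer uses AdamW by default, matching the optimizer listed above.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
)

trainer.train()
```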
## Training and Validation Loss
The following table shows the training and validation loss at each evaluation step during the fine-tuning process:
| Step | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 200 | 0.069000 | 0.069317 |
| 400 | 0.070900 | 0.068213 |
| 600 | 0.071900 | 0.067799 |
| 800 | 0.070100 | 0.067430 |
| 1000 | 0.068300 | 0.066448 |
| 1200 | 0.069700 | 0.066594 |
| 1400 | 0.069000 | 0.066185 |
| 1600 | 0.067100 | 0.066022 |
| 1800 | 0.063800 | 0.065695 |
| 2000 | 0.037900 | 0.066657 |
| 2200 | 0.041200 | 0.066739 |
| 2400 | 0.042000 | 0.066777 |
| 2600 | 0.040200 | 0.066858 |
| 2800 | 0.044700 | 0.066712 |
| 3000 | 0.041000 | 0.066415 |
| 3200 | 0.041800 | 0.066634 |
| 3400 | 0.041200 | 0.066341 |
| 3600 | 0.039200 | 0.066837 |
| 3800 | 0.023700 | 0.067717 |
| 4000 | 0.024100 | 0.068017 |
| 4200 | 0.024600 | 0.068155 |
| 4400 | 0.024500 | 0.068275 |
| 4600 | 0.024500 | 0.068106 |
| 4800 | 0.026100 | 0.067965 |
| 5000 | 0.024500 | 0.068108 |
| 5200 | 0.025100 | 0.068027 |
### Observations
- The training loss decreases gradually within each pass over the data and drops sharply around steps 2000 and 3800, which likely coincide with epoch boundaries.
- The validation loss reaches its minimum of about 0.0657 around step 1800 and then fluctuates mildly between roughly 0.066 and 0.068 for the remainder of training.
- Most of the improvement on the validation set therefore happens early in training; continued training mainly reduces the training loss.
## Intended Use
This model is intended for tasks such as:
- **Masked Language Modeling (MLM)**: Fill in masked tokens in a Malay sentence (see the usage example below).
- **Text Generation**: Suggest plausible tokens for a given context; as an encoder-only model, it is not suited to free-form left-to-right generation.
- **Text Understanding**: Extract contextual meaning from Malay sentences.
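## Example Usage
Filling masked tokens with the `pipeline` API. The model id below is a placeholder, since this card does not state the published repository name; substitute the actual path of this fine-tuned checkpoint.

```python
from transformers import pipeline

# Placeholder model id -- replace with the repository where this checkpoint is published.
fill_mask = pipeline("fill-mask", model="your-username/roberta-malay-mlm")

# RoBERTa tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("Saya suka makan <mask> goreng."):
    print(prediction["token_str"], round(prediction["score"], 4))
```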