---
language: ms
tags:
- roberta
- fine-tuned
- transformers
- bert
- masked-language-model
license: apache-2.0
model_type: roberta
---

# Fine-tuned RoBERTa for the Malay Language

This model is a fine-tuned version of `mesolitica/roberta-base-bahasa-cased`, trained on a custom Malay dataset. It is fine-tuned for **Masked Language Modeling (MLM)** on normalized Malay sentences.

## Model Description

This model is based on the **RoBERTa** architecture, a robustly optimized variant of BERT. The base model was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences. The fine-tuning objective was to predict masked tokens in each sentence, as is standard for masked language modeling.

### Training Details

- **Pre-trained Model**: `mesolitica/roberta-base-bahasa-cased`
- **Task**: Masked Language Modeling (MLM)
- **Training Dataset**: Custom dataset of Malay sentences
- **Training Duration**: 3 epochs
- **Batch Size**: 16 per device
- **Learning Rate**: 1e-6
- **Optimizer**: AdamW
- **Evaluation**: Every 200 steps

A configuration sketch that reproduces these settings is included at the end of this card.

## Training and Validation Loss

The following table shows the training and validation loss at each evaluation step during fine-tuning:

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 0.069000      | 0.069317        |
| 400  | 0.070900      | 0.068213        |
| 600  | 0.071900      | 0.067799        |
| 800  | 0.070100      | 0.067430        |
| 1000 | 0.068300      | 0.066448        |
| 1200 | 0.069700      | 0.066594        |
| 1400 | 0.069000      | 0.066185        |
| 1600 | 0.067100      | 0.066022        |
| 1800 | 0.063800      | 0.065695        |
| 2000 | 0.037900      | 0.066657        |
| 2200 | 0.041200      | 0.066739        |
| 2400 | 0.042000      | 0.066777        |
| 2600 | 0.040200      | 0.066858        |
| 2800 | 0.044700      | 0.066712        |
| 3000 | 0.041000      | 0.066415        |
| 3200 | 0.041800      | 0.066634        |
| 3400 | 0.041200      | 0.066341        |
| 3600 | 0.039200      | 0.066837        |
| 3800 | 0.023700      | 0.067717        |
| 4000 | 0.024100      | 0.068017        |
| 4200 | 0.024600      | 0.068155        |
| 4400 | 0.024500      | 0.068275        |
| 4600 | 0.024500      | 0.068106        |
| 4800 | 0.026100      | 0.067965        |
| 5000 | 0.024500      | 0.068108        |
| 5200 | 0.025100      | 0.068027        |

### Observations

- The training loss decreased over the course of training, with sharp drops around steps 2000 and 3800.
- The validation loss reached its lowest value (0.065695) at step 1800 and fluctuated only slightly afterwards, staying roughly in the 0.066–0.068 range.
- The model had largely converged by around step 1800; the later sharp drops in training loss were not matched by further improvement in validation loss.

## Intended Use

This model is intended for tasks such as:

- **Masked Language Modeling (MLM)**: Fill in masked tokens in a Malay sentence (see the usage example below).
- **Text Generation**: Generate plausible text given a context.
- **Text Understanding**: Extract contextual meaning from Malay sentences.
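## Example Usage

The snippet below is a minimal sketch of masked-token prediction with the `transformers` fill-mask pipeline. The model path `./roberta-malay-mlm` is a placeholder, since this card does not state a published repository ID; substitute the actual checkpoint directory or Hub name.

```python
from transformers import pipeline

# Placeholder path; replace with the actual checkpoint directory or Hub repo ID.
model_path = "./roberta-malay-mlm"

fill_mask = pipeline("fill-mask", model=model_path, tokenizer=model_path)

# RoBERTa tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("Saya suka makan <mask> goreng."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.4f}")
```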
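## Fine-tuning Configuration (Sketch)

For reference, the hyperparameters listed under Training Details map onto a standard Hugging Face `Trainer` setup roughly as sketched below. This is not the exact training script: the tiny in-memory dataset and the `./roberta-malay-mlm` output directory are placeholders, since the custom Malay dataset is not published with this card, and all unlisted arguments are left at library defaults.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Base checkpoint named under "Training Details".
base_model = "mesolitica/roberta-base-bahasa-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# Tiny in-memory placeholder standing in for the custom Malay dataset.
raw = Dataset.from_dict(
    {"text": ["Dia pergi ke pasar pagi tadi.", "Saya suka makan nasi goreng."]}
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
split = tokenized.train_test_split(test_size=0.5)

# Dynamic masking for MLM; the 15% masking probability is the library default.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True)

# Hyperparameters from this card; the default optimizer is AdamW, matching the card.
args = TrainingArguments(
    output_dir="./roberta-malay-mlm",  # placeholder output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-6,
    eval_strategy="steps",  # named "evaluation_strategy" in older transformers releases
    eval_steps=200,
    logging_steps=200,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    data_collator=collator,
)

trainer.train()
```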