IndoBERT-Lite Large Model (phase2 - uncased) Finetuned on IndoNLU SmSA dataset

Finetuned the IndoBERT-Lite Large Model (phase2 - uncased) model on the IndoNLU SmSA dataset following the procedues stated in the paper IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding.

How to use

from transformers import pipeline
classifier = pipeline("text-classification", 
                      model='tyqiangz/indobert-lite-large-p2-smsa', 
                      return_all_scores=True)
text = "Penyakit koronavirus 2019"
prediction = classifier(text)
prediction

"""
Output:
[[{'label': 'positive', 'score': 0.0006000096909701824},
  {'label': 'neutral', 'score': 0.01223431620746851},
  {'label': 'negative', 'score': 0.987165629863739}]]
"""

Finetuning hyperparameters:

  • learning rate: 2e-5
  • batch size: 16
  • no. of epochs: 5
  • max sequence length: 512
  • random seed: 42

Classes:

  • 0: positive
  • 1: neutral
  • 2: negative

Performance metrics on SmSA validation dataset

  • Validation accuracy: 0.94
  • Validation F1: 0.91
  • Validation Recall: 0.91
  • Validation Precision: 0.93
Downloads last month
15
Safetensors
Model size
17.7M params
Tensor type
I64
·
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.