This is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.

  • Task: Sentiment analysis (binary classification).

  • Languages Supported: Khmer.

  • Intended Use Cases:

    • Analyzing customer reviews.
    • Social media sentiment detection.
  • Limitations:

    • Performance may degrade on languages or domains not present in the training data.
    • Does not handle sarcasm or highly ambiguous inputs well.

The model was evaluated on a test set of 400 samples, achieving the following performance:

  • Test Accuracy: 81%

  • Precision: 81%

  • Recall: 81%

  • F1 Score: 81%

Confusion Matrix:

Predicted \ Actual    Negative    Positive
Negative                   165          44
Positive                    31         160

The model supports a maximum sequence length of 512 tokens.
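
The reported scores can be cross-checked against the confusion matrix. The sketch below recomputes accuracy and per-class precision, recall, and F1 from the four counts, then averages the two classes. Macro-averaging is an assumption here, since the card does not say how its precision, recall, and F1 figures were aggregated.

```python
# Counts from the confusion matrix: rows are predictions, columns are actual labels.
predicted_negative = {"actual_negative": 165, "actual_positive": 44}
predicted_positive = {"actual_negative": 31, "actual_positive": 160}

# Treat "positive" as the reference class.
tp = predicted_positive["actual_positive"]  # predicted positive, actually positive
fp = predicted_positive["actual_negative"]  # predicted positive, actually negative
fn = predicted_negative["actual_positive"]  # predicted negative, actually positive
tn = predicted_negative["actual_negative"]  # predicted negative, actually negative

total = tp + fp + fn + tn
accuracy = (tp + tn) / total


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


positive_scores = precision_recall_f1(tp, fp, fn)
# For the negative class the roles swap: TN are its hits, FN its false alarms.
negative_scores = precision_recall_f1(tn, fn, fp)

# Macro average (an assumption): unweighted mean over the two classes.
macro = tuple((p + n) / 2 for p, n in zip(positive_scores, negative_scores))

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {macro[0]:.4f}")
print(f"recall:    {macro[1]:.4f}")
print(f"f1:        {macro[2]:.4f}")
```

The counts sum to the 400 test samples and give an accuracy of 0.8125. The macro-averaged precision, recall, and F1 each round to 0.81, consistent with the figures listed above.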

How to Use

from huggingface_hub import hf_hub_download
import fasttext
from khmernltk import word_tokenize

model = fasttext.load_model(hf_hub_download("tykea/khmer-fasttext-sentiment-analysis", "model.bin"))

# Map fastText labels to human-readable names
LABEL_MAPPING = {
    '__label__0': 'negative',
    '__label__1': 'positive'
}

def predict(text):
    # fastText's predict() raises ValueError on newlines, so flatten the input to one line
    text = text.replace('\n', ' ')
    # Segment the text into words (Khmer is written without spaces between words)
    tokens = word_tokenize(text)
    # Join tokens back into a single space-separated string
    tokenized_text = ' '.join(tokens)
    # predict() returns a (labels, probabilities) pair; the top label comes first
    labels, _probabilities = model.predict(tokenized_text)
    # Fall back to 'unknown' for any label outside the mapping
    return LABEL_MAPPING.get(labels[0], 'unknown')

# Example input: "This is a negative sentence for Khmer people"
print(predict('នេះគីជាល្បះអវិជ្ជមានសម្រាប់ប្រជាជនខ្មែរ'))
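
The function above returns only the top label. fastText's `predict` also accepts a `k` argument and returns a `(labels, probabilities)` pair, so a confidence score can be reported as well. The following is a minimal sketch of that post-processing. `format_prediction` is a hypothetical helper, not part of this model or of fastText, and the example probabilities are made up so that the snippet runs without downloading the model.

```python
# Label names as used in the usage example above.
LABEL_NAMES = {"__label__0": "negative", "__label__1": "positive"}


def format_prediction(labels, probabilities):
    """Turn a fastText-style (labels, probabilities) pair into {name: probability}."""
    return {
        LABEL_NAMES.get(label, "unknown"): float(prob)
        for label, prob in zip(labels, probabilities)
    }


# Stand-in for the result of `model.predict(tokenized_text, k=2)`.
# These probabilities are illustrative, not real model output.
example_output = (("__label__1", "__label__0"), [0.9, 0.1])

scores = format_prediction(*example_output)
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

With a loaded model, `example_output` would be replaced by the actual call on the tokenized text. Reporting the probability alongside the label makes it easier to flag low-confidence predictions, which may help with the ambiguous inputs mentioned under Limitations.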