Mobile App Classification

Model description

BigBird is a sparse-attention-based transformer that extends Transformer-based models such as BERT to much longer sequences. It can handle input sequences of up to 4,096 tokens.

The google/bigbird-roberta-base model is fine-tuned to classify a mobile app description into one of 6 Play Store categories. It was trained on 9,000 samples of English app descriptions and the associated categories of apps available on Google Play.

Fine-tuning

The model was fine-tuned for 5 epochs with a batch size of 16, a learning rate of 2e-05, and a maximum sequence length of 1024. Since this is a classification task, the model was trained with a cross-entropy loss function. The best evaluation F1 score was 0.8964259037209702, reached after 4 epochs. The accuracy of the model on the test set was 0.8966.
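To illustrate the cross-entropy loss mentioned above, here is a minimal pure-Python sketch for a single example over 6 classes. The logits are made up for illustration, and the order of `CATEGORIES` is an assumption; the actual label-to-index mapping used by the model is not stated here.

```python
import math

# Assumed ordering, for illustration only; the model's real label mapping may differ.
CATEGORIES = ["Education", "Entertainment", "Productivity",
              "Sports", "News & Magazines", "Photography"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[target_index])


# Hypothetical logits for one app description (not real model output).
logits = [0.1, 0.2, -0.3, 4.0, 0.0, -1.0]

loss_if_sports = cross_entropy(logits, CATEGORIES.index("Sports"))
loss_if_education = cross_entropy(logits, CATEGORIES.index("Education"))
print(loss_if_sports, loss_if_education)
```

The loss is small when the highest logit belongs to the true class and large otherwise, which is what drives the model toward the correct category during fine-tuning.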

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("nsi319/bigbird-roberta-base-finetuned-app")
model = AutoModelForSequenceClassification.from_pretrained("nsi319/bigbird-roberta-base-finetuned-app")

# 'text-classification' is the general task name; 'sentiment-analysis' is an alias for it
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)

result = classifier("From scores to signings, the ESPN App is here to keep you updated. Never miss another sporting moment with up-to-the-minute scores, latest news & a range of video content. Sign in and personalise the app to receive alerts for your teams and leagues. Wherever, whenever; the ESPN app keeps you connected.")
print(result)

# Output
# [{'label': 'Sports', 'score': 0.9983325600624084}]
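
Since the model was fine-tuned with a maximum sequence length of 1024, it may be sensible to truncate longer descriptions to that length at inference time. The sketch below wraps a classifier such as the pipeline created above. The helper name `classify_descriptions` is hypothetical, and it assumes the pipeline forwards `truncation` and `max_length` to the tokenizer.

```python
def classify_descriptions(classifier, descriptions, max_length=1024):
    """Return the top predicted category for each app description.

    `classifier` is expected to behave like a transformers
    text-classification pipeline: called with a list of strings, it
    returns one {'label': ..., 'score': ...} dict per input.
    """
    results = classifier(
        list(descriptions),
        truncation=True,        # cut off inputs longer than max_length
        max_length=max_length,  # 1024 matches the fine-tuning setup
    )
    return [result["label"] for result in results]
```

With the pipeline from the snippet above, `classify_descriptions(classifier, ["first description", "second description"])` would return a list with one label per description.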

Limitations

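The training data consists of apps from only 6 Play Store categories: Education, Entertainment, Productivity, Sports, News & Magazines, and Photography. The model therefore assigns every input to one of these categories, including descriptions of apps that belong elsewhere.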
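
One rough way to work around this limitation is to treat low-confidence predictions as unclassified. The sketch below is only an illustration: the helper `label_or_unknown`, the `"Unknown"` fallback, and the 0.5 threshold are all hypothetical choices, not part of the model. Classifier scores can also be high for out-of-scope inputs, so this is a coarse filter at best.

```python
def label_or_unknown(prediction, min_score=0.5):
    """Return the predicted label, or "Unknown" if the score is below min_score.

    `prediction` is a dict shaped like the pipeline output shown in
    "How to use", e.g. {'label': 'Sports', 'score': 0.998}.
    The default threshold of 0.5 is an arbitrary illustrative value.
    """
    if prediction["score"] < min_score:
        return "Unknown"
    return prediction["label"]


# Scores below are illustrative, not real model output.
print(label_or_unknown({"label": "Sports", "score": 0.998}))
print(label_or_unknown({"label": "Education", "score": 0.31}))
```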
