Pre-trained BERT on Tweets about the 2020 US Presidential Election

Pre-trained weights for PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter, LREC 2022.

Please see the official repository for more details.

We initialize the model from the BERTweet weights (vinai/bertweet-base).

Training Data

This model is pre-trained on over 83 million English tweets about the 2020 US Presidential Election.

Training Objective

This model is initialized with BERTweet and further pre-trained with a masked language modeling (MLM) objective.
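To show what an MLM objective does to each training batch, here is a minimal sketch of BERT-style token masking in plain PyTorch. It assumes the conventional BERT/RoBERTa recipe: 15% of non-special tokens are selected, and of those 80% become `<mask>`, 10% become a random token, and 10% are left unchanged. The exact masking rate used for PoliBERTweet is not stated in this card, and the function name `mask_tokens` is hypothetical. In practice, `transformers.DataCollatorForLanguageModeling` performs this step for you.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, special_mask, mlm_probability=0.15):
    """Hypothetical helper: BERT-style masking (assumed 15% / 80-10-10 recipe).

    Returns (masked_inputs, labels). Labels hold the original token id at
    selected positions and -100 elsewhere, so cross-entropy ignores them.
    """
    labels = input_ids.clone()
    inputs = input_ids.clone()

    # Choose prediction targets; special tokens (e.g. <s>, </s>) are never selected.
    prob = torch.full(labels.shape, mlm_probability)
    prob.masked_fill_(special_mask, 0.0)
    selected = torch.bernoulli(prob).bool()
    labels[~selected] = -100  # default ignore_index of torch cross-entropy

    # 80% of selected positions are replaced by the mask token.
    replace = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    inputs[replace] = mask_token_id

    # Half of the remaining 20% (10% overall) get a random vocabulary token.
    randomize = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~replace
    inputs[randomize] = torch.randint(vocab_size, labels.shape)[randomize]

    # The last 10% of selected positions keep their original token.
    return inputs, labels
```

The model is then trained to predict `labels` from `inputs`, typically with `torch.nn.functional.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)`, so only the selected positions contribute to the loss.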

Usage

This pre-trained language model can be fine-tuned for any downstream task (e.g., classification).
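As a rough illustration of fine-tuning for classification, the sketch below runs one supervised training step with a sequence-classification head. Because BERTweet (and therefore this checkpoint) uses the RoBERTa architecture, you would normally write `AutoModelForSequenceClassification.from_pretrained("kornosk/polibertweet-mlm", num_labels=NUM_LABELS)`. To keep the sketch runnable offline, a tiny randomly initialized RoBERTa stands in for the real weights. The label count, learning rate, and helper name `train_step` are hypothetical choices, not the authors' settings.

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

NUM_LABELS = 3  # hypothetical label set, e.g. against / neutral / favor

# Tiny stand-in model; replace with
# AutoModelForSequenceClassification.from_pretrained("kornosk/polibertweet-mlm", num_labels=NUM_LABELS)
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
    num_labels=NUM_LABELS,
)
model = RobertaForSequenceClassification(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed learning rate

def train_step(input_ids, attention_mask, labels):
    """One gradient step; the model computes cross-entropy when labels are given."""
    model.train()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

With the real checkpoint, `input_ids` and `attention_mask` come from `tokenizer(texts, padding=True, truncation=True, return_tensors="pt")`, and `train_step` is called once per batch over several epochs.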

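from transformers import AutoModel, AutoTokenizer, pipeline
import torch

# choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# load tokenizer and base encoder (no MLM head), placed on the chosen device
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path).to(device)
model.eval()

# fill mask: the pipeline loads the checkpoint together with its MLM head
example = "Trump is the <mask> of USA"
fill_mask = pipeline(
    "fill-mask",
    model=pretrained_LM_path,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,  # GPU 0, or CPU
)

outputs = fill_mask(example)
print(outputs)  # candidate tokens for <mask> with their scores

# see embeddings: contextual token vectors from the last layer
inputs = tokenizer(example, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state)  # shape: (batch, sequence_length, hidden_size)

# OR you can use this model to train on your downstream task!
# please consider citing our paper if you feel this is useful :)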
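The encoder returns one vector per token. If you need a single vector per tweet (for clustering or as input to a simple classifier), one common option is mean pooling over non-padding tokens. This is a general convention, not something this model card prescribes; taking the first-token vector `outputs.last_hidden_state[:, 0]` is another frequent choice. The helper name `mean_pool` is hypothetical.

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average token embeddings per sequence, skipping padding."""
    # (batch, seq) -> (batch, seq, 1) so the mask broadcasts over the hidden dimension
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # guard against all-padding rows
    return summed / counts
```

With the objects from the usage example, `mean_pool(outputs.last_hidden_state, inputs["attention_mask"])` yields a tensor of shape `(batch, hidden_size)`.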

Reference

Citation

@inproceedings{kawintiranon2022polibertweet,
  title     = {PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter},
  author    = {Kawintiranon, Kornraphop and Singh, Lisa},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  year      = {2022},
  publisher = {European Language Resources Association}
}