BERTić-Tweet-Base

BERTić-Tweet-Base is a further-pretrained version of BERTić, an ELECTRA-based language model, adapted to the social media domain. It was additionally pretrained on 37,200 Serbian-language tweets about COVID-19 vaccination (approximately 1.3 million tokens), so that it captures the linguistic features and informal writing styles prevalent on social media platforms.

Its fine-tuned version for the five-class sentiment analysis task is available as BERTić-Tweet.
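
Since this checkpoint is the further-pretrained base model rather than the fine-tuned classifier, one plausible use is extracting tweet embeddings or serving as a starting point for fine-tuning. The sketch below assumes the model is loadable through the Hugging Face `transformers` Auto classes and that PyTorch is installed. The repository ID `your-namespace/BERTic-Tweet-Base` is a placeholder, not the real path, and `embed_tweets` is a hypothetical helper introduced here for illustration. Mean pooling over non-padding tokens is one common choice, not a method prescribed by the model authors.

```python
from typing import List

# Assumption: placeholder repository ID, replace with the model's actual Hub path.
MODEL_ID = "your-namespace/BERTic-Tweet-Base"


def embed_tweets(texts: List[str], model_id: str = MODEL_ID):
    """Return one mean-pooled embedding per input text (hypothetical helper)."""
    # Imported lazily so the module can be imported without loading the libraries.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    # Pad to the longest text in the batch and truncate overly long inputs.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

    # Average token vectors, ignoring padding positions via the attention mask.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts
```

Calling `embed_tweets` with a list of tweet strings returns a tensor with one row per tweet, which could then be fed to a downstream classifier or used for similarity search.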

This model is based on the original BERTić model, which is released under the Apache 2.0 license.