metadata
title: Gemma 2B Italian Sentiment Analysis Model
tags:
  - sentiment-analysis
  - italian
  - autotrain
  - gemma-2b
datasets:
  - custom
library_name: transformers
model: theoracle/11italian-sent
license: other

Overview

theoracle/11italian-sent is a sentiment analysis model for Italian text. It is built on the Gemma 2B architecture and fine-tuned on a diverse set of Italian texts to classify input as positive, neutral, or negative. Typical inputs include customer feedback, social media posts, and news headlines.

Key Features

  • Tailored for Italian Text: Fine-tuned on Italian texts to handle Italian-specific wording and phrasing.
  • Comprehensive Sentiment Analysis: Categorizes text as positive, neutral, or negative.
  • Powered by Gemma 2B and AutoTrain: Built on the Gemma 2B architecture and trained with Hugging Face's AutoTrain.

Usage

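The model is loaded as a causal language model, so it is prompted with an instruction and returns the sentiment label as generated text. The following example classifies an Italian text: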

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "theoracle/11italian-sent"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Build the prompt. Replace the bracketed placeholder (Italian for
# "Insert your hotel review here") with the Italian text to classify.
prompt = '''
Analyze the sentiment of the hotel review enclosed in square brackets, determine if it is positive, neutral, or negative, and return the answer as the corresponding sentiment label "positive" or "neutral" or "negative" [Inserisci qui la tua recensione dell'hotel]
'''

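# Tokenize the prompt and move the tensors to the device the model was placed on
# (device_map="auto" may pick a GPU or the CPU, so 'cuda' is not hard-coded)
encoding = tokenizer(prompt, return_tensors='pt', truncation=True, max_length=500)
input_ids = encoding['input_ids'].to(model.device)
attention_mask = encoding['attention_mask'].to(model.device)

# Generate the response
output_ids = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens; the output starts with the echoed prompt
new_tokens = output_ids[0][input_ids.shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response.strip())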
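
The prompt in the usage example is a fixed instruction followed by the review in square brackets. The sketch below shows one possible way to build that prompt for an arbitrary review and to pull the label out of the generated text. The names `INSTRUCTION`, `build_prompt`, and `extract_label` are hypothetical helpers introduced here for illustration; they are not part of the model or of the transformers library, and the bracket-escaping step is an assumption rather than something the model requires.

```python
import re

# Instruction text copied from the usage example in this README.
INSTRUCTION = (
    'Analyze the sentiment of the hotel review enclosed in square brackets, '
    'determine if it is positive, neutral, or negative, and return the answer '
    'as the corresponding sentiment label "positive" or "neutral" or "negative"'
)


def build_prompt(review: str) -> str:
    """Wrap an Italian review in the instruction prompt (hypothetical helper)."""
    # Assumption: square brackets inside the review are swapped for parentheses
    # so the outer brackets stay unambiguous; whitespace is collapsed to one line.
    cleaned = " ".join(review.replace("[", "(").replace("]", ")").split())
    return f"\n{INSTRUCTION} [{cleaned}]\n"


def extract_label(generated: str):
    """Return the first sentiment label found in the text, or None if absent."""
    match = re.search(r"\b(positive|neutral|negative)\b", generated.lower())
    return match.group(1) if match else None
```

Pass `extract_label` only the text generated after the prompt. The instruction itself contains all three labels, so running it on output that still includes the echoed prompt would always match "positive".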

Application Scenarios

This model is intended for:

  • Businesses analyzing customer reviews in Italian.
  • Teams monitoring sentiment in Italian social media posts.
  • Researchers studying public opinion on various topics through Italian text.
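
For scenarios like these, individual labels are usually aggregated over many texts. The following is a minimal sketch of that aggregation step, assuming a `classify` callable that maps one text to a label string or `None`. Both `sentiment_summary` and `classify` are hypothetical names, and the sketch does not call the model itself.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def sentiment_summary(
    texts: Iterable[str],
    classify: Callable[[str], Optional[str]],
) -> dict:
    """Count the labels returned by `classify` and report each label's share."""
    # Texts for which no label could be determined are counted as "unknown".
    counts = Counter((classify(text) or "unknown") for text in texts)
    total = sum(counts.values())
    return {
        label: {"count": n, "share": n / total}
        for label, n in counts.items()
    }


# Usage with a stand-in classifier (a plain lookup table, not the model):
stub_labels = {
    "Ottimo hotel": "positive",
    "Pessimo servizio": "negative",
    "Nella media": None,
}
summary = sentiment_summary(stub_labels.keys(), stub_labels.get)
print(summary)
```

In practice `classify` would wrap the tokenize, generate, and decode steps from the Usage section.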

Training and Technology

theoracle/11italian-sent is built on the Gemma 2B architecture and fine-tuned with Hugging Face's AutoTrain. Gemma 2B is a text generation model, which is why the model is loaded with AutoModelForCausalLM and returns the sentiment label as generated text rather than as a class index.

License

This model is released under a license marked "other" in its metadata. It is described as suitable for both commercial and non-commercial use; review the license details to confirm they cover your intended application.