Text Generation
Transformers
GGUF
Italian
English
trl
phi3
spectrum
Inference Endpoints
conversational

QuantFactory/Phi-3.5-mini-ITA-GGUF

This is quantized version of anakin87/Phi-3.5-mini-ITA created using llama.cpp

Original Model Card

Phi-3.5-mini-ITA

Fine-tuned version of Microsoft/Phi-3.5-mini-instruct optimized for better performance in Italian.

  • Small yet powerful model with 3.82 billion parameters
  • Supports 128k context length

๐Ÿ’ฌ๐Ÿ‡ฎ๐Ÿ‡น Chat with the model on Hugging Face Spaces

๐Ÿ† Evaluation

Model Parameters Average MMLU_IT ARC_IT HELLASWAG_IT
anakin87/Phi-3.5-mini-ITA 3.82 B 57.67 59.93 51.5 61.57
meta-llama/Meta-Llama-3.1-8B-Instruct 8.03 B 56.97 58.43 48.42 64.07
microsoft/Phi-3.5-mini-instruct 3.82 B 56.82 60.03 49.19 61.25

For a detailed comparison of model performance, check out the Leaderboard for Italian Language Models.

๐ŸŽฎ Model in action

Demo

๐Ÿ’ฌ๐Ÿ‡ฎ๐Ÿ‡น Chat with the model on Hugging Face Spaces

Text generation with Transformers

The model is small, so it runs smoothly on Colab. It is also fine to load the model using quantization.

With transformers==4.44.2, trust_remote_code=True is needed to incorporate a minor bug fix in Phi3ForCausalLM. Read this discussion for more details.

โšก The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the attn_implementation parameter in the code snippet below.

# pip install transformers accelerate
import torch
from transformers import pipeline

model_id="anakin87/Phi-3.5-mini-ITA"

model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]
outputs = pipe(prompt, max_new_tokens=500, do_sample=True, temperature=0.001)
print(outputs[0]["generated_text"])

Example output:

Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.

Imperfetto:
- L'imperfetto รจ usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- รˆ spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."

Passato Prossimo:
- Il passato prossimo รจ usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che รจ avvenuta in un momento specifico nel passato.
- รˆ spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."

In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.

Build AI applications

You can use the model to create a variety of AI applications.

I recommend using the ๐Ÿ—๏ธ Haystack LLM framework for orchestration. (spoiler: I work on it and it is open-source ๐Ÿ˜„)

This model is compatible with HuggingFaceLocalGenerator and HuggingFaceLocalChatGenerator components. You can also deploy the model with a TGI container and then use it with HuggingFaceAPIGenerator and the related Chat Generator.

Some examples you can keep inspiration from:

๐Ÿ”ง Training details

This model was fine-tuned using HF TRL. It underwent 2 epochs of instruction fine-tuning on the FineTome-100k and Capybara-Claude-15k-ita datasets. ๐Ÿ™ Thanks to the authors for providing these datasets.

I adopted a relatively new technique for parameter-efficient learning: Spectrum. The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and โ„๏ธ freeze the rest.

Training required about 14 hours on a single A40 GPU.

I may release a guide/tutorial soon. Stay tuned! ๐Ÿ“ป

Downloads last month
299
GGUF
Model size
3.82B params
Architecture
phi3

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for QuantFactory/Phi-3.5-mini-ITA-GGUF

Quantized
(106)
this model

Datasets used to train QuantFactory/Phi-3.5-mini-ITA-GGUF