Model Card for Model ID

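This model is llama3.2-3B fine-tuned with LoRA using a fixed prompt.
It was trained to classify six emotions: happiness (기쁨), embarrassment (당황), anger (분노), anxiety (불안), hurt (상처), and sadness (슬픔).
The training data is the Emotional Dialogue Corpus (감성 대화 말뭉치) from AIHUB.
The writer's age and gender were also used as inputs during training.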

Uses

import re
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model = None
tokenizer = None
device = None

PROMPT="""<|prompt|>You are an AI assistant tasked with analyzing the emotional content of a diary entry. Your goal is to determine the most closely matching emotion from a predefined list.

Here is the diary entry you need to analyze:

<diary_entry>
age: {age} | gender: {gender} | diary: {sentence}
</diary_entry>

Please carefully read and analyze the content of this diary entry. Consider the overall tone, the events described, and the language used by the writer.

Based on your analysis, choose the emotion that best matches the overall sentiment of the diary entry from the following list:

['분노', '불안', '상처', '슬픔', '당황', '기쁨']

Translate these emotions to English for your understanding:
['분노(anger)', '불안(anxiety)', '상처(hurt)', '슬픔(sadness)', '당황(embarrassment)', '기쁨(happiness)']

After you've made your decision, respond with only the chosen emotion in Korean. Do not provide any explanation or additional text.

Your response should be formatted as follows:
<emotion>[chosen emotion in korean]</emotion>

Once you've provided the emotion, end the conversation. Do not engage in any further dialogue or provide any additional information.
<|assistant|>"""

def load_model():
    global model, tokenizer, device

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    path = './llama-3.2-3B-sentiment-kr-LoRA'
    
    tokenizer = AutoTokenizer.from_pretrained(path)
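    model = AutoPeftModelForCausalLM.from_pretrained(
        path,
        # Requires the flash-attn package and a supported CUDA GPU;
        # remove this argument when running on CPU.
        attn_implementation="flash_attention_2",
        torch_dtype=torch.float16,
        device_map=device,
    )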
    model.eval()

def generate(text, age, gender):
    global model, tokenizer, device
    text = PROMPT.format(age=age, gender=gender, sentence=text)
    inputs = tokenizer(text, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=11, pad_token_id=tokenizer.pad_token_id)
        decoded_output = tokenizer.decode(outputs[0])

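        try:
            # Keep only the text generated after the assistant marker,
            # then extract the contents of the <emotion> tag.
            pred = decoded_output.split("<|assistant|>")[1]
            pred = re.search(r'<emotion>(.*?)</emotion>', pred).group(1)
        except (IndexError, AttributeError):
            # The marker or the <emotion> tag is missing from the output.
            pred = 'error'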
            
    return pred

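# Load the model and tokenizer once before generating.
load_model()
# Sample input: "I had a fight with my friend today."
print(generate("오늘 친구랑 싸웠어.", "", ""))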
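
The generate function above returns whatever text appears inside the <emotion> tag, so a response outside the six trained labels would pass through unchecked. The following is a minimal sketch of a stricter parser, assuming the same prompt markers and tag format. The names EMOTIONS and parse_emotion are illustrative and not part of the original code.

```python
import re

# The six labels listed in the prompt.
EMOTIONS = ("분노", "불안", "상처", "슬픔", "당황", "기쁨")

def parse_emotion(decoded_output: str) -> str:
    """Return the predicted label, or 'error' if the output is malformed."""
    # Only look at the text generated after the assistant marker.
    _, marker, completion = decoded_output.partition("<|assistant|>")
    if not marker:
        return "error"
    match = re.search(r"<emotion>(.*?)</emotion>", completion)
    if match is None:
        return "error"
    label = match.group(1).strip()
    # Reject anything that is not one of the six trained labels.
    return label if label in EMOTIONS else "error"
```

Inside generate, this could stand in for the try/except block by passing decoded_output to parse_emotion.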

Accuracy

Part of the data was held out as a test set during training, and accuracy measured on it was about 70%.
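
The card does not say how accuracy was computed. As a rough sketch, assuming exact-match accuracy over the predicted labels, the measurement could look like the following. The accuracy helper and the toy data are hypothetical and do not reproduce the reported result.

```python
from typing import Sequence

def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    # An 'error' prediction never equals a gold label, so it counts as wrong.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy data for illustration only; not the model's actual test results.
preds = ["분노", "슬픔", "error", "기쁨"]
gold = ["분노", "불안", "상처", "기쁨"]
print(accuracy(preds, gold))  # prints 0.5
```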

Framework versions

PEFT 0.13.0

ozingmw/llama-3.2-3B-sentiment-kr-LoRA
