Qwen 2.5 7B Instruct Model Fine-Tuning

This repository contains code for fine-tuning the Qwen 2.5 7B Instruct model on Amazon SageMaker. The project uses QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning of large language models.

๋ชจ๋ธ ์‚ฌ์šฉ ๋ฐฉ๋ฒ•

์š”๊ตฌ์‚ฌํ•ญ

  • Python 3.8 ์ด์ƒ
  • CUDA ์ง€์› GPU (์ตœ์†Œ 24GB VRAM ๊ถŒ์žฅ)
  • ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ:
pip install torch transformers accelerate

๊ธฐ๋ณธ ์‚ฌ์šฉ ์˜ˆ์‹œ

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Check whether CUDA is available
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("Warning: CUDA not available, using CPU")

# ๋ชจ๋ธ๊ณผ ํ† ํฌ๋‚˜์ด์ € ๋กœ๋“œ
model = AutoModelForCausalLM.from_pretrained(
    "seong67360/Qwen2.5-7B-Instruct_v4",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "seong67360/Qwen2.5-7B-Instruct_v4",
    trust_remote_code=True
)

# Example conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is quantum computing?"}
]

# Generate a response. Qwen2.5 uses the standard chat-template API;
# it does not provide the legacy model.chat() helper.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
output_ids = model.generate(inputs, max_new_tokens=512)
response = tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)

๋ฉ”๋ชจ๋ฆฌ ์ตœ์ ํ™” ์˜ต์…˜

GPU ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ œํ•œ๋œ ๊ฒฝ์šฐ, 8๋น„ํŠธ ๋˜๋Š” 4๋น„ํŠธ ์–‘์žํ™”๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

# 8๋น„ํŠธ ์–‘์žํ™”
model = AutoModelForCausalLM.from_pretrained(
    "seong67360/Qwen2.5-7B-Instruct_v4",
    device_map="auto",
    trust_remote_code=True,
    load_in_8bit=True
)

# ๋˜๋Š” 4๋น„ํŠธ ์–‘์žํ™”
model = AutoModelForCausalLM.from_pretrained(
    "seong67360/Qwen2.5-7B-Instruct_v4",
    device_map="auto",
    trust_remote_code=True,
    load_in_4bit=True
)

์ƒ์„ฑ ํŒŒ๋ผ๋ฏธํ„ฐ ์„ค์ •

response = model.chat(
    tokenizer, 
    messages,
    temperature=0.7,          # ๋†’์„์ˆ˜๋ก ๋” ์ฐฝ์˜์ ์ธ ์‘๋‹ต
    top_p=0.9,               # ์ƒ˜ํ”Œ๋ง์— ์‚ฌ์šฉ๋  ๋ˆ„์  ํ™•๋ฅ ์˜ ์ž„๊ณ„๊ฐ’
    max_new_tokens=512,      # ์ƒ์„ฑํ•  ์ตœ๋Œ€ ํ† ํฐ ์ˆ˜
    repetition_penalty=1.1    # ๋ฐ˜๋ณต ๋ฐฉ์ง€๋ฅผ ์œ„ํ•œ ํŽ˜๋„ํ‹ฐ (1.0 ์ด์ƒ)
)

ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ

.
โ”œโ”€โ”€ scripts/
โ”‚   โ”œโ”€โ”€ train.py
โ”‚   โ”œโ”€โ”€ tokenization_qwen2.py
โ”‚   โ”œโ”€โ”€ requirements.txt
โ”‚   โ””โ”€โ”€ bootstrap.sh
โ”œโ”€โ”€ sagemaker_train.py
โ””โ”€โ”€ README.md

Prerequisites

  • Access to Amazon SageMaker
  • Hugging Face account and access token
  • Configured AWS credentials
  • Python 3.10+

Environment Setup

Key dependencies used by the project:

  • PyTorch 2.1.0
  • Transformers (latest version from the main branch)
  • Accelerate >= 0.27.0
  • PEFT >= 0.6.0
  • BitsAndBytes >= 0.41.0

๋ชจ๋ธ ๊ตฌ์„ฑ

  • ๊ธฐ๋ณธ ๋ชจ๋ธ: Qwen/Qwen2.5-7B-Instruct
  • ํ•™์Šต ๋ฐฉ๋ฒ•: QLoRA (4๋น„ํŠธ ์–‘์žํ™”)
  • ์ธ์Šคํ„ด์Šค ์œ ํ˜•: ml.p5.48xlarge
  • ๋ถ„์‚ฐ ์ „๋žต: PyTorch DDP
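
A minimal sketch of what this QLoRA setup typically looks like with PEFT and BitsAndBytes. The quantization settings follow the standard QLoRA recipe; the LoRA rank, alpha, and target modules below are illustrative assumptions, not values taken from train.py:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization (standard QLoRA recipe)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter; rank/alpha/target modules are illustrative assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)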

Training Configuration

Hyperparameters

{
    'epochs': 3,
    'per_device_train_batch_size': 4,
    'gradient_accumulation_steps': 8,
    'learning_rate': 1e-5,
    'max_steps': 1000,
    'bf16': True,
    'max_length': 2048,
    'gradient_checkpointing': True,
    'optim': 'adamw_torch',
    'lr_scheduler_type': 'cosine',
    'warmup_ratio': 0.1,
    'weight_decay': 0.01,
    'max_grad_norm': 0.3
}
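
These map onto Hugging Face TrainingArguments roughly as follows; max_length is applied at tokenization time rather than here, and the save/logging intervals come from the training-process notes below. A hedged sketch, assuming the standard transformers API:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/opt/ml/model",          # SageMaker's conventional model directory
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    max_steps=1000,
    bf16=True,
    gradient_checkpointing=True,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    max_grad_norm=0.3,
    save_steps=50,                       # checkpoint every 50 steps
    logging_steps=10                     # log metrics every 10 steps
)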

Environment Variables

The training environment is configured with optimizations for distributed training and memory management (a sketch follows the list):

  • CUDA device configuration
  • Memory optimization settings
  • EFA (Elastic Fabric Adapter) configuration for distributed training
  • Hugging Face token and cache settings

Training Process

  1. Environment preparation:

    • Generate requirements.txt with the required dependencies
    • Generate bootstrap.sh for installing Transformers
    • Set up the SageMaker training configuration
  2. Model loading:

    • Load the base Qwen 2.5 7B model with 4-bit quantization
    • Configure BitsAndBytes for quantization
    • Prepare the model for k-bit training
  3. Dataset processing (see the sketch after this list):

    • Uses the Sujet Finance dataset
    • Formats conversations in the Qwen2 chat format
    • Tokenizes to a maximum length of 2048 tokens
    • Implements parallelized data preprocessing
  4. Training:

    • Implements gradient checkpointing for memory efficiency
    • Uses a cosine learning-rate schedule with warmup
    • Saves a checkpoint every 50 steps
    • Logs training metrics every 10 steps

๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐ ๋ฉ”ํŠธ๋ฆญ

ํ•™์Šต ๊ณผ์ •์—์„œ ๋‹ค์Œ ๋ฉ”ํŠธ๋ฆญ์„ ์ถ”์ ํ•ฉ๋‹ˆ๋‹ค:

  • ํ•™์Šต ์†์‹ค(Training loss)
  • ํ‰๊ฐ€ ์†์‹ค(Evaluation loss)

Error Handling

The implementation includes comprehensive error handling and logging (a rough illustration follows the list):

  • Environment validation
  • Dataset preparation checks
  • Training process monitoring
  • Detailed error messages and stack traces

Usage

  1. Configure AWS credentials and the SageMaker execution role
  2. Set your Hugging Face token
  3. Run the training script (an illustrative estimator sketch follows):
python sagemaker_train.py
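
For reference, a job launch of this kind usually looks something like the sketch below; the framework versions and hyperparameters passed to the estimator are illustrative assumptions, not the actual contents of sagemaker_train.py:

import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="scripts",
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.36",   # illustrative framework versions
    pytorch_version="2.1",
    py_version="py310",
    distribution={"pytorchddp": {"enabled": True}},  # PyTorch DDP
    hyperparameters={"epochs": 3, "learning_rate": 1e-5}
)
estimator.fit()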

์ปค์Šคํ…€ ์ปดํฌ๋„ŒํŠธ

์ปค์Šคํ…€ ํ† ํฌ๋‚˜์ด์ €

ํ”„๋กœ์ ํŠธ๋Š” ๋‹ค์Œ ๊ธฐ๋Šฅ์ด ํฌํ•จ๋œ Qwen2 ํ† ํฌ๋‚˜์ด์ €์˜ ์ปค์Šคํ…€ ๊ตฌํ˜„(tokenization_qwen2.py)์„ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค:

  • ํŠน์ˆ˜ ํ† ํฐ ์ฒ˜๋ฆฌ
  • ์œ ๋‹ˆ์ฝ”๋“œ ์ •๊ทœํ™”
  • ์–ดํœ˜ ๊ด€๋ฆฌ
  • ๋ชจ๋ธ ํ•™์Šต์„ ์œ„ํ•œ ์ž…๋ ฅ ์ค€๋น„

์ฃผ์˜์‚ฌํ•ญ

  • ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋Š” ml.p5.48xlarge ์ธ์Šคํ„ด์Šค ํƒ€์ž…์— ์ตœ์ ํ™”๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค
  • PyTorch Distributed Data Parallel์„ ์‚ฌ์šฉํ•œ ํ•™์Šต
  • ๋ฉ”๋ชจ๋ฆฌ ์ตœ์ ํ™”๋ฅผ ์œ„ํ•œ gradient checkpointing ๊ตฌํ˜„
  • ํ•™์Šต ์‹คํŒจ์— ๋Œ€ํ•œ ์ž๋™ ์žฌ์‹œ๋„ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ํฌํ•จ