MPTK-1B

MPTK-1B is a 1.3B-parameter decoder-only transformer language model trained on Korean, English, and code datasets.

์ด ๋ชจ๋ธ์€ ๊ตฌ๊ธ€์˜ TPU Research Cloud(TRC)๋ฅผ ํ†ตํ•ด ์ง€์›๋ฐ›์€ Cloud TPU๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Model Details

Model Description

It is based on MPT, an architecture that modifies parts of the standard decoder-only transformer design.

| Hyperparameter | Value |
|---|---|
| n_parameters | 1.3B |
| n_layers | 24 |
| n_heads | 16 |
| d_model | 2048 |
| vocab size | 50432 |
| sequence length | 2048 |
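The parameter count can be roughly cross-checked against the other hyperparameters in the table. The sketch below is a back-of-the-envelope estimate, not the model's actual layer inventory: it assumes bias-free linear layers, a 4x MLP expansion ratio, input and output embeddings that share weights, and no learned positional embeddings, and it ignores the small normalization weights. The function name `approx_param_count` and the `expansion_ratio` parameter are hypothetical. `n_heads` does not appear because splitting attention into heads does not change the number of weights.

```python
def approx_param_count(
    n_layers: int,
    d_model: int,
    vocab_size: int,
    expansion_ratio: int = 4,  # assumed MLP width multiplier, not stated in the card
) -> int:
    """Estimate the weight count of a decoder-only transformer."""
    # Attention: Q, K, V, and output projections, each d_model x d_model.
    attention = 4 * d_model * d_model
    # MLP: one up-projection and one down-projection.
    mlp = 2 * expansion_ratio * d_model * d_model
    # Token embeddings, assumed to be shared with the output head.
    embeddings = vocab_size * d_model
    return n_layers * (attention + mlp) + embeddings


total = approx_param_count(n_layers=24, d_model=2048, vocab_size=50432)
print(f"{total / 1e9:.2f}B parameters")  # prints "1.31B parameters"
```

Under these assumptions the estimate lands close to the 1.3B figure in the table, with the token embeddings contributing roughly 0.1B of the total.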

Uses

How to Get Started with the Model

Running the model in fp16 can produce NaN values, so we recommend running it in fp32 or bf16.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("team-lucid/mptk-1b")
model = AutoModelForCausalLM.from_pretrained("team-lucid/mptk-1b")

# Build a text-generation pipeline on the first CUDA device.
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

# Generate under bf16 autocast rather than fp16.
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe(
            '대한민국의 수도는',  # Korean prompt: "The capital of South Korea is"
            max_new_tokens=100,
            do_sample=True,
        )
    )
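One plausible reason for the fp16 recommendation is numeric range: fp16 cannot represent values above 65504, while bf16 keeps the 8-bit exponent of fp32. The sketch below illustrates that difference using only the standard library. It is not a diagnosis of where this model overflows. The helpers `to_fp16` and `to_bf16` and the value `70000.0` are hypothetical, and `to_bf16` truncates instead of rounding to nearest, which is enough to show the range.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; hardware would yield inf.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


activation = 70000.0  # hypothetical intermediate value, just above the fp16 maximum

fp16_value = to_fp16(activation)  # overflows to inf
bf16_value = to_bf16(activation)  # stays finite, at reduced precision

# Arithmetic on infinities is one common way NaN enters a computation.
nan_result = fp16_value - fp16_value

print(fp16_value, bf16_value, nan_result)
```

The bf16 value is less precise than the fp32 original, but it stays finite, so later operations do not turn into NaN.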

Training Details

Training Data

The model was trained on Korean data from sources including OSCAR, mC4, Wikipedia, and Namuwiki, supplemented with subsets of RefinedWeb and The Stack.

Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Precision | bfloat16 |
| Optimizer | Lion |
| Learning rate | 2e-4 |
| Batch size | 1024 |
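For readers unfamiliar with the Lion optimizer named in the table, its update rule can be sketched in a few lines of plain Python. This is an illustrative sketch of the published rule, not the training code used for this model. Only the learning rate of 2e-4 comes from the table. The values `beta1=0.9`, `beta2=0.99`, and `weight_decay=0.0` are assumptions, and the function name `lion_step` and the toy parameter values are hypothetical.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=2e-4,
              beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Apply one Lion update to lists of scalar parameters.

    beta1, beta2, and weight_decay are assumed values, not from the model card.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # The step direction is the sign of an interpolation of momentum and gradient.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Every parameter moves by exactly lr (plus decoupled weight decay).
        new_params.append(p - lr * (direction + weight_decay * p))
        # The momentum itself is tracked with a separate, slower coefficient.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


# Toy example with three scalar "weights".
params = [1.0, -2.0, 0.5]
grads = [0.3, -0.1, 0.0]
momentum = [0.0, 0.0, 0.0]

params, momentum = lion_step(params, grads, momentum)
print(params, momentum)
```

Because the step uses only the sign of the interpolated gradient, every weight moves by the same magnitude `lr`, which is why Lion is typically run with a smaller learning rate than Adam-style optimizers.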