How to use the GPTQ model

https://github.com/jongmin-oh/korean-LLM-quantize

mkdir -p ./templates ./utils
wget -P ./templates https://raw.githubusercontent.com/jongmin-oh/korean-LLM-quantize/main/templates/kullm.json
wget -P ./utils https://raw.githubusercontent.com/jongmin-oh/korean-LLM-quantize/main/utils/prompter.py

Install packages

pip install torch==2.0.1 auto-gptq==0.4.2
  • If you are in a hurry, run the example code below to test the model right away. (It occupies 19GB of GPU memory.)
  • Since 2023-08-23, Hugging Face has officially supported GPTQ.
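
As a rough sanity check on the 19GB figure in the note above, the size of the quantized weights can be estimated from the parameter count and the bit width. This is only a back-of-the-envelope sketch: the parameter count of 12.8e9 is an assumption read off the model name, and `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Return the size of the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "12.8b" in the model name means roughly 12.8e9 parameters.
NUM_PARAMS = 12.8e9

for bits in (16, 8, 4):
    size = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.1f} GB")
```

Under these assumptions the 8-bit weights alone come to about 12.8 GB. The remainder of the reported 19GB would then plausibly be quantization metadata, activations, the cache kept for each of the 5 beams, and CUDA runtime overhead, although the source does not break the number down.
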
import torch
from transformers import pipeline
from auto_gptq import AutoGPTQForCausalLM

from utils.prompter import Prompter

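# Load the 8-bit GPTQ checkpoint onto the first GPU (Triton kernels disabled).
MODEL = "j5ng/kullm-12.8b-GPTQ-8bit"
model = AutoGPTQForCausalLM.from_quantized(MODEL, device="cuda:0", use_triton=False)

# The tokenizer is loaded from the same repository as the model.
pipe = pipeline("text-generation", model=model, tokenizer=MODEL)

# Uses the "kullm" prompt template downloaded to ./templates above.
prompter = Prompter("kullm")

def infer(instruction="", input_text=""):
    # Wrap the instruction and input in the prompt template.
    prompt = prompter.generate_prompt(instruction, input_text)
    output = pipe(
        prompt,
        max_length=512,  # counts prompt tokens plus generated tokens
        temperature=0.2,
        repetition_penalty=3.0,
        num_beams=5,
        eos_token_id=2,
    )
    # The pipeline returns the prompt followed by the completion;
    # get_response keeps only the model's answer.
    s = output[0]["generated_text"]
    result = prompter.get_response(s)

    return result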

# Context passage (Korean) about the footballer Son Heung-min.
instruction = """
손흥민(한국 한자: 孫興慜, 1992년 7월 8일 ~ )은 대한민국의 축구 선수로 현재 잉글랜드 프리미어리그 토트넘 홋스퍼에서 윙어로 활약하고 있다.
또한 대한민국 축구 국가대표팀의 주장이자 2018년 아시안 게임 금메달리스트이며 영국에서는 애칭인 "쏘니"(Sonny)로 불린다.
아시아 선수로서는 역대 최초로 프리미어리그 공식 베스트 일레븐과 아시아 선수 최초의 프리미어리그 득점왕은 물론 FIFA 푸스카스상까지 휩쓸었고 2022년에는 축구 선수로는 최초로 체육훈장 청룡장 수훈자가 되었다.
손흥민은 현재 리그 100호를 넣어서 화제가 되고 있다.
"""
# Question: "What is Son Heung-min's nickname?"
result = infer(instruction=instruction, input_text="손흥민의 애칭은 뭐야?")
print(result)  # 손흥민의 애칭은 "쏘니"입니다. (Son Heung-min's nickname is "Sonny".)
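
To make the role of `prompter.generate_prompt` and `prompter.get_response` in the example above easier to follow, here is a minimal stand-in that can be run without a GPU. It is a sketch only: `MiniPrompter`, its English template, and the `### Response:` marker are hypothetical and do not reproduce the real `utils/prompter.py` or `templates/kullm.json`. It assumes the real class works the same way in outline, filling a template and then cutting the answer out of the generated text.

```python
class MiniPrompter:
    """Hypothetical stand-in for utils.prompter.Prompter (not the real class)."""

    # Illustrative template; the wording in the real kullm.json differs.
    TEMPLATE = (
        "### Instruction:\n{instruction}\n\n"
        "### Input:\n{input}\n\n"
        "### Response:\n"
    )
    RESPONSE_MARKER = "### Response:"

    def generate_prompt(self, instruction: str, input_text: str = "") -> str:
        """Fill the template with the instruction and the input."""
        return self.TEMPLATE.format(
            instruction=instruction.strip(), input=input_text.strip()
        )

    def get_response(self, generated_text: str) -> str:
        """Keep only the text after the response marker.

        Falls back to the whole text if the marker is missing.
        """
        _, marker, tail = generated_text.partition(self.RESPONSE_MARKER)
        return tail.strip() if marker else generated_text.strip()


prompter = MiniPrompter()
prompt = prompter.generate_prompt(
    "Son Heung-min is called Sonny in England.",
    "What is Son Heung-min's nickname?",
)

# Simulates pipe(...)[0]["generated_text"], which by default contains
# the prompt followed by the completion.
fake_generated_text = prompt + "Sonny"
print(prompter.get_response(fake_generated_text))  # prints "Sonny"
```

Because the text-generation pipeline echoes the prompt by default, the splitting step is what turns the raw output into the short answer returned by `infer`.
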

Reference
