Model Card for Hack337/WavGPT-1.0-merged

Model Details

Model Description

  • Developed by: hack337
  • Model type: qwen2
  • Finetuned from model: Qwen/Qwen2-1.5B-Instruct

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" places the weights automatically (GPU if available, else CPU)
model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")

prompt = "Give me a short introduction to large language models."
messages = [
    # The system prompt is in Russian: "You are a very helpful assistant."
    {"role": "system", "content": "Вы очень полезный помощник."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to whichever device the model was loaded onto
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Pass input_ids and attention_mask together
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
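
For a decoder-only model, generate() returns each prompt followed by its continuation, which is why the snippet above slices every output by the prompt length before decoding. The following is a minimal sketch of that slicing step on plain Python lists. The token IDs are made-up values for illustration, and strip_prompt is a hypothetical helper name, not part of transformers.

```python
# Toy token IDs standing in for real tokenizer output (hypothetical values).
prompt_batch = [[1, 2, 3], [4, 5, 6]]
# Each generated sequence starts with its prompt, followed by the new tokens.
generated_batch = [[1, 2, 3, 10, 11], [4, 5, 6, 12, 13]]


def strip_prompt(prompt_batch, generated_batch):
    """Return only the tokens appended after each prompt."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(prompt_batch, generated_batch)
    ]


new_tokens = strip_prompt(prompt_batch, generated_batch)
print(new_tokens)  # [[10, 11], [12, 13]]
```

Without this step, the decoded response would repeat the system and user messages before the model's answer.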

To run the model on an Intel NPU, use the code below. It relies on the intel_npu_acceleration_library package.

from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the NPU-optimized model without LoRA
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    use_cache=True,
    dtype=torch.float16  # Use float16 for the NPU
).eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language models."
messages = [
    # The system prompt is in Russian: "You are a very helpful assistant."
    {"role": "system", "content": "Вы очень полезный помощник."},
    {"role": "user", "content": prompt}
]

# Convert to a text format compatible with the model
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU
print("Run inference")
_ = model.generate(**generation_kwargs)
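
The generation settings above combine do_sample=True with top_k=50 and top_p=0.9. The following is a simplified sketch of what these two filters do to a next-token distribution, written in plain Python on a toy vocabulary. It is an illustration only, not the transformers implementation; top_k_top_p_filter and normalize are hypothetical helper names, and the probabilities are made up.

```python
def normalize(pairs):
    """Rescale (token, probability) pairs so the probabilities sum to 1."""
    total = sum(p for _, p in pairs)
    return [(token, p / total) for token, p in pairs]


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    ranked = normalize(ranked[:top_k])  # top-k step
    kept, cumulative = [], 0.0
    for token, p in ranked:  # top-p (nucleus) step
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return dict(normalize(kept))


# Toy next-token distribution (hypothetical values).
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.8)
print(filtered)
```

Sampling then draws the next token from the filtered distribution, so lower values of top_k or top_p make the output more conservative, while higher values allow more varied text.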

Framework versions

  • PEFT 0.11.1

Model size: 1.54B params (Safetensors, tensor type FP16)