Hack337 committed
Commit 94a4377 · verified · 1 Parent(s): 6810e6c

Update README.md

Files changed (1): README.md (+118, −3)
README.md CHANGED
(The previous README contained only the `license: apache-2.0` front matter; the full model card below replaces it.)
---
library_name: peft
base_model: Qwen/Qwen2-1.5B-Instruct
pipeline_tag: text-generation
license: apache-2.0
---

# Model Card for Hack337/WavGPT-1.0

WavGPT-1.0 is a PEFT fine-tune of Qwen/Qwen2-1.5B-Instruct for text generation.

## Model Details

### Model Description

- **Developed by:** hack337
- **Model type:** qwen2
- **Finetuned from model:** Qwen/Qwen2-1.5B-Instruct

### Model Sources

- **Repository:** https://huggingface.co/Hack337/WavGPT-1.0
- **Demo:** https://huggingface.co/spaces/Hack337/WavGPT

## How to Get Started with the Model

Use the code below to get started with the merged model (`Hack337/WavGPT-1.0-merged`) using `transformers`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the tokenized inputs onto

model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

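Since this repository is a PEFT adapter on top of Qwen/Qwen2-1.5B-Instruct, you can also load the adapter directly instead of the merged checkpoint. A minimal sketch, assuming the adapter weights are hosted in this repository (Hack337/WavGPT-1.0) and that `peft` is installed:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumption: Hack337/WavGPT-1.0 hosts the PEFT adapter. AutoPeftModelForCausalLM
# reads the base model (Qwen/Qwen2-1.5B-Instruct) from the adapter config,
# downloads it, and attaches the adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# Optionally fold the adapter into the base weights for faster inference.
model = model.merge_and_unload()
```
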
Use the code below to get started with the model on an Intel NPU (via `intel_npu_acceleration_library`).

```python
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the merged model compiled for the Intel NPU (no separate LoRA adapter needed)
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    use_cache=True,
    dtype=torch.float16  # use float16 on the NPU
).eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]

# Apply the chat template and tokenize
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU; the streamer prints tokens as they are generated
print("Run inference")
_ = model.generate(**generation_kwargs)
```

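The streamer only prints tokens as they arrive; if you also need the full reply as a string, `generate` still returns the generated token ids (assuming `NPUModelForCausalLM.generate` follows the standard Transformers return format), which you can slice and decode. A minimal sketch reusing the variables from the block above:

```python
# Reuses model, tokenizer, prefix and generation_kwargs from the previous block.
# Note: calling generate again runs (and streams) a fresh generation.
generated_ids = model.generate(**generation_kwargs)

# Drop the prompt tokens and decode only the newly generated part.
response = tokenizer.batch_decode(
    generated_ids[:, prefix.shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)
```
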

### Framework versions

- PEFT 0.11.1