Hack337 committed
Commit 94a4377 · verified · 1 Parent(s): 6810e6c

Update README.md

Files changed (1): README.md (+118, −3)
README.md CHANGED
(The previous README contained only the `license: apache-2.0` front matter; the full model card below replaces it.)
---
library_name: peft
base_model: Qwen/Qwen2-1.5B-Instruct
pipeline_tag: text-generation
license: apache-2.0
---

# Model Card for Hack337/WavGPT-1.0

WavGPT-1.0 is a PEFT fine-tune of Qwen/Qwen2-1.5B-Instruct for text generation.

## Model Details

### Model Description

- **Developed by:** hack337
- **Model type:** qwen2
- **Finetuned from model:** Qwen/Qwen2-1.5B-Instruct

### Model Sources

- **Repository:** https://huggingface.co/Hack337/WavGPT-1.0
- **Demo:** https://huggingface.co/spaces/Hack337/WavGPT

## How to Get Started with the Model

Use the code below to get started with the merged model (`Hack337/WavGPT-1.0-merged`) using `transformers`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the tokenized inputs onto

model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

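Since this repository is a PEFT adapter on top of Qwen/Qwen2-1.5B-Instruct, you can also load the adapter directly instead of the merged checkpoint. A minimal sketch, assuming the adapter weights are hosted in this repository (Hack337/WavGPT-1.0) and that `peft` is installed:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumption: Hack337/WavGPT-1.0 hosts the PEFT adapter. AutoPeftModelForCausalLM
# reads the base model (Qwen/Qwen2-1.5B-Instruct) from the adapter config,
# downloads it, and attaches the adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# Optionally fold the adapter into the base weights for faster inference.
model = model.merge_and_unload()
```
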
Use the code below to get started with the model on an Intel NPU (via `intel_npu_acceleration_library`).

```python
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the merged model compiled for the Intel NPU (no separate LoRA adapter needed)
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    use_cache=True,
    dtype=torch.float16  # use float16 on the NPU
).eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]

# Apply the chat template and tokenize
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU; the streamer prints tokens as they are generated
print("Run inference")
_ = model.generate(**generation_kwargs)
```

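The streamer only prints tokens as they arrive; if you also need the full reply as a string, `generate` still returns the generated token ids (assuming `NPUModelForCausalLM.generate` follows the standard Transformers return format), which you can slice and decode. A minimal sketch reusing the variables from the block above:

```python
# Reuses model, tokenizer, prefix and generation_kwargs from the previous block.
# Note: calling generate again runs (and streams) a fresh generation.
generated_ids = model.generate(**generation_kwargs)

# Drop the prompt tokens and decode only the newly generated part.
response = tokenizer.batch_decode(
    generated_ids[:, prefix.shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)
```
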

### Framework versions

- PEFT 0.11.1