MaziyarPanahi committed on
Commit f63952b · verified · 1 Parent(s): cf80a4b

Create README.md (#3)

- Create README.md (496e96a1f4c3327a71abe705b2312e6d8d21aee2)

Files changed (1)
  1. README.md +116 -0
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
+ library_name: transformers
+ tags:
+ - axolotl
+ - finetune
+ - dpo
+ - facebook
+ - meta
+ - pytorch
+ - llama
+ - llama-3
+ - chatml
+ language:
+ - en
+ pipeline_tag: text-generation
+ license: llama3
+ license_name: llama3
+ license_link: LICENSE
+ inference: false
+ model_creator: MaziyarPanahi
+ model_name: Llama-3-8B-Instruct-DPO-v0.3
+ quantized_by: MaziyarPanahi
+ datasets:
+ - mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
+ ---
+
+ <img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:auto; margin-right:auto; display:block"/>
+
+
+ # Llama-3-8B-Instruct-DPO-v0.3 (32k)
+
+ This model is a DPO fine-tune of `meta-llama/Meta-Llama-3-8B-Instruct`. I adjusted `rope_theta` to safely extend the context length to 32K.
+
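To make the `rope_theta` remark concrete, here is a minimal sketch of why a larger base helps with longer contexts: rotary position embeddings rotate each pair of dimensions at a frequency derived from `rope_theta`, and raising the base stretches the slowest rotations over more token positions. The helper `rope_wavelengths` and the value `SCALED_THETA` are hypothetical and only for illustration; the value actually used for this model is not stated here and should be read from the model's `config.json`.

```python
import math


def rope_wavelengths(theta: float, head_dim: int = 128) -> list[float]:
    """Return the wavelength (in token positions) of each rotary frequency pair."""
    # Standard RoPE: inv_freq_i = theta ** (-2i / head_dim) for i in [0, head_dim / 2)
    inv_freqs = [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]
    return [2 * math.pi / f for f in inv_freqs]


BASE_THETA = 500_000.0      # rope_theta shipped with Meta-Llama-3-8B-Instruct
SCALED_THETA = 2_000_000.0  # hypothetical larger value, for illustration only

base_longest = max(rope_wavelengths(BASE_THETA))
scaled_longest = max(rope_wavelengths(SCALED_THETA))

print(f"longest wavelength at theta={BASE_THETA:,.0f}: {base_longest:,.0f} positions")
print(f"longest wavelength at theta={SCALED_THETA:,.0f}: {scaled_longest:,.0f} positions")
```

The slowest-rotating pair covers more positions under the larger base, which is the general mechanism behind extending context through `rope_theta`; it says nothing about the quality of a particular setting.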
+ # Quantized GGUF
+
+ All GGUF models come with a context length of `32000`: [Llama-3-8B-Instruct-DPO-v0.3-32k-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3-32k-GGUF)
+
+ # Prompt Template
+
+ This model uses the `ChatML` prompt template:
+
+ ```
+ <|im_start|>system
+ {System}
+ <|im_end|>
+ <|im_start|>user
+ {User}
+ <|im_end|>
+ <|im_start|>assistant
+ {Assistant}
+ ```
+
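As an illustration of the template above, the following sketch builds a ChatML prompt by hand from a list of role/content messages. The helper name `format_chatml` is hypothetical, and the exact whitespace is an assumption: it follows the common ChatML layout in which `<|im_end|>` comes directly after the message content. The tokenizer's own chat template (via `tokenizer.apply_chat_template`) remains the authority for this model.

```python
def format_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|>
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = format_chatml(messages)
print(prompt)
```

Building the string manually is mainly useful for debugging or for runtimes that do not read the tokenizer's chat template.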
+ # How to use
+
+ You can use this model by passing `MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3` as the model name in Hugging Face's
+ transformers library.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+ from transformers import pipeline
+ import torch
+
+ model_id = "MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+     # attn_implementation="flash_attention_2"  # optional; requires flash-attn
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_id,
+     trust_remote_code=True
+ )
+
+ streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated
+
+ pipe = pipeline(  # named `pipe` so the imported `pipeline` function is not shadowed
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     streamer=streamer
+ )
+
+ # Then use `pipe` to generate text.
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ prompt = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ terminators = [  # stop on either the EOS token or the ChatML end-of-turn token
+     tokenizer.eos_token_id,
+     tokenizer.convert_tokens_to_ids("<|im_end|>")
+ ]
+
+ outputs = pipe(
+     prompt,
+     max_new_tokens=2048,
+     eos_token_id=terminators,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.95,
+ )
+ print(outputs[0]["generated_text"][len(prompt):])  # drop the echoed prompt
+ ```