This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.
---

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

The template is also embedded within the tokenizer, for use with `tokenizer.apply_chat_template`.
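As a pure-Python illustration, the template above can be rendered by hand. This is a sketch only: the helper name `format_tulu_chat` is hypothetical, and in practice you should prefer `tokenizer.apply_chat_template` from `transformers`, which reads the template embedded in the tokenizer.

```python
# Sketch: manually render the Tulu chat template shown above.
# `format_tulu_chat` is a hypothetical helper, not part of any library;
# prefer tokenizer.apply_chat_template in real code.

def format_tulu_chat(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts into the Tulu template."""
    out = []
    for m in messages:
        out.append(f"<|{m['role']}|>\n{m['content']}")
        if m["role"] == "assistant":
            out.append("<|endoftext|>")  # assistant turns end with EOS
        else:
            out.append("\n")
    if add_generation_prompt:
        out.append("<|assistant|>\n")  # cue the model to respond next
    return "".join(out)

prompt = format_tulu_chat([{"role": "user", "content": "How are you doing?"}])
print(prompt)
```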
## System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
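For illustration, the default demo prompt can simply be prepended as a `system` turn in the messages list passed to the chat template (the system turn is optional, since the model was not trained with a specific system prompt in mind):

```python
# Illustrative messages list; the "system" turn carries the default
# Ai2 demo prompt. Field names follow the common chat-messages schema.
SYSTEM_PROMPT = ("You are Tulu 3, a helpful and harmless AI Assistant "
                 "built by the Allen Institute for AI.")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How are you doing?"},
]
```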
## Bias, Risks, and Limitations

The Tülu 3 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses the way ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 3.1 models are also unknown; however, it is likely to have included a mix of web data and technical sources like books and code. See the Falcon 180B model card for an example of this.
## Hyperparameters

PPO settings for RLVR:

- Learning rate: 3 × 10⁻⁷
- Discount factor (gamma): 1.0
- Generalized advantage estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO update iterations (K): 4
- PPO clipping coefficient (epsilon): 0.2
- Value function coefficient (c1): 0.1
- Gradient norm threshold: 1.0
- Learning rate schedule: linear
- Generation temperature: 1.0
- Batch size (effective): 512
- Max token length: 2,048
- Max prompt token length: 2,048
- Penalty reward value for responses without an EOS token: -10.0
- Response length: 1,024 (but 2,048 for MATH)
- Total episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm-up ratio (omega): 0.0
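For reference, the settings above can be collected into a single config mapping. This is a sketch: the key names are illustrative and not tied to any particular trainer's configuration schema.

```python
# RLVR PPO settings from the list above, gathered into one dict.
# Key names are illustrative, not a real trainer's schema.
ppo_rlvr_config = {
    "learning_rate": 3e-7,
    "gamma": 1.0,                  # discount factor
    "gae_lambda": 0.95,            # generalized advantage estimation
    "num_mini_batches": 1,         # N_mb
    "ppo_update_iterations": 4,    # K
    "clip_coef": 0.2,              # epsilon
    "value_func_coef": 0.1,        # c1
    "max_grad_norm": 1.0,          # gradient norm threshold
    "lr_schedule": "linear",
    "generation_temperature": 1.0,
    "effective_batch_size": 512,
    "max_token_length": 2048,
    "max_prompt_token_length": 2048,
    "missing_eos_penalty": -10.0,  # reward for responses without EOS
    "response_length": 1024,       # 2048 for MATH
    "total_episodes": 100_000,
    "kl_penalty_coefs": [0.1, 0.05, 0.03, 0.01],  # beta values swept
    "warmup_ratio": 0.0,           # omega
}
```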
## License and use

All Llama 3.1 Tülu 3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu 3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (the models were improved using Qwen 2.5).
## Citation

If Tülu 3 or any of the related materials were helpful to your work, please cite:

```
@article{lambert2024tulu3,
  title  = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year  = {2024},
  email = {[email protected]}
}
```

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)