This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.
---

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

The template is also embedded within the tokenizer, for use with `tokenizer.apply_chat_template`.
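As a pure-Python illustration, the template above can be rendered by hand. This is a sketch only: the helper name `format_tulu_chat` is hypothetical, and in practice you should prefer `tokenizer.apply_chat_template` from `transformers`, which reads the template embedded in the tokenizer.

```python
# Sketch: manually render the Tulu chat template shown above.
# `format_tulu_chat` is a hypothetical helper, not part of any library;
# prefer tokenizer.apply_chat_template in real code.

def format_tulu_chat(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts into the Tulu template."""
    out = []
    for m in messages:
        out.append(f"<|{m['role']}|>\n{m['content']}")
        if m["role"] == "assistant":
            out.append("<|endoftext|>")  # assistant turns end with EOS
        else:
            out.append("\n")
    if add_generation_prompt:
        out.append("<|assistant|>\n")  # cue the model to respond next
    return "".join(out)

prompt = format_tulu_chat([{"role": "user", "content": "How are you doing?"}])
print(prompt)
```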
## System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
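For illustration, the default demo prompt can simply be prepended as a `system` turn in the messages list passed to the chat template (the system turn is optional, since the model was not trained with a specific system prompt in mind):

```python
# Illustrative messages list; the "system" turn carries the default
# Ai2 demo prompt. Field names follow the common chat-messages schema.
SYSTEM_PROMPT = ("You are Tulu 3, a helpful and harmless AI Assistant "
                 "built by the Allen Institute for AI.")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How are you doing?"},
]
```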
## Bias, Risks, and Limitations

The Tülu 3 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses the way ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 3.1 models are also unknown; however, it is likely to have included a mix of web data and technical sources like books and code. See the Falcon 180B model card for an example of this.
## Hyperparameters

PPO settings for RLVR:

- Learning rate: 3 × 10⁻⁷
- Discount factor (gamma): 1.0
- Generalized advantage estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO update iterations (K): 4
- PPO clipping coefficient (epsilon): 0.2
- Value function coefficient (c1): 0.1
- Gradient norm threshold: 1.0
- Learning rate schedule: linear
- Generation temperature: 1.0
- Batch size (effective): 512
- Max token length: 2,048
- Max prompt token length: 2,048
- Penalty reward value for responses without an EOS token: -10.0
- Response length: 1,024 (but 2,048 for MATH)
- Total episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm-up ratio (omega): 0.0
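For reference, the settings above can be collected into a single config mapping. This is a sketch: the key names are illustrative and not tied to any particular trainer's configuration schema.

```python
# RLVR PPO settings from the list above, gathered into one dict.
# Key names are illustrative, not a real trainer's schema.
ppo_rlvr_config = {
    "learning_rate": 3e-7,
    "gamma": 1.0,                  # discount factor
    "gae_lambda": 0.95,            # generalized advantage estimation
    "num_mini_batches": 1,         # N_mb
    "ppo_update_iterations": 4,    # K
    "clip_coef": 0.2,              # epsilon
    "value_func_coef": 0.1,        # c1
    "max_grad_norm": 1.0,          # gradient norm threshold
    "lr_schedule": "linear",
    "generation_temperature": 1.0,
    "effective_batch_size": 512,
    "max_token_length": 2048,
    "max_prompt_token_length": 2048,
    "missing_eos_penalty": -10.0,  # reward for responses without EOS
    "response_length": 1024,       # 2048 for MATH
    "total_episodes": 100_000,
    "kl_penalty_coefs": [0.1, 0.05, 0.03, 0.01],  # beta values swept
    "warmup_ratio": 0.0,           # omega
}
```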
## License and use

All Llama 3.1 Tülu 3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu 3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (the models were improved using Qwen 2.5).
## Citation

If Tülu 3 or any of the related materials were helpful to your work, please cite:

```
@article{lambert2024tulu3,
  title  = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year  = {2024},
  email = {[email protected]}
}
```

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)