leafspark
/

Llama-3-8B-Instruct-Gradient-4194k-GGUF

Text Generation

Inference Endpoints

Model card Files Files and versions Community

leafspark commited on May 9, 2024

Commit

67a2f13

·

verified ·

1 Parent(s): 25a7ee6

Update README.md

Files changed (1) hide show

README.md +7 -1

README.md CHANGED Viewed

@@ -12,7 +12,13 @@ pipeline_tag: text-generation
 # leafspark/llama-3-8b-instruct-gradient-4194k.Q8_0-GGUF
-# Please use iMatrix quants to avoid any output issues, currently debugging the issue
 This model was converted to GGUF format from [`gradientai/Llama-3-8B-Instruct-Gradient-4194k`](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k) for more details on the model.

 # leafspark/llama-3-8b-instruct-gradient-4194k.Q8_0-GGUF
+# Fixing prompt format issues
+- Use iMatrix for Llama 3 prompt format on Q4 and below, or try Q4_K_M fixed
+- Use ChatML for Q6 and below
+- Use any format for f16
+# Issues
+- Context length is not defined correctly in quant, not sure if this is a llama.cpp issue
 This model was converted to GGUF format from [`gradientai/Llama-3-8B-Instruct-Gradient-4194k`](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k) for more details on the model.