|
---
license: creativeml-openrail-m
language:
- en
base_model: prithivMLmods/GWQ-9B-Preview2
pipeline_tag: text-generation
library_name: transformers
tags:
- gemma2
- text-generation-inference
- f16
- llama-cpp
- gguf-my-repo
---
|
|
|
# Triangle104/GWQ-9B-Preview2-Q5_K_S-GGUF |
|
This model was converted to GGUF format from [`prithivMLmods/GWQ-9B-Preview2`](https://huggingface.co/prithivMLmods/GWQ-9B-Preview2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
|
Refer to the [original model card](https://huggingface.co/prithivMLmods/GWQ-9B-Preview2) for more details on the model. |
|
|
|
--- |
|
|
|
GWQ2 is fine-tuned on the Chain of Continuous Thought Synthetic Dataset, which enhances its ability to perform reasoning, multi-step problem solving, and logical inference.
|
|
|
|
|
## Use Cases of GWQ2

- **Text Generation:** The model is ideal for creative writing tasks such as generating poems, stories, and essays. It can also be used for generating code comments, documentation, and markdown files.
- **Instruction Following:** GWQ’s instruction-tuned variant is suitable for generating responses based on user instructions, making it useful for virtual assistants, tutoring systems, and automated customer support (a minimal chat invocation is sketched after this list).
- **Domain-Specific Applications:** Thanks to its modular design and open-source nature, the model can be fine-tuned for specific tasks like legal document summarization, medical record analysis, or financial report generation.
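
As a quick illustration of the instruction-following use case, the quantized file in this repo can be run in llama.cpp's interactive conversation mode. This is a minimal sketch, assuming a recent llama.cpp build where `llama-cli` supports the `-cnv`/`--conversation` flag and the `--hf-repo`/`--hf-file` download options shown later in this card:

```bash
# -cnv starts an interactive chat; -p seeds the conversation
# (used as the system prompt in recent llama.cpp builds).
llama-cli --hf-repo Triangle104/GWQ-9B-Preview2-Q5_K_S-GGUF \
  --hf-file gwq-9b-preview2-q5_k_s.gguf \
  -cnv -p "You are a concise, helpful tutoring assistant."
```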
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Limitations of GWQ2

- **Resource Requirements:** Although lightweight compared to larger models, the 9B parameter size still requires significant computational resources, including GPUs with large memory, for inference.
- **Knowledge Cutoff:** The model’s pre-training data may not include recent information, making it less effective for answering queries on current events or newly developed topics.
- **Bias in Outputs:** Since the model is trained on publicly available datasets, it may inherit biases present in those datasets, leading to potentially biased or harmful outputs in sensitive contexts.
- **Hallucinations:** Like other large language models, GWQ can occasionally generate incorrect or nonsensical information, especially when asked for facts or reasoning outside its training scope.
- **Lack of Common-Sense Reasoning:** While GWQ is fine-tuned for reasoning, it may still struggle with tasks requiring deep common-sense knowledge or a nuanced understanding of human behavior and emotions.
- **Dependency on Fine-Tuning:** For optimal performance on domain-specific tasks, fine-tuning on relevant datasets is required, which demands additional computational resources and expertise.
- **Context Length Limitation:** The model’s ability to process long documents is limited by its maximum context window size. If the input exceeds this limit, truncation may lead to loss of important information (the server sketch after this list shows how to raise it).
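
On the context-length point, llama.cpp only allocates the context you request with `-c`/`--ctx-size`, so long inputs are truncated to that size rather than to the model's full window. A minimal sketch, assuming the Gemma-2 family's 8K context limit and enough free memory for the larger KV cache, that raises the 2048 tokens used in the server example later in this card:

```bash
# Serve with a larger context window; bigger -c values increase KV-cache memory use.
llama-server --hf-repo Triangle104/GWQ-9B-Preview2-Q5_K_S-GGUF \
  --hf-file gwq-9b-preview2-q5_k_s.gguf -c 8192
```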
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux) |
|
|
|
```bash
brew install llama.cpp
```
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash
llama-cli --hf-repo Triangle104/GWQ-9B-Preview2-Q5_K_S-GGUF --hf-file gwq-9b-preview2-q5_k_s.gguf -p "The meaning to life and the universe is"
```
|
|
|
### Server: |
|
```bash
llama-server --hf-repo Triangle104/GWQ-9B-Preview2-Q5_K_S-GGUF --hf-file gwq-9b-preview2-q5_k_s.gguf -c 2048
```
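
Once running, llama-server exposes an OpenAI-compatible HTTP API. A minimal request sketch, assuming the default `localhost:8080` address:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Summarize chain-of-thought prompting in two sentences."}
        ],
        "temperature": 0.7,
        "max_tokens": 256
      }'
```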
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
```bash
git clone https://github.com/ggerganov/llama.cpp
```
|
|
|
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g., `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
|
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
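
Note that recent llama.cpp revisions have replaced the Makefile with a CMake build; if `make` fails on a current checkout, the rough CMake equivalent (option names can vary between versions) is:

```bash
# Configure with CURL support (needed for --hf-repo downloads), then build.
# For NVIDIA GPUs, the newer option is -DGGML_CUDA=ON rather than LLAMA_CUDA=1.
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release
# The llama-cli and llama-server binaries are placed under build/bin/.
```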
|
|
|
Step 3: Run inference through the main binary. |
|
```bash
./llama-cli --hf-repo Triangle104/GWQ-9B-Preview2-Q5_K_S-GGUF --hf-file gwq-9b-preview2-q5_k_s.gguf -p "The meaning to life and the universe is"
```
|
or |
|
```bash
./llama-server --hf-repo Triangle104/GWQ-9B-Preview2-Q5_K_S-GGUF --hf-file gwq-9b-preview2-q5_k_s.gguf -c 2048
```
|
|