---
license: mit
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
new_version: Qwen/QwQ-32B-Preview
---
## Evaluation Results

### Evaluation Metrics

| **Groups**           | **Version** | **Filter** | **n-shot** | **Metric** | **Direction** | **Value** | **Stderr** |
|----------------------|:-----------:|:----------:|:----------:|:----------:|:-------------:|----------:|-----------:|
| **mmlu**             |      2      | none       |      -     | acc        | ↑             | 0.8034    | ±0.0032    |
|     **humanities**     |      2      | none       |      -     | acc        | ↑             | 0.7275    | ±0.0062    |
|     **other**          |      2      | none       |      -     | acc        | ↑             | 0.8323    | ±0.0064    |
|     **social sciences**|      2      | none       |      -     | acc        | ↑             | 0.8856    | ±0.0056    |
|     **stem**           |      2      | none       |      -     | acc        | ↑             | 0.8081    | ±0.0068    |

### Description

- **mmlu**: Overall accuracy across all 57 MMLU subjects.
- **humanities**: Accuracy on the humanities subjects (e.g., philosophy, law, history).
- **other**: Accuracy on the remaining subjects not covered by the other three groups (e.g., business, health, miscellaneous).
- **social sciences**: Accuracy on the social-science subjects (e.g., economics, psychology, geography).
- **stem**: Accuracy on the STEM (Science, Technology, Engineering, Mathematics) subjects.
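
The layout of the table above matches the output of EleutherAI's lm-evaluation-harness. Assuming that is how these numbers were produced (the card does not state the exact command or harness version), a run along the following lines should regenerate the MMLU groups; treat it as an illustrative sketch rather than the exact evaluation recipe.

```python
# Hedged reproduction sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval).
# The harness version and arguments used for the table above are not stated in the card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit,dtype=auto",
    tasks=["mmlu"],  # aggregates into mmlu, humanities, other, social sciences, stem
    batch_size="auto",
)

# Inspect per-task metrics; depending on the harness version, the group
# aggregates may appear here or under results["groups"].
for name, metrics in results["results"].items():
    print(name, metrics)
```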


# QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit

![License](https://img.shields.io/badge/license-MIT-blue)
![Stars](https://img.shields.io/badge/stars-0-lightgrey.svg)
![Downloads](https://img.shields.io/badge/downloads-0-lightgrey.svg)

## Model Description

**QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit** is a quantized version of the QwQ-32B-Preview model, optimized for efficient inference with only a modest accuracy trade-off (see the MMLU results above). The model was quantized with **AutoRound** and exported in the GPTQ (Generative Pre-trained Transformer Quantization) format, using symmetric 4-bit weights. Quantization reduces the model's size and memory footprint, making it better suited to deployment in resource-constrained environments.

### Features

- **Quantization Method**: AutoRound with GPTQ
- **Bit Precision**: 4-bit symmetric quantization
- **Group Size**: 128
- **Efficiency**: 4-bit weights reduce the GPU memory needed for model weights by roughly 4x compared with FP16
- **Compatibility**: Compatible with Hugging Face's Transformers library
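
For reference, below is a minimal sketch of how a checkpoint with these settings (4-bit, symmetric, group size 128, GPTQ export format) can be produced with Intel's AutoRound library. It is illustrative only; the exact calibration data and quantization recipe used for this upload are not documented in the card.

```python
# Illustrative quantization recipe with Intel's auto-round (pip install auto-round).
# Settings mirror the feature list above: 4-bit, symmetric, group size 128, GPTQ export.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

base = "Qwen/QwQ-32B-Preview"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# Export in the GPTQ-compatible format so the checkpoint loads through Transformers.
autoround.save_quantized("./QwQ-32B-Preview-autoround-gptq-sym-4bit", format="auto_gptq")
```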

## Intended Uses

- **Natural Language Processing (NLP)**: Suitable for tasks such as text generation, translation, summarization, and question-answering.
- **Deployment in Resource-Constrained Environments**: Ideal for applications requiring efficient inference on devices with limited computational resources.
- **Research and Development**: Useful for researchers exploring model compression and quantization techniques.

**Note**: This model is intended for non-commercial research and experimentation purposes. Users should evaluate the model's performance in their specific use cases before deployment.

## Limitations

- **Performance Trade-off**: While quantization significantly reduces model size and increases inference speed, it may introduce slight degradations in performance compared to the full-precision version.
- **Compatibility**: GPTQ checkpoints require a backend with 4-bit GPTQ kernel support (for example, recent versions of Transformers with Optimum and AutoGPTQ/GPTQModel, or vLLM). Verify compatibility with your deployment environment before relying on it.
- **Bias and Fairness**: As with all language models, this model may inherit biases present in the training data. Users should be cautious and perform thorough evaluations before deploying in sensitive applications.

## Usage Example

Here's a simple example of how to load and use the quantized model with Hugging Face's Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit")

# Load the quantized model; the GPTQ quantization settings are read from the
# checkpoint itself, so no extra quantization arguments are needed. (Loading GPTQ
# weights in Transformers typically requires the optimum and auto-gptq/gptqmodel packages.)
model = AutoModelForCausalLM.from_pretrained(
    "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit",
    device_map="auto",
    torch_dtype="auto",
)

# Prepare input
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate up to 50 new tokens
outputs = model.generate(**inputs, max_new_tokens=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
```


Example output:

```
Once upon a time, in a land far away, there lived a...
```
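
Since the base QwQ-32B-Preview is an instruction-tuned reasoning model, prompts are usually formatted with the tokenizer's chat template rather than passed as raw text. Below is a hedged sketch, continuing from the loading code above and assuming the quantized checkpoint preserves the base model's chat template:

```python
# Chat-style prompting via the tokenizer's chat template (assumes the template
# from the base QwQ-32B-Preview model is carried over into this quantized checkpoint).
messages = [
    {"role": "user", "content": "Explain what 4-bit GPTQ quantization does in one paragraph."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```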