---
license: apache-2.0
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
pipeline_tag: text-generation
tags:
- chat
- qwen2
---
|
# QwQ-32B-Preview-bnb-4bit |
|
|
|
## Introduction |
|
|
|
QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the [QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) model, produced with the [bitsandbytes](https://github.com/bitsandbytes-foundation/bitsandbytes) (bnb) quantization library. Quantizing the weights to 4 bits cuts the model's memory footprint to roughly a quarter of the 16-bit checkpoint, making it deployable on resource-constrained hardware such as a single consumer GPU.
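As a rough sanity check of that claim, the weights-only footprint can be estimated from the parameter count (this sketch ignores quantization constants, the KV cache, and activation memory, which add overhead in practice):

```python
# Back-of-envelope weights-only memory estimate for a 32.5B-parameter model.
PARAMS = 32.5e9

fp16_gib = PARAMS * 2 / 2**30    # 2 bytes per weight in fp16/bf16
int4_gib = PARAMS * 0.5 / 2**30  # 0.5 bytes per weight at 4-bit

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")  # ~60.5 GiB
print(f"4-bit weights:  ~{int4_gib:.1f} GiB")  # ~15.1 GiB
```

So the quantized weights fit comfortably in 24 GB of VRAM, whereas the 16-bit checkpoint does not fit on any single consumer GPU.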
|
|
|
## Model Details |
|
|
|
- **Quantization:** 4-bit via bitsandbytes (bnb)
|
- **Base Model:** [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) |
|
- **Parameters:** 32.5 billion |
|
- **Context Length:** Up to 32,768 tokens |
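
## Usage

The pre-quantized checkpoint can be loaded with the standard `transformers` API; since the quantization config is stored in the checkpoint, no extra `BitsAndBytesConfig` is needed at load time. The sketch below assumes this repository's model id (shown as a placeholder) and an example prompt; loading requires a CUDA GPU with `bitsandbytes` installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with this repository's actual Hub id.
MODEL_ID = "QwQ-32B-Preview-bnb-4bit"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the pre-quantized checkpoint and generate a completion.

    Requires a CUDA GPU and the bitsandbytes package; the 4-bit
    quantization config is read from the checkpoint itself.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Format the prompt with the model's chat template.
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("How many r's are in the word 'strawberry'?"))
```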