# QwQ-32B-Preview-bnb-4bit

## Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the [QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) model, produced with the bitsandbytes (bnb) library. Quantizing the weights to 4 bits cuts the model's memory footprint to roughly a quarter of its 16-bit size, making it practical to deploy on resource-constrained hardware such as a single consumer GPU.
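As a rough sanity check on the savings, the sketch below estimates the weight memory at 16-bit and 4-bit precision. It deliberately ignores per-block quantization constants and layers that typically stay unquantized (e.g. embeddings), so real numbers will be somewhat higher:

```python
# Back-of-the-envelope weight-memory estimate for a 32.5B-parameter model.
# Assumption: bf16 baseline at 2 bytes/param vs. 4-bit weights at
# 0.5 bytes/param; quantization metadata is ignored.

PARAMS = 32.5e9

def footprint_gb(bytes_per_param: float) -> float:
    """Approximate weight memory in gigabytes."""
    return PARAMS * bytes_per_param / 1e9

bf16_gb = footprint_gb(2.0)   # 16-bit baseline
int4_gb = footprint_gb(0.5)   # 4-bit quantized

print(f"bf16: ~{bf16_gb:.0f} GB, 4-bit: ~{int4_gb:.2f} GB")
```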

## Model Details

- **Quantization:** 4-bit using Bits and Bytes (bnb)
- **Base Model:** [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)
- **Parameters:** 32.5 billion
- **Context Length:** Up to 32,768 tokens
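A minimal loading sketch with `transformers` is shown below. The repo id is a placeholder (substitute the actual Hub path of this checkpoint); since the bitsandbytes quantization config is stored in a pre-quantized checkpoint, `from_pretrained` loads the weights in 4-bit automatically, provided the `bitsandbytes` package and a CUDA GPU are available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the actual Hub path of this checkpoint.
model_id = "QwQ-32B-Preview-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# The quantization config saved in the checkpoint is picked up
# automatically, so the weights are loaded in 4-bit.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # compute dtype for non-quantized ops
    device_map="auto",           # spread layers across available devices
)

prompt = "How many r's are in the word \"strawberry\"?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"`, Accelerate places layers across available GPUs (and CPU, if needed), which is usually required for a 32B model even at 4-bit precision.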