# QwQ-32B-Preview-bnb-4bit
## Introduction
QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the [QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) model, quantized with the bitsandbytes (bnb) library. 4-bit quantization substantially reduces the model's memory footprint, making it practical to deploy on resource-constrained hardware.
## Model Details
- **Quantization:** 4-bit using bitsandbytes (bnb)
- **Base Model:** [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)
- **Parameters:** 32.5 billion
- **Context Length:** Up to 32,768 tokens
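
## Usage

A minimal sketch of loading the checkpoint with `transformers` and `bitsandbytes`. The repo id below and the exact quantization settings (NF4, bfloat16 compute) are assumptions for illustration; adjust them to match the actual Hub path and saved config.

```python
# Sketch: load the 4-bit checkpoint with transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Typical NF4 settings; the exact config used for this checkpoint is an assumption.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "kurcontko/QwQ-32B-Preview-bnb-4bit"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs/CPU
)

# Chat-style generation via the tokenizer's chat template.
messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that a checkpoint saved in 4-bit already carries its quantization config, so passing `quantization_config` explicitly is typically optional when loading pre-quantized weights.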