---
license: apache-2.0
language:
  - en
base_model:
  - Qwen/QwQ-32B-Preview
pipeline_tag: text-generation
tags:
  - chat
  - qwen2
---

# QwQ-32B-Preview-bnb-4bit

## Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantization significantly reduces the model's memory footprint, making it practical to deploy on resource-constrained hardware such as a single consumer GPU.
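
For reference, a bnb 4-bit setup in `transformers` typically looks like the sketch below. The exact settings used to export this checkpoint (NF4 vs. FP4, double quantization, compute dtype) are assumptions and may differ from the serialized configuration.

```python
import torch
from transformers import BitsAndBytesConfig

# Typical bitsandbytes 4-bit configuration (assumed settings, not confirmed for this checkpoint)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",              # NF4 is the common default for LLM weights
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants to save memory
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16 while weights stay 4-bit
)
```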

## Model Details

- **Quantization:** 4-bit using bitsandbytes (bnb)
- **Base Model:** Qwen/QwQ-32B-Preview
- **Parameters:** 32.5 billion
- **Context Length:** Up to 32,768 tokens
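
A minimal loading and generation sketch is shown below. The repository id `kurcontko/QwQ-32B-Preview-bnb-4bit` is assumed from the model name; the quantization config is read from the checkpoint itself, so no extra bnb arguments are needed at load time.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kurcontko/QwQ-32B-Preview-bnb-4bit"  # assumed repo id, adjust if the actual path differs

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place the 4-bit weights on available GPUs automatically
    torch_dtype=torch.bfloat16,  # compute dtype for non-quantized layers
)

# Build a chat-formatted prompt using the Qwen2 tokenizer's chat template
messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```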