---
license: apache-2.0
language:
  - en
base_model:
  - Qwen/QwQ-32B-Preview
pipeline_tag: text-generation
tags:
  - chat
  - qwen2
---

# QwQ-32B-Preview-bnb-4bit

## Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantization significantly reduces the model's memory footprint, making it practical to deploy on resource-constrained hardware such as a single consumer GPU.
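
For reference, a bnb 4-bit setup in `transformers` typically looks like the sketch below. The exact settings used to export this checkpoint (NF4 vs. FP4, double quantization, compute dtype) are assumptions and may differ from the serialized configuration.

```python
import torch
from transformers import BitsAndBytesConfig

# Typical bitsandbytes 4-bit configuration (assumed settings, not confirmed for this checkpoint)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",              # NF4 is the common default for LLM weights
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants to save memory
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16 while weights stay 4-bit
)
```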

## Model Details

- **Quantization:** 4-bit using bitsandbytes (bnb)
- **Base Model:** Qwen/QwQ-32B-Preview
- **Parameters:** 32.5 billion
- **Context Length:** Up to 32,768 tokens
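
A minimal loading and generation sketch is shown below. The repository id `kurcontko/QwQ-32B-Preview-bnb-4bit` is assumed from the model name; the quantization config is read from the checkpoint itself, so no extra bnb arguments are needed at load time.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kurcontko/QwQ-32B-Preview-bnb-4bit"  # assumed repo id, adjust if the actual path differs

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place the 4-bit weights on available GPUs automatically
    torch_dtype=torch.bfloat16,  # compute dtype for non-quantized layers
)

# Build a chat-formatted prompt using the Qwen2 tokenizer's chat template
messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```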