EXL2 4.5bpw Quantization of calme-3.2-instruct-78b

This repository hosts a 4.5 bits per weight (bpw) quantization of calme-3.2-instruct-78b, a Qwen 2.5 finetune, in the ExLlamaV2 (EXL2) format for efficient, long-context inference.

Quantization Details

  • Format: ExLlamaV2 4.5bpw
  • Version: ExLlamaV2 0.2.6
  • Model Size: 78 billion parameters
  • VRAM Usage: Approx. 44GB (32,000 context)
  • Calibration:
    • Rows: 115
    • Length: 2048
    • Dataset: (default)

The quantization process reduces memory usage and inference latency while maintaining high performance for generative text tasks.
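
For reference, EXL2 quantizations are produced with ExLlamaV2's convert.py script. The invocation below is illustrative only: the input and output paths are placeholders, and the calibration dataset is left at the script's built-in default:

# illustrative; paths are placeholders, calibration dataset left at the default
python convert.py -i ./calme-3.2-instruct-78b -o ./exl2-work -cf ./calme-3.2-instruct-78b-exl2-4.5bpw -b 4.5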

Prompt Template

This model uses the ChatML prompt template for interaction:

<|im_start|>system
{System}
<|im_end|>
<|im_start|>user
{User}
<|im_end|>
<|im_start|>assistant
{Assistant}
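
ExLlamaV2 consumes raw prompt strings, so the template above has to be assembled by the caller. A minimal sketch (the helper name is illustrative, not part of any library):

def build_chatml_prompt(system: str, user: str) -> str:
    # Fill in the ChatML template and leave the assistant turn open for generation
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )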

Model Usage

Example: Inference with ExLlamaV2

To use this quantized model, first install the ExLlamaV2 library:

pip install exllamav2

Then load the model from a local directory (see Download Instructions below) and generate text. The snippet below is a sketch against the ExLlamaV2 0.2.x Python API; the local model path is a placeholder for wherever you downloaded the weights:

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Load model, KV cache, and tokenizer from the downloaded model directory
config = ExLlamaV2Config("./calme-3.2-instruct-78b-exl2-4.5bpw")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# Create a generator
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Generate text from a ChatML-formatted prompt
prompt = (
    "<|im_start|>user\nWhat is EXL2 quantization?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
response = generator.generate(prompt=prompt, max_new_tokens=256)
print(response)

Features

  • The EXL2 format requires NVIDIA hardware, but it runs faster and in less memory than GGUF.
  • Fits in approx. 44 GB of VRAM with a 32,000-token context window.
  • Needs approx. 40 GB of VRAM at minimum with a 1,024-token context window (see the sketch after this list for capping the context length).
  • Highly optimized for inference, making it well suited to VRAM-constrained environments.
  • Compatible with ChatML-based prompting systems.
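
To fit within the lower VRAM figure, the context window can be capped when the model is loaded. A minimal sketch, reusing the same objects as the usage example above (the path is again a placeholder):

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache

config = ExLlamaV2Config("./calme-3.2-instruct-78b-exl2-4.5bpw")
config.max_seq_len = 1024  # cap the context window to shrink the KV cache
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)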

Download Instructions

To download the model files:

pip install huggingface_hub
huggingface-cli login
huggingface-cli download DavidCatalano/calme-3.2-instruct-78b-exl2-4.5bpw --include "*" --local-dir ./local-folder
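
Alternatively, the same files can be fetched from Python with huggingface_hub:

from huggingface_hub import snapshot_download

# Download every file in the repository to a local folder
snapshot_download(
    repo_id="DavidCatalano/calme-3.2-instruct-78b-exl2-4.5bpw",
    local_dir="./local-folder",
)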
