---
language:
  - en
license: other
library_name: transformers
tags:
  - chat
  - qwen
  - qwen2.5
  - finetune
  - english
base_model:
  - MaziyarPanahi/calme-3.2-instruct-78b
model_name: calme-3.2-instruct-78b
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
inference: false
model_creator: MaziyarPanahi
quantized_by: MaziyarPanahi
model-index:
  - name: calme-3.2-instruct-78b
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 80.63
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 62.61
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 39.95
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 20.36
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 38.53
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 70.03
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
---

# EXL2 4.5bpw Quantization of calme-3.2-instruct-78b

*Calme-3 Models*

This repository hosts a 4.5 bits per weight (bpw) quantization of [calme-3.2-instruct-78b](https://huggingface.co/MaziyarPanahi/calme-3.2-instruct-78b), a Qwen2.5 finetune, in the ExLlamaV2 (EXL2) format for efficient, long-context inference.

## Quantization Details

- **Format:** ExLlamaV2 (EXL2) 4.5bpw
- **Version:** ExLlamaV2 0.2.6
- **Model Size:** 78 billion parameters
- **VRAM Usage:** approx. 44GB at a 32,000-token context
- **Calibration:**
  - Rows: 115
  - Length: 2048
  - Dataset: (default)

Quantization reduces memory usage and inference latency while preserving most of the original model's quality on generative text tasks.
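
For reference, EXL2 quants like this one are produced with ExLlamaV2's `convert.py` script. A minimal sketch, assuming the full-precision model has already been downloaded locally and using the default calibration dataset, consistent with the details above (paths are placeholders):

```bash
# -i: input FP16 model dir, -o: working dir for temp files,
# -cf: compiled output dir, -b: target bits per weight
python convert.py -i ./calme-3.2-instruct-78b -o ./exl2-work \
    -cf ./calme-3.2-instruct-78b-exl2-4.5bpw -b 4.5
```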

## Prompt Template

This model uses the ChatML prompt template for interaction:

```
<|im_start|>system
{System}
<|im_end|>
<|im_start|>user
{User}
<|im_end|>
<|im_start|>assistant
{Assistant}
```
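
In code, applying the template amounts to concatenating role-tagged turns and leaving the assistant turn open for the model to complete. A minimal helper (a sketch, not part of any library):

```python
def chatml_prompt(messages):
    """Format a list of {role, content} dicts as a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"  # left open for the model to complete
    return prompt

print(chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is EXL2 quantization?"},
]))
```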

## Model Usage

### Example: Inference with ExLlamaV2

To use this quantized model, first install the ExLlamaV2 library:

```bash
pip install exllamav2
```

ExLlamaV2 loads models from a local directory, so fetch the weights first (see Download Instructions below). A minimal generation example using the exllamav2 0.2.x API:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Load model, tokenizer, and KV cache from the downloaded directory
config = ExLlamaV2Config("./local-folder")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=32768, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

# Create a generator
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Generate text from a ChatML-formatted prompt
prompt = "<|im_start|>user\nWhat is EXL2 quantization?<|im_end|>\n<|im_start|>assistant\n"
output = generator.generate(prompt=prompt, max_new_tokens=256)
print(output)
```

## Features

- The EXL2 format requires NVIDIA hardware, but it runs faster and uses less memory than comparable GGUF quantizations.
- Fits in approx. 44GB of VRAM with a 32,000-token context window (a quantized KV cache can reduce this further; see the sketch after this list).
- Requires approx. 40GB of VRAM minimum, at a 1,024-token context window.
- Highly optimized for inference, making it well suited to resource-constrained environments.
- Compatible with ChatML-based prompting systems.
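
For tighter VRAM budgets, ExLlamaV2 also provides quantized KV caches that shrink the memory cost of long contexts. A sketch using the Q4 cache variant, assuming the weights were downloaded to `./local-folder` (see Download Instructions):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4

config = ExLlamaV2Config("./local-folder")  # downloaded model directory
model = ExLlamaV2(config)

# Q4 KV cache: roughly a quarter of the FP16 cache VRAM at the same context
cache = ExLlamaV2Cache_Q4(model, max_seq_len=32768, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs
```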

## Acknowledgments

Thanks to [MaziyarPanahi](https://huggingface.co/MaziyarPanahi) for creating the original calme-3.2-instruct-78b model, and to turboderp for the ExLlamaV2 library.

## Download Instructions

To download the model files:

```bash
pip install -U huggingface_hub
huggingface-cli login
huggingface-cli download DavidCatalano/calme-3.2-instruct-78b-exl2-4.5bpw --include "*" --local-dir ./local-folder
```
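
Alternatively, the same download can be scripted with the huggingface_hub Python API:

```python
from huggingface_hub import snapshot_download

# Download all repository files into ./local-folder
snapshot_download(
    repo_id="DavidCatalano/calme-3.2-instruct-78b-exl2-4.5bpw",
    local_dir="./local-folder",
)
```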