ehartford/WizardLM-7B-Uncensored quantized to 8-bit GPTQ with act order and true sequential enabled, and no group size.

For most uses this probably isn't what you want.
For 4-bit without act order, or for compatibility with old-cuda (the text-generation-webui default), see TheBloke/WizardLM-7B-uncensored-GPTQ.

Quantized using AutoGPTQ with the following config:

config: dict = dict(
    quantize_config=dict(bits=8, desc_act=True, true_sequential=True, model_file_base_name='WizardLM-7B-Uncensored'),
    use_safetensors=True
)

See quantize.py for the full script.

Tested for compatibility with:

  • WSL with the GPTQ-for-Llama triton branch
  • Windows with AutoGPTQ on CUDA (triton deselected)

AutoGPTQ loader should read configuration from quantize_config.json.
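
As a rough illustration of the AutoGPTQ route, the sketch below loads the model with AutoGPTQForCausalLM.from_quantized and lets the library pick up quantize_config.json from the model directory. The helper names build_load_kwargs and load_quantized, the model_dir argument, and the "cuda:0" default are assumptions made for this example; they are not taken from quantize.py. Triton is left off to mirror the Windows setup listed above.

```python
from typing import Any, Dict


def build_load_kwargs(use_triton: bool = False, device: str = "cuda:0") -> Dict[str, Any]:
    """Keyword arguments for AutoGPTQForCausalLM.from_quantized (hypothetical helper).

    bits, desc_act and true_sequential are not passed here; AutoGPTQ is
    expected to read them from quantize_config.json next to the weights.
    """
    return {
        "device": device,           # assumed single-GPU default
        "use_triton": use_triton,   # False matches the tested Windows/CUDA setup
        "use_safetensors": True,    # the model was saved with use_safetensors=True
    }


def load_quantized(model_dir: str, **overrides: Any):
    """Load the quantized model from a local directory or repo id (hypothetical helper)."""
    # Imported lazily so the kwargs helper can be used without auto_gptq installed.
    from auto_gptq import AutoGPTQForCausalLM

    kwargs = build_load_kwargs()
    kwargs.update(overrides)
    return AutoGPTQForCausalLM.from_quantized(model_dir, **kwargs)
```

Calling load_quantized with the path to a local copy of this repository should return a model ready for generation; pass use_triton=True to try the triton kernels instead.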
For GPTQ-for-Llama use the following configuration when loading:
wbits: 8
groupsize: None
model_type: llama
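
The GPTQ-for-Llama settings above can also be derived from quantize_config.json rather than typed by hand. The following is a minimal sketch, assuming the file uses AutoGPTQ's "bits" and "group_size" keys, where a group_size of -1 means no grouping; the function name gptq_for_llama_settings is hypothetical, and model_type is hard-coded to "llama" as stated above.

```python
import json
from typing import Any, Dict


def gptq_for_llama_settings(quantize_config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an AutoGPTQ quantize config into GPTQ-for-Llama loader settings."""
    group_size = quantize_config.get("group_size", -1)
    return {
        "wbits": quantize_config["bits"],
        # AutoGPTQ stores "no group size" as -1; the loader setting is None.
        "groupsize": None if group_size in (-1, None) else group_size,
        "model_type": "llama",  # assumption: fixed for this LLaMA-based model
    }


# Example input mirroring the quantization config shown earlier.
example = json.loads(
    '{"bits": 8, "group_size": -1, "desc_act": true, "true_sequential": true}'
)
settings = gptq_for_llama_settings(example)
print(settings)
```

To use the real file, replace the inline JSON with json.load on quantize_config.json from the model directory.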
