# This model has been xMADified!

This repository contains [`meta-llama/Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) quantized from 16-bit floats to 4-bit integers, using xMAD.ai proprietary technology.
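To make the 16-bit-to-4-bit step concrete, here is a generic group-wise affine int4 quantize/dequantize sketch. This is an illustration of the general technique only; xMAD.ai's proprietary scheme is not public and will differ. The group size of 128 is an assumption, not taken from this card.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Generic group-wise affine quantization to 4-bit codes (0..15)."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                              # 4-bit range has 16 levels
    q = np.clip(np.round((w - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize_int4(q, scale, lo):
    """Reconstruct approximate float weights from codes, scales, offsets."""
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, lo = quantize_int4(w)
w_hat = dequantize_int4(q, scale, lo).reshape(-1)
print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```

The reconstruction error per weight is bounded by half the group's scale, which is why group-wise (rather than per-tensor) scaling keeps 4-bit models close to full-precision quality.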
# Why should I use this model?

2. **Accuracy:** This xMADified model preserves the quality of the full-precision model. The table below reports zero-shot accuracy on popular benchmarks for this xMADified model against the [neuralmagic](https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16)-quantized model (the same model size, for a fair comparison). The xMAD.ai model offers higher accuracy than the neuralmagic model on every benchmark.
| Model | Size | MMLU | Arc Challenge | Arc Easy | LAMBADA Standard | LAMBADA OpenAI | PIQA | WinoGrande | HellaSwag |
|---|---|---|---|---|---|---|---|---|---|
| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | 16.1 GB | 68.05 | 51.71 | 81.90 | 66.18 | 73.55 | 79.87 | 73.72 | 59.10 |
| [neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16](https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16) | 5.7 GB | 64.82 | 47.78 | 78.66 | 62.95 | 70.41 | 78.67 | 72.61 | 58.04 |
| xmadai/Llama-3.1-8B-Instruct-xMADai-INT4 (this model) | 5.7 GB | **66.83** | **52.30** | **82.11** | **65.73** | **73.30** | **79.87** | **72.77** | **58.49** |
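The sizes in the table can be sanity-checked with a back-of-the-envelope estimate. The assumptions below are not from this card: roughly 8.03B total parameters, embedding and lm-head weights kept in fp16, and GPTQ-style groups of 128 weights each carrying one fp16 scale and one 4-bit zero point.

```python
GB = 1e9

total_params = 8.03e9                       # assumed total parameter count
embed_params = 2 * 128_256 * 4_096          # input embeddings + lm_head (assumed fp16)
quant_params = total_params - embed_params  # weights actually quantized to 4-bit

# 16-bit baseline: 2 bytes per parameter.
fp16_size = total_params * 2 / GB

# 4-bit weights plus per-group overhead: one fp16 scale (16 bits) and one
# 4-bit zero point per group of 128 weights.
overhead_bits = (16 + 4) / 128
int4_size = (quant_params * (4 + overhead_bits) / 8 + embed_params * 2) / GB

print(f"fp16: {fp16_size:.1f} GB, int4: {int4_size:.1f} GB")
```

Under these assumptions the estimate lands on roughly 16.1 GB for the fp16 model and 5.7 GB for the 4-bit one, matching the table.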
# How to Run Model
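The card's own run instructions continue beyond this excerpt. As a minimal sketch only — assuming, unverified, that this INT4 checkpoint loads through the standard `transformers` `from_pretrained` path (check the card's actual instructions for the supported loader):

```python
MODEL_ID = "xmadai/Llama-3.1-8B-Instruct-xMADai-INT4"  # model id from the table above

if __name__ == "__main__":
    # Heavy, network-dependent calls are guarded so the module imports cheaply.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer("What is quantization?", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```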