Oscar Wu committed · Commit d1867a5
Parent(s): 16e0a5a

Updated README
README.md CHANGED
@@ -11,7 +11,7 @@ This repository contains [`meta-llama/Llama-3.1-8B-Instruct`](https://huggingfac
 
 # Why should I use this model?
 
-1. **Accuracy:** This xMADified model is the best **quantized** version of the `meta-llama/Llama-3.1-8B-Instruct` model. We **crush
+1. **Accuracy:** This xMADified model is the best **quantized** version of the `meta-llama/Llama-3.1-8B-Instruct` model. We **crush the most downloaded quantized** version(s) (see _Table 1_ below).
 
 2. **Memory-efficiency:** The full-precision model is around 16 GB, while this xMADified model is only 5.7 GB, making it feasible to run on a 8 GB GPU.