Model Information

The vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32 model is a quantized version of Meta-Llama-3.1-70B-Instruct. It was dequantized to FP32 from Hugging Face's AWQ INT4 model, then requantized and optimized to run on AMD GPUs. It is a drop-in replacement for hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4.
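Because it is a drop-in replacement, it can be loaded the same way as the hugging-quants model. Below is a minimal sketch of offline inference with vLLM on a ROCm system; the prompt and sampling settings are illustrative assumptions, not part of this card.

```python
# Minimal sketch: offline inference with vLLM (ROCm build) on an AMD MI300X.
# The prompt and sampling parameters are illustrative, not from this card.
from vllm import LLM, SamplingParams

llm = LLM(model="vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain AWQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```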

Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s
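These figures follow the output format of vLLM's offline throughput benchmark. As a rough sketch of how such numbers can be derived (the batch size, prompt, and output length below are assumptions, not the configuration used to produce the figures above):

```python
# Hedged sketch: measure request and token throughput with vLLM's offline API
# by timing a batch of requests and dividing token counts by wall-clock time.
# Batch size, prompt, and output length are illustrative, not the card's setup.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32")
params = SamplingParams(max_tokens=128, ignore_eos=True)  # fixed output length

prompts = ["Summarize the history of GPUs."] * 64  # illustrative batch
start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

out_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
in_tokens = sum(len(o.prompt_token_ids) for o in outputs)
print(f"{len(prompts) / elapsed:.2f} requests/s")
print(f"{(in_tokens + out_tokens) / elapsed:.2f} total tokens/s")
print(f"{out_tokens / elapsed:.2f} output tokens/s")
```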

Model Details

Compute Infrastructure

  • Vultr

Hardware

  • AMD MI300X

Software

  • ROCm

Model Author

  • Vultr

Format

  • Safetensors

Model Size

  • 70.6B parameters

Tensor Type

  • F32
