Model Information
The vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32
model is a quantized version of Meta-Llama-3.1-70B-Instruct.
It was dequantized from Hugging Face's AWQ INT4 model to FP32, then requantized and optimized to run on AMD GPUs. It is a drop-in replacement for hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4.
Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s
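For context, the throughput figures above can be cross-checked against each other: dividing token rates by the request rate gives the implied per-request token counts (a minimal sanity check, assuming all three numbers come from the same benchmark run):

```python
# Derive per-request token counts from the reported benchmark rates
# (assumes all figures come from the same run).
requests_per_s = 68.74
total_tokens_per_s = 43994.71
output_tokens_per_s = 8798.94

tokens_per_request = total_tokens_per_s / requests_per_s    # total tokens per request
output_per_request = output_tokens_per_s / requests_per_s   # output tokens per request

print(round(tokens_per_request), round(output_per_request))  # → 640 128
```

So each benchmark request processed roughly 640 tokens in total, of which about 128 were generated output tokens.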
Model Details
Model Description
- Developed by: Meta
- Model type: Quantized Large Language Model
- Language(s) (NLP): English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- License: Llama 3.1
- Dequantized From: hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
Compute Infrastructure
- Vultr
Hardware
- AMD MI300X
Software
- ROCm
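With the stack above (MI300X GPUs plus ROCm), one common way to serve a model like this is vLLM's OpenAI-compatible server; a minimal sketch, assuming a ROCm build of vLLM is installed and with the tensor-parallel size chosen as an example value, not a recommendation from this card:

```shell
# Serve the model with vLLM (ROCm build assumed).
# --tensor-parallel-size 8 is an example value for an 8-GPU MI300X node.
vllm serve vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32 \
  --tensor-parallel-size 8
```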
Model Author
- Vultr
Base Model
- meta-llama/Llama-3.1-70B