Models: 99

Active filters: arxiv: 2407.11062

ChenMnZ/Llama-2-13b-EfficientQAT-w2g128

Text Generation • Updated Jul 22, 2024 • 19

ChenMnZ/Llama-2-13b-EfficientQAT-w2g64

Text Generation • Updated Jul 22, 2024 • 18

ChenMnZ/Llama-2-13b-EfficientQAT-w3g128

Text Generation • Updated Jul 22, 2024 • 20

ChenMnZ/Llama-2-13b-EfficientQAT-w4g128

Text Generation • Updated Jul 22, 2024 • 16

ChenMnZ/Llama-2-70b-EfficientQAT-w2g128

Text Generation • Updated Jul 22, 2024 • 16

ChenMnZ/Llama-2-70b-EfficientQAT-w2g64

Text Generation • Updated Jul 22, 2024 • 17

ChenMnZ/Llama-2-70b-EfficientQAT-w3g128

Text Generation • Updated Jul 22, 2024 • 16 • 1

ChenMnZ/Llama-2-70b-EfficientQAT-w4g128

Text Generation • Updated Jul 22, 2024 • 21 • 1

ChenMnZ/Llama-2-7b-EfficientQAT-w2g128

Text Generation • Updated Jul 22, 2024 • 19

ChenMnZ/Llama-2-7b-EfficientQAT-w3g128

Text Generation • Updated Jul 22, 2024 • 17

ChenMnZ/Llama-2-7b-EfficientQAT-w4g128

Text Generation • Updated Jul 22, 2024 • 19

ChenMnZ/Llama-3-70b-EfficientQAT-w2g64

Text Generation • Updated Jul 22, 2024 • 18 • 1

ChenMnZ/Llama-3-70b-EfficientQAT-w3g128

Text Generation • Updated Jul 22, 2024 • 17 • 1

ChenMnZ/Llama-3-70b-EfficientQAT-w4g128

Text Generation • Updated Jul 22, 2024 • 17 • 1

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g128

Text Generation • Updated Jul 22, 2024 • 20

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g64

Text Generation • Updated Jul 22, 2024 • 16

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w3g128

Text Generation • Updated Jul 22, 2024 • 18

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w4g128

Text Generation • Updated Jul 22, 2024 • 16

ChenMnZ/Llama-3-8b-EfficientQAT-w2g128

Text Generation • Updated Jul 22, 2024 • 20

ChenMnZ/Llama-3-8b-EfficientQAT-w2g64

Text Generation • Updated Jul 22, 2024 • 18

ChenMnZ/Llama-3-8b-EfficientQAT-w3g128

Text Generation • Updated Jul 22, 2024 • 23

ChenMnZ/Llama-3-8b-EfficientQAT-w4g128

Text Generation • Updated Jul 22, 2024 • 20

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g128

Text Generation • Updated Jul 22, 2024 • 16

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g64

Text Generation • Updated Jul 22, 2024 • 18

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w3g128

Text Generation • Updated Jul 22, 2024 • 17

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w4g128

Text Generation • Updated Jul 22, 2024 • 16

ChenMnZ/Llama-2-7b-EfficientQAT-w2g64

Text Generation • Updated Jul 22, 2024 • 22

ChenMnZ/Llama-3-70b-EfficientQAT-w2g128

Text Generation • Updated Jul 22, 2024 • 17 • 1

ChenMnZ/Llama-2-13b-BlockAP-w2g128

Text Generation • Updated Jul 21, 2024 • 8

ChenMnZ/Llama-2-13b-BlockAP-w2g64

Text Generation • Updated Jul 21, 2024 • 7