Models: 31

Active filters: nm-vllm

neuralmagic/TinyLlama-1.1B-Chat-v1.0-pruned2.4

Text Generation • Updated Mar 5, 2024 • 28 • 1

neuralmagic/MiniChat-2-3B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 17

neuralmagic/OpenHermes-2.5-Mistral-7B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 166

neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50

Text Generation • Updated Mar 5, 2024 • 132 • 1

neuralmagic/Nous-Hermes-2-SOLAR-10.7B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 22

neuralmagic/Nous-Hermes-2-Yi-34B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 16

neuralmagic/Nous-Hermes-2-Yi-34B-pruned50

Text Generation • Updated Mar 5, 2024 • 15

neuralmagic/zephyr-7b-beta-marlin

Text Generation • Updated Mar 6, 2024 • 528

neuralmagic/llama2.c-stories110M-pruned2.4

Text Generation • Updated Mar 5, 2024 • 14

neuralmagic/llama2.c-stories110M-pruned50

Text Generation • Updated Mar 5, 2024 • 787

neuralmagic/phi-2-pruned50

Text Generation • Updated Mar 5, 2024 • 37

neuralmagic/TinyLlama-1.1B-Chat-v1.0-marlin

Text Generation • Updated Mar 6, 2024 • 2.73k • 1

neuralmagic/OpenHermes-2.5-Mistral-7B-marlin

Text Generation • Updated Mar 6, 2024 • 728 • 2

neuralmagic/Nous-Hermes-2-Yi-34B-marlin

Text Generation • Updated Mar 6, 2024 • 13 • 5

softmax/Llama-2-70b-chat-hf-marlin

Text Generation • Updated Mar 17, 2024 • 118

softmax/falcon-180B-chat-marlin

Text Generation • Updated Mar 21, 2024 • 17

dtransposed/llama2.c-stories110M-pruned50-compressed-tensors

Text Generation • Updated Apr 23, 2024 • 5

nm-testing/llama2.c-stories110M-pruned50-compressed-tensors

Text Generation • Updated Apr 25, 2024 • 6

mradermacher/Nous-Hermes-2-SOLAR-10.7B-pruned2.4-GGUF

Updated Nov 6, 2024 • 64

mradermacher/Nous-Hermes-2-SOLAR-10.7B-pruned2.4-i1-GGUF

Updated Nov 7, 2024 • 266

tensorblock/llama2.c-stories110M-pruned50-GGUF

Updated Dec 12, 2024 • 80

mradermacher/phi-2-pruned50-GGUF

Updated Dec 18, 2024 • 19

mradermacher/llama2.c-stories110M-pruned50-GGUF

Updated Dec 19, 2024 • 30

mradermacher/OpenHermes-2.5-Mistral-7B-pruned50-GGUF

Updated Dec 19, 2024 • 27

mradermacher/MiniChat-2-3B-pruned2.4-GGUF

Updated Dec 19, 2024 • 73

mradermacher/OpenHermes-2.5-Mistral-7B-pruned50-i1-GGUF

Updated Dec 19, 2024 • 29

mradermacher/llama2.c-stories110M-pruned50-i1-GGUF

Updated Dec 19, 2024 • 68

mradermacher/OpenHermes-2.5-Mistral-7B-pruned2.4-GGUF

Updated Dec 20, 2024 • 21

mradermacher/OpenHermes-2.5-Mistral-7B-pruned2.4-i1-GGUF

Updated Dec 20, 2024 • 36

tensorblock/OpenHermes-2.5-Mistral-7B-pruned2.4-GGUF

Updated 30 days ago • 178