# Hydra Decoder (Codebase)
We currently support inference with a single GPU and batch size 1. This matches the Medusa setup and is also the most common configuration for local model hosting.
## Model Weights

We upload three model weights for users to try; the original Medusa checkpoint is listed for reference.
| Base Model | Description | Hugging Face Repo |
|---|---|---|
| Vicuna-7b | Medusa Model (Original) | FasterDecoding/medusa-vicuna-7b-v1.3 |
| Vicuna-7b | Medusa Model - 3 Head (Ours) | Rango2000/medusa-3h-vicuna-7b-v1.3 |
| Vicuna-7b | Hydra Model - 3 Head - 1 Decoding Layer (Ours) | shiqihe/hydra-decoder-1l-vicuna-7b-v1.3 |
| Vicuna-7b | Hydra Model - 3 Head - 2 Decoding Layer (Ours) | shiqihe/hydra-decoder-2l-vicuna-7b-v1.3 |
## Inference

You can use the following command to launch a CLI interface:

```shell
# optional: pin the process to a single GPU
export CUDA_VISIBLE_DEVICES=0
# run the CLI
python -m medusa.inference.cli --model [path/repo of hydra decoder]
```
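For a concrete invocation, you can pass one of the repo IDs from the table above directly as the `--model` argument (this launches an interactive chat session and will download the weights on first use):

```shell
CUDA_VISIBLE_DEVICES=0 python -m medusa.inference.cli \
    --model shiqihe/hydra-decoder-1l-vicuna-7b-v1.3
```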
License: MIT