# Hydra Decoder (Codebase)
We currently support inference with a single GPU and batch size 1. This matches the Medusa setup and is also the most common configuration for local model hosting.
## Model Weights

We upload three model weights for users to try; the original Medusa checkpoint is listed for reference.
| Base Model | Description | Hugging Face Repo |
|---|---|---|
| Vicuna-7b | Medusa Model (Original) | FasterDecoding/medusa-vicuna-7b-v1.3 |
| Vicuna-7b | Medusa Model - 3 Head (Ours) | Rango2000/medusa-3h-vicuna-7b-v1.3 |
| Vicuna-7b | Hydra Model - 3 Head - 1 Decoding Layer (Ours) | shiqihe/hydra-decoder-1l-vicuna-7b-v1.3 |
| Vicuna-7b | Hydra Model - 3 Head - 2 Decoding Layer (Ours) | shiqihe/hydra-decoder-2l-vicuna-7b-v1.3 |
## Inference

You can use the following command to launch a CLI interface:

```shell
# optional: pin the process to a single GPU
export CUDA_VISIBLE_DEVICES=0
# run the CLI
python -m medusa.inference.cli --model [path/repo of hydra decoder]
```
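For a concrete invocation, you can pass one of the repo IDs from the table above directly as the `--model` argument (this launches an interactive chat session and will download the weights on first use):

```shell
CUDA_VISIBLE_DEVICES=0 python -m medusa.inference.cli \
    --model shiqihe/hydra-decoder-1l-vicuna-7b-v1.3
```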
License: MIT