# LLM Text Generation (Code)
This suite benchmarks vLLM and TGI on the code generation task.
## Setup
### Docker images
You can pull the vLLM and TGI Docker images with:

```bash
docker pull mlenergy/vllm:v0.5.4-openai
docker pull mlenergy/tgi:v2.0.2
```
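
For reference, the vLLM container can then be launched along the lines of the sketch below. This assumes the image keeps the upstream `vllm-openai` entrypoint (an OpenAI-compatible server listening on port 8000) and mounts the HuggingFace cache directory described later; the model name is only illustrative.

```bash
# Sketch, not the exact benchmark invocation: assumes the image uses the
# upstream vllm-openai entrypoint and serves on port 8000.
docker run --gpus all --rm -p 8000:8000 \
    -v /data/leaderboard/hfcache:/root/.cache/huggingface \
    -e HF_TOKEN="$HF_TOKEN" \
    mlenergy/vllm:v0.5.4-openai \
    --model codellama/CodeLlama-7b-Instruct-hf  # illustrative model choice
```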
### Installing Benchmark Script Dependencies
```bash
pip install -r requirements.txt
```
### Starting the NVML container
Changing the power limit requires the `SYS_ADMIN` Linux security capability, which we delegate to a daemon Docker container running a base CUDA image.

```bash
bash ../../common/start_nvml_container.sh
```
With the `nvml` container running, you can change the power limit with something like `docker exec nvml nvidia-smi -i 0 -pl 200`.
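
To apply the same limit to every GPU on the node, you can loop over the indices NVML reports (200 W is just an example value):

```bash
# Set a 200 W power limit on each GPU visible inside the nvml container.
for i in $(docker exec nvml nvidia-smi --query-gpu=index --format=csv,noheader); do
    docker exec nvml nvidia-smi -i "$i" -pl 200
done
```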
### HuggingFace cache directory
The scripts assume the HuggingFace cache directory will be under `/data/leaderboard/hfcache` on the node that runs this benchmark.
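
If your cache currently lives elsewhere, one simple option is to create the expected path or symlink your existing cache into place (the existing cache location below is only an assumption):

```bash
# Create the parent directory, then symlink an existing cache into place.
sudo mkdir -p /data/leaderboard
sudo ln -s "$HOME/.cache/huggingface" /data/leaderboard/hfcache
```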
## Benchmarking
### Obtaining one datapoint
Export your HuggingFace hub token as the environment variable `$HF_TOKEN`.
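
For example (the token value is a placeholder):

```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx  # placeholder; use your own token
```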
The script `scripts/benchmark_one_datapoint.py` assumes that it is run from the directory containing `scripts`, like this:

```bash
python scripts/benchmark_one_datapoint.py --help
```
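
A concrete run might then look like the sketch below; the flag names are hypothetical, so check `--help` for the script's actual interface.

```bash
# Hypothetical flags for illustration only; see --help for the real interface.
python scripts/benchmark_one_datapoint.py \
    --model codellama/CodeLlama-7b-Instruct-hf \
    --server vllm
```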
### Obtaining all datapoints for a single model
Run `scripts/benchmark_one_model.py`.
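
As with the single-datapoint script, you can inspect its command-line interface first (assuming it exposes the same `--help` convention):

```bash
python scripts/benchmark_one_model.py --help
```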
### Running the entire suite with Pegasus
You can use `pegasus` to run the entire benchmark suite. Queue and host files are in `./pegasus`.