# LLM Text Generation (Chat)
This suite benchmarks vLLM and TGI on the chat completion task across a range of models.
## Setup
### Docker images
You can pull the vLLM and TGI Docker images with:
```sh
docker pull mlenergy/vllm:v0.4.2-openai
docker pull mlenergy/tgi:v2.0.2
```
### Installing benchmark script dependencies
```sh
pip install -r requirements.txt
```
### Starting the NVML container
Changing the power limit requires the `SYS_ADMIN` Linux security capability, which we delegate to a daemon Docker container running a base CUDA image.
```sh
bash ../../common/start_nvml_container.sh
```
With the `nvml` container running, you can change a GPU's power limit with a command like `docker exec nvml nvidia-smi -i 0 -pl 200`, which sets the limit of GPU 0 to 200 W.
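
To apply the same limit to several GPUs, the command above can be generated programmatically. The following is a minimal sketch and not part of this repository: `power_limit_command` and `set_power_limits` are hypothetical helpers, and the container name `nvml` is taken from the example above. By default it only builds the commands and does not run them.

```python
import subprocess
from typing import List


def power_limit_command(gpu_index: int, watts: int, container: str = "nvml") -> List[str]:
    """Build the `docker exec` command that sets one GPU's power limit.

    Hypothetical helper; mirrors `docker exec nvml nvidia-smi -i 0 -pl 200`.
    """
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    if watts <= 0:
        raise ValueError("watts must be positive")
    return [
        "docker", "exec", container,
        "nvidia-smi", "-i", str(gpu_index), "-pl", str(watts),
    ]


def set_power_limits(gpu_indices: List[int], watts: int, dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one power-limit command per GPU.

    With dry_run=True, the default, nothing is executed.
    """
    commands = [power_limit_command(i, watts) for i in gpu_indices]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the command fails,
            # for example when the requested limit is outside the GPU's range.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for GPUs 0 and 1 at 200 W.
    for cmd in set_power_limits([0, 1], 200):
        print(" ".join(cmd))
```

Passing `dry_run=False` executes the commands, which requires Docker and the running `nvml` container.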
### HuggingFace cache directory
The scripts assume the HuggingFace cache directory will be under `/data/leaderboard/hfcache` on the node that runs this benchmark.
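
Since the path is assumed rather than configured, it can help to confirm it exists before starting a long run. The sketch below is illustrative only: `cache_dir_problem` is a hypothetical helper, and the requirement that the directory be writable is an assumption (model downloads typically write into the cache), not something the scripts are known to enforce.

```python
import os
from pathlib import Path
from typing import Optional

# Path stated in this README for the benchmark node.
DEFAULT_HF_CACHE = Path("/data/leaderboard/hfcache")


def cache_dir_problem(cache_dir: Path = DEFAULT_HF_CACHE) -> Optional[str]:
    """Return a description of what looks wrong with the cache directory.

    Returns None when the directory exists and the current user can
    read, write, and traverse it.
    """
    if not cache_dir.is_dir():
        return f"{cache_dir} does not exist or is not a directory"
    # Assumption: the benchmark needs to write downloaded models here.
    if not os.access(cache_dir, os.R_OK | os.W_OK | os.X_OK):
        return f"{cache_dir} is not readable and writable by the current user"
    return None


if __name__ == "__main__":
    problem = cache_dir_problem()
    print(problem if problem else f"{DEFAULT_HF_CACHE} looks usable")
```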
## Benchmarking
### Obtaining one datapoint
Export your HuggingFace hub token as the environment variable `HF_TOKEN`.
The script `scripts/benchmark_one_datapoint.py` assumes that it is run from the directory that contains `scripts`, like this:
```sh
python scripts/benchmark_one_datapoint.py --help
```
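
The two preconditions above (the token is exported, and the working directory contains `scripts`) can be checked before launching a run. This is a minimal sketch under those stated assumptions; `preflight_problems` is a hypothetical helper, not part of the benchmark scripts.

```python
import os
from pathlib import Path
from typing import List, Mapping


def preflight_problems(cwd: Path, env: Mapping[str, str]) -> List[str]:
    """List unmet preconditions for running the single-datapoint script."""
    problems = []
    # The README asks for the hub token in the HF_TOKEN environment variable.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set or is empty")
    # The script expects to be launched from the directory containing `scripts`.
    if not (cwd / "scripts" / "benchmark_one_datapoint.py").is_file():
        problems.append(f"scripts/benchmark_one_datapoint.py not found under {cwd}")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(Path.cwd(), os.environ):
        print(f"preflight: {problem}")
```

An empty list means both checks passed. It says nothing about whether the token itself is valid.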
### Obtaining all datapoints for a single model
Run `scripts/benchmark_one_model.py`.
### Running the entire suite with Pegasus
You can use [`pegasus`](https://github.com/jaywonchung/pegasus) to run the entire benchmark suite.
Queue and host files are in [`./pegasus`](./pegasus).