# LLM Text Generation (Chat)
This suite benchmarks vLLM and TGI on the chat completion task across various models.
## Setup
### Docker images
You can pull vLLM and TGI Docker images with:
```sh
docker pull mlenergy/vllm:v0.4.2-openai
docker pull mlenergy/tgi:v2.0.2
```
### Installing benchmark script dependencies
```sh
pip install -r requirements.txt
```
### Starting the NVML container
Changing the power limit requires the `SYS_ADMIN` Linux security capability, which we delegate to a daemon Docker container running a base CUDA image.
```sh
bash ../../common/start_nvml_container.sh
```
With the `nvml` container running, you can change the power limit with a command like `docker exec nvml nvidia-smi -i 0 -pl 200`, which sets the power limit of GPU 0 to 200 W.
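To apply the same limit to several GPUs, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of the benchmark suite: `set_power_limit` and the `DRY_RUN` variable are hypothetical names introduced here for illustration. By default it only prints the commands so you can inspect them first.

```sh
# Hypothetical helper (not part of this repository).
# Usage: set_power_limit <watts> <gpu index>...
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
set_power_limit() {
  limit_watts="$1"
  shift
  for gpu in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker exec nvml nvidia-smi -i $gpu -pl $limit_watts"
    else
      # Assumes the `nvml` container started above is still running.
      docker exec nvml nvidia-smi -i "$gpu" -pl "$limit_watts"
    fi
  done
}

# Print the commands that would set GPUs 0 and 1 to 200 W.
set_power_limit 200 0 1
```

Run it with `DRY_RUN=0` once the printed commands look right.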
### HuggingFace cache directory
The scripts assume the HuggingFace cache directory will be under `/data/leaderboard/hfcache` on the node that runs this benchmark.
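Because the scripts expect this path, checking it before launching can save a failed run. The following is a minimal sketch: `check_hf_cache` is a hypothetical helper, not part of the suite, and the writability requirement is an assumption (model downloads are presumed to land in this directory).

```sh
# Hypothetical helper (not part of this repository): succeed only if the
# directory exists and is writable by the current user.
check_hf_cache() {
  dir="${1:-/data/leaderboard/hfcache}"
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "HuggingFace cache directory OK: $dir"
  else
    echo "HuggingFace cache directory missing or not writable: $dir" >&2
    return 1
  fi
}

# Check the default path and print a hint if the check fails.
check_hf_cache || echo "Create or mount the directory before benchmarking."
```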
## Benchmarking
### Obtaining one datapoint
Export your HuggingFace Hub token as the environment variable `HF_TOKEN`.
The script `scripts/benchmark_one_datapoint.py` assumes that it is run from the directory that contains `scripts`, like this:
```sh
python scripts/benchmark_one_datapoint.py --help
```
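Both requirements above (the token and the working directory) can be verified before launching. The following is a minimal sketch: `preflight` is a hypothetical helper introduced here, not something the repository provides.

```sh
# Hypothetical pre-flight check (not part of this repository).
# Fails if HF_TOKEN is unset or empty, or if the current directory
# does not contain scripts/benchmark_one_datapoint.py.
preflight() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
  if [ ! -f scripts/benchmark_one_datapoint.py ]; then
    echo "Run this from the directory that contains scripts/" >&2
    return 1
  fi
}

# Usage, after exporting HF_TOKEN with your own token:
#   preflight && python scripts/benchmark_one_datapoint.py --help
```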
### Obtaining all datapoints for a single model
Run `scripts/benchmark_one_model.py`.
### Running the entire suite with Pegasus
You can use [`pegasus`](https://github.com/jaywonchung/pegasus) to run the entire benchmark suite.
Queue and host files are in [`./pegasus`](./pegasus).