Jae-Won Chung committed on
Commit
4e9ddf9
·
unverified ·
1 Parent(s): ce6d832

Benchmarking with Pegasus (#7)

README.md CHANGED
@@ -33,6 +33,10 @@ $ docker run -it \
  
  ## Running the benchmark
  
  ```console
  # Inside the container
  $ cd /workspace/leaderboard

  
  ## Running the benchmark
  
+ We run benchmarks on multiple nodes and GPUs with [Pegasus](https://github.com/jaywonchung/pegasus). Take a look at [`pegasus/`](/pegasus) for details.
+
+ You can still run benchmarks without Pegasus like this:
+
  ```console
  # Inside the container
  $ cd /workspace/leaderboard
models.txt DELETED
@@ -1,20 +0,0 @@
- /data/leaderboard/weights/metaai/llama-7B
- /data/leaderboard/weights/metaai/llama-13B
- /data/leaderboard/weights/lmsys/vicuna-7B
- /data/leaderboard/weights/lmsys/vicuna-13B
- /data/leaderboard/weights/tatsu-lab/alpaca-7B
- /data/leaderboard/weights/BAIR/koala-7b
- /data/leaderboard/weights/BAIR/koala-13b
- /data/leaderboard/weights/BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth
- camel-ai/CAMEL-13B-Combined-Data
- databricks/dolly-v2-12b
- FreedomIntelligence/phoenix-inst-chat-7b
- h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
- lmsys/fastchat-t5-3b-v1.0
- Neutralzz/BiLLa-7B-SFT
- nomic-ai/gpt4all-13b-snoozy
- openaccess-ai-collective/manticore-13b-chat-pyg
- OpenAssistant/oasst-sft-1-pythia-12b
- project-baize/baize-v2-7B
- StabilityAI/stablelm-tuned-alpha-7b
- togethercomputer/RedPajama-INCITE-7B-Chat
pegasus/README.md ADDED
@@ -0,0 +1,60 @@
+ # Running benchmarks on multiple GPU nodes with Pegasus
+
+ [Pegasus](https://github.com/jaywonchung/pegasus) is an SSH-based multi-node command runner.
+ Different models have different verbosity, so benchmarking time varies widely from model to model.
+ Therefore, we want an automated piece of software that drains a queue of benchmarking jobs (one job per model and task) on a set of GPUs.
+
+ ## Setup
+
+ ### Install Pegasus
+
+ Pegasus needs to keep SSH connections with all the nodes in order to queue up and run jobs over SSH.
+ So you should install and run Pegasus on a computer that you can keep awake.
+
+ If you already have Rust set up:
+
+ ```console
+ $ cargo install pegasus-ssh
+ ```
+
+ Otherwise, you can set up Rust [here](https://www.rust-lang.org/tools/install), or just download Pegasus release binaries [here](https://github.com/jaywonchung/pegasus/releases/latest).
+
+ ### Necessary setup for each node
+
+ Every node must have two things:
+
+ 1. This repository cloned under `~/workspace/leaderboard`.
+    - If you want a different path, search and replace in `setup-nodes.yaml`.
+ 2. Model weights under `/data/leaderboard/weights`.
+    - If you want a different path, search and replace in `setup-nodes.yaml` and `benchmark.yaml`.
+
+ ### Specify node names for Pegasus
+
+ List your nodes in `hosts.yaml`. See the file for an example.
+
+ - `hostname`: List the hostnames you would use in order to `ssh` into the node, e.g. `jaywonchung@gpunode01`.
+ - `gpu`: We want to create one Docker container for each GPU. List the indices of the GPUs you would like to use for the hosts.
+
+ ### Set up Docker containers on your nodes with Pegasus
+
+ This builds our Docker image and spawns one container per GPU (named `leaderboard%d`, where `%d` is the GPU index), for every node.
+
+ ```console
+ $ cd pegasus
+ $ cp setup-nodes.yaml queue.yaml
+ $ pegasus b
+ ```
+
+ `b` stands for broadcast. Every command is run once on all (`hostname`, `gpu`) combinations.
+
+ ## Benchmark
+
+ Now use Pegasus to run benchmarks for all the models across all nodes.
+
+ ```console
+ $ cd pegasus
+ $ cp benchmark.yaml queue.yaml
+ $ pegasus q
+ ```
+
+ `q` stands for queue. Each command is run once on the next available (`hostname`, `gpu`) combination.
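The difference between the two modes described in `pegasus/README.md` can be sketched as a small Python simulation. This is an illustrative model only, not Pegasus's actual implementation: the function names `broadcast` and `drain_queue`, the slot list, and the job durations are all hypothetical. It shows why a queue suits jobs of very different lengths: a slot that finishes early picks up the next job instead of idling.

```python
import heapq


def broadcast(commands, slots):
    """Broadcast mode: every command runs once on every (hostname, gpu) slot."""
    return [(slot, cmd) for cmd in commands for slot in slots]


def drain_queue(jobs, slots):
    """Queue mode: each (name, duration) job goes to the slot that frees up first.

    Returns (assignments, makespan), where each assignment is
    (job name, slot, start time).
    """
    # Min-heap of (time the slot becomes free, slot index).
    free_at = [(0.0, i) for i in range(len(slots))]
    heapq.heapify(free_at)
    assignments = []
    makespan = 0.0
    for name, duration in jobs:
        start, i = heapq.heappop(free_at)
        assignments.append((name, slots[i], start))
        heapq.heappush(free_at, (start + duration, i))
        makespan = max(makespan, start + duration)
    return assignments, makespan


# Hypothetical example: two GPU slots, one slow job and three fast ones.
slots = [("node01", 0), ("node01", 1)]
jobs = [("a", 5.0), ("b", 1.0), ("c", 1.0), ("d", 1.0)]
assignments, makespan = drain_queue(jobs, slots)
for name, slot, start in assignments:
    print(name, slot, start)
print("makespan:", makespan)
```

In this made-up run, the slow job occupies GPU 0 while the three short jobs run back to back on GPU 1, so the whole queue finishes when the slow job does.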
pegasus/benchmark.yaml ADDED
@@ -0,0 +1,32 @@
+ # This YAML dictionary will expand into 20 (models) x 4 (tasks) = 80 job commands,
+ # where {{ model }} and {{ task }} are filled in with all possible combinations.
+ # {{ gpu }} is defined in `hosts.yaml`, and will be filled in when Pegasus
+ # determines the specific node and gpu the generated job command will run on.
+ - command:
+     - docker exec leaderboard{{ gpu }} python scripts/benchmark.py --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json --model-path {{ model }} --task {{ task }}
+   model:
+     - /data/leaderboard/weights/metaai/llama-7B
+     - /data/leaderboard/weights/metaai/llama-13B
+     - /data/leaderboard/weights/lmsys/vicuna-7B
+     - /data/leaderboard/weights/lmsys/vicuna-13B
+     - /data/leaderboard/weights/tatsu-lab/alpaca-7B
+     - /data/leaderboard/weights/BAIR/koala-7b
+     - /data/leaderboard/weights/BAIR/koala-13b
+     - /data/leaderboard/weights/BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth
+     - camel-ai/CAMEL-13B-Combined-Data
+     - databricks/dolly-v2-12b
+     - FreedomIntelligence/phoenix-inst-chat-7b
+     - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
+     - lmsys/fastchat-t5-3b-v1.0
+     - Neutralzz/BiLLa-7B-SFT
+     - nomic-ai/gpt4all-13b-snoozy
+     - openaccess-ai-collective/manticore-13b-chat-pyg
+     - OpenAssistant/oasst-sft-1-pythia-12b
+     - project-baize/baize-v2-7B
+     - StabilityAI/stablelm-tuned-alpha-7b
+     - togethercomputer/RedPajama-INCITE-7B-Chat
+   task:
+     - chat
+     - chat-concise
+     - instruct
+     - instruct-concise
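The expansion that the header comment of `benchmark.yaml` describes (every model paired with every task) can be sketched in Python as a Cartesian product. This is a minimal sketch under assumptions, not how Pegasus itself does it: the `expand` helper is hypothetical, the model list is shortened to two entries, and the command omits `--input-file` for brevity. `{{ gpu }}` is deliberately left unfilled, since the comment says it is resolved later, when a node and GPU are chosen.

```python
from itertools import product

# Hypothetical, shortened stand-in for the single entry in benchmark.yaml.
entry = {
    "command": [
        "docker exec leaderboard{{ gpu }} python scripts/benchmark.py "
        "--model-path {{ model }} --task {{ task }}"
    ],
    "model": ["lmsys/fastchat-t5-3b-v1.0", "databricks/dolly-v2-12b"],
    "task": ["chat", "chat-concise", "instruct", "instruct-concise"],
}


def expand(entry):
    """Return one command string per combination of the non-command keys."""
    keys = [k for k in entry if k != "command"]
    jobs = []
    for template in entry["command"]:
        for values in product(*(entry[k] for k in keys)):
            job = template
            for key, value in zip(keys, values):
                job = job.replace("{{ " + key + " }}", str(value))
            jobs.append(job)
    return jobs


jobs = expand(entry)
print(len(jobs))  # 2 models x 4 tasks
print(jobs[0])
```

With the full list of 20 models, the same product yields the 80 job commands mentioned in the comment.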
pegasus/hosts.yaml ADDED
@@ -0,0 +1,19 @@
+ # Example:
+ # Four nodes (node01 to node04), one container per GPU.
+ # node01 and node02 have four GPUs, and hence four containers.
+ # node03 and node04 have just two GPUs, and hence two containers.
+ # With this configuration, 2 * 4 + 2 * 2 = 12 jobs will run in parallel.
+ - hostname:
+     - node01
+     - node02
+   gpu:
+     - 0
+     - 1
+     - 2
+     - 3
+ - hostname:
+     - node03
+     - node04
+   gpu:
+     - 0
+     - 1
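The arithmetic in the `hosts.yaml` comment (2 * 4 + 2 * 2 = 12) can be checked by enumerating the (`hostname`, `gpu`) combinations. The sketch below uses a hand-written Python copy of the example file rather than parsing YAML, and the `slots` helper is a hypothetical name, not part of Pegasus.

```python
# Hand-written copy of the example hosts.yaml, as plain Python data.
hosts = [
    {"hostname": ["node01", "node02"], "gpu": [0, 1, 2, 3]},
    {"hostname": ["node03", "node04"], "gpu": [0, 1]},
]


def slots(hosts):
    """List every (hostname, gpu) pair; each pair hosts one container."""
    return [
        (hostname, gpu)
        for group in hosts
        for hostname in group["hostname"]
        for gpu in group["gpu"]
    ]


all_slots = slots(hosts)
print(len(all_slots))
print(all_slots[:4])
```

Each group contributes its own hostname-by-GPU product, which is why node03 and node04 never get a slot for GPU 2 or 3.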
pegasus/setup-nodes.yaml ADDED
@@ -0,0 +1,7 @@
+ # The first item builds our docker image on each node once.
+ # The second item spawns one docker container per GPU.
+ # {{ gpu }} is defined in `hosts.yaml`, and will be filled in when Pegasus
+ # determines the specific node and gpu the generated job command will run on.
+ # We check {{ gpu }} = 0 to ensure that the image is only built once on each node.
+ - if [ {{ gpu }} = 0 ]; then cd workspace/leaderboard && docker build -t ml-energy:latest .; fi
+ - docker run -dit --name leaderboard{{ gpu }} --gpus '"device={{ gpu }}"' -v /data/leaderboard:/data/leaderboard -v $HOME/workspace/leaderboard:/workspace/leaderboard ml-energy:latest bash
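The nested quoting in `--gpus '"device={{ gpu }}"'` is easy to misread, so here is a small Python sketch of what the shell passes to `docker` once `{{ gpu }}` is filled in. It uses the standard library's `shlex.split`, which follows POSIX shell quoting rules. The `render` helper is hypothetical, and the template is shortened (the `-v` mounts are omitted).

```python
import shlex

# Shortened copy of the second command in setup-nodes.yaml (volume mounts omitted).
template = (
    "docker run -dit --name leaderboard{{ gpu }} "
    "--gpus '\"device={{ gpu }}\"' ml-energy:latest bash"
)


def render(template, gpu):
    """Fill in the {{ gpu }} placeholder for one GPU index."""
    return template.replace("{{ gpu }}", str(gpu))


argv = shlex.split(render(template, 2))
print(argv)
```

The outer single quotes are consumed by the shell, while the inner double quotes survive, so the value that follows `--gpus` in `argv` is the string `"device=2"` with its double quotes intact.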
scripts/benchmark.py CHANGED
@@ -19,21 +19,21 @@ from zeus.monitor import ZeusMonitor
  SYSTEM_PROMPTS = {
      "chat": (
          "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
-         "The assistant gives helpful, detailed, and polite answers to the user's questions."
+         "The assistant gives helpful, detailed, and polite answers to the user's questions. "
      ),
      "chat-concise": (
          "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions. "
-         "The assistnat's answers are concise but high-quality."
+         "The assistant's answers are very concise. "
      ),
      "instruct": (
          "Below is an instruction that describes a task. "
-         "Write a response that appropriately completes the request."
+         "Write a response that appropriately completes the request. "
      ),
      "instruct-concise": (
          "Below is an instruction that describes a task. "
-         "Write a response that appropriately completes the request."
-         "The response should be concise but high-quality."
+         "Write a response that appropriately completes the request. "
+         "The response should be very concise. "
      ),
  }
  
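One effect of the `scripts/benchmark.py` change is easy to miss: adjacent string literals inside parentheses are joined by Python with nothing inserted between them, so a missing trailing space fuses two sentences. The sketch below reproduces the old and new `instruct-concise` strings from the diff; the variable names `old_prompt` and `new_prompt` are only for illustration.

```python
# Adjacent string literals are concatenated at compile time, with no separator.
old_prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
    "The response should be concise but high-quality."
)
new_prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request. "
    "The response should be very concise. "
)

print("request.The" in old_prompt)   # the two sentences ran together
print("request. The" in new_prompt)  # the trailing space keeps them apart
print(repr(new_prompt[-10:]))        # the new prompt also ends with a space
```

The new strings also end with a trailing space. This diff does not show how the rest of `benchmark.py` uses the prompt, so whether that space matters downstream is not determined here.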