Jae-Won Chung committed on
Commit 360f81c · 1 Parent(s): 4e9ddf9

Merge master and web

.github/workflows/push_spaces.yaml ADDED
@@ -0,0 +1,35 @@
+ name: Deploy
+
+ on:
+   push:
+     branches:
+       - master
+     paths:
+       - 'data/**'
+       - 'app.py'
+       - 'LEADERBOARD.md'
+       - 'README.md'
+       - 'requirements.txt'
+
+ concurrency:
+   group: ${{ github.ref }}-hfdeploy
+   cancel-in-progress: true
+
+ jobs:
+   push:
+     runs-on: ubuntu-latest
+     if: github.event.repository.fork == false
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 0
+           lfs: true
+           ref: master
+       - name: Push to Space
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: |
+           for i in 1 2 3 4 5; do
+             git push -f https://jaywonchung:$HF_TOKEN@huggingface.co/spaces/symbioticlab/ml-energy-leaderboard master:main && break || sleep 5;
+           done
Dockerfile CHANGED
@@ -32,7 +32,7 @@ RUN git clone https://github.com/SymbioticLab/Zeus.git zeus \
  # Install requirements for benchmarking
  ADD . /workspace/leaderboard
  RUN cd leaderboard \
-     && pip install -r requirements.txt \
+     && pip install -r requirements-benchmark.txt \
      && cd ..

  ENV TRANSFORMERS_CACHE=/data/leaderboard/hfcache
LEADERBOARD.md ADDED
@@ -0,0 +1,68 @@
+ The goal of the ML.ENERGY Leaderboard is to give people a sense of how much **energy** LLMs would consume.
+
+ ## How is energy different?
+
+ Even between models with the exact same architecture and size, the average energy consumption per prompt is different because they have **different verbosity**.
+ That is, when asked the same thing, they answer in different lengths.
+
+ ## Metrics
+
+ - `gpu`: NVIDIA GPU model name
+ - `task`: Name of the task. See *Tasks* below for details.
+ - `throughput` (token/s): The average number of tokens generated per second.
+ - `response_length` (token): The average number of tokens in the model's response.
+ - `latency` (s): The average time it took for the model to generate a response.
+ - `energy` (J): The average energy consumed by the model to generate a response.
+
+ ## Tasks
+
+ For each task, every model uses the same system prompt. We still account for differences in roles, e.g. `USER`, `HUMAN`, `ASSISTANT`, `GPT`.
+
+ | Name | System prompt |
+ |--|--|
+ | chat | A chat between a human user (prompter) and an artificial intelligence (AI) assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. |
+ | chat-concise | A chat between a human user (prompter) and an artificial intelligence (AI) assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant's answers are very concise. |
+ | instruct | Below is an instruction that describes a task. Write a response that appropriately completes the request. |
+ | instruct-concise | Below is an instruction that describes a task. Write a response that appropriately completes the request. The response should be very concise. |
+
+ ## Setup
+
+ Find our benchmark script for one model [here](https://github.com/ml-energy/leaderboard/blob/master/benchmark.py).
+
+ ### Software
+
+ - PyTorch 2.0.1
+ - [FastChat](https://github.com/lm-sys/fastchat) -- For various model support
+ - [Zeus](https://ml.energy/zeus) -- For GPU energy measurement
+
+ ### Hardware
+
+ - NVIDIA A40 GPU
+
+ ### Parameters
+
+ - Model
+   - Batch size 1
+   - FP16
+ - Sampling (decoding)
+   - Greedy sampling from multinomial distribution
+   - Temperature 0.7
+   - Repetition penalty 1.0
+
+ ## Data
+
+ We randomly sampled around 3000 prompts from the [cleaned ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered).
+ See [here](https://github.com/ml-energy/leaderboard/tree/master/sharegpt) for more detail on how we created the benchmark dataset.
+
+ We used identical system prompts for all models (while respecting their own *role* tokens):
+ ```
+ A chat between a human user (prompter) and an artificial intelligence (AI) assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+ ```
+
+ ## Upcoming
+
+ - Compare against more optimized inference runtimes, like TensorRT.
+ - Other GPUs
+ - Other model/sampling parameters
+ - More models
+ - Model quality evaluation numbers (e.g., AI2 Reasoning Challenge, HellaSwag)
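The metrics listed in LEADERBOARD.md are per-response averages, so simple ratios of them give rough derived quantities such as average power (the app's default custom column, `energy/latency`) and energy per generated token. The sketch below is illustrative only and is not part of the commit: the `BenchmarkRow` class and both helper functions are hypothetical names, and the numbers are copied from the `lmsys/vicuna-7B` row of `A40_chat_benchmark.csv`. Note that a ratio of averages is only an approximation of the average per-response ratio.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    """One row of a *_benchmark.csv file (hypothetical container)."""
    model: str
    throughput: float       # token/s
    response_length: float  # token
    latency: float          # s
    energy: float           # J


def average_power_watts(row: BenchmarkRow) -> float:
    """Average power while generating a response: J / s = W."""
    return row.energy / row.latency


def energy_per_token(row: BenchmarkRow) -> float:
    """Average energy spent per generated token: J / token."""
    return row.energy / row.response_length


# Values copied from the lmsys/vicuna-7B row of A40_chat_benchmark.csv.
vicuna = BenchmarkRow(
    model="lmsys/vicuna-7B",
    throughput=30.071255281840546,
    response_length=284.27266621893887,
    latency=9.474226236086707,
    energy=2239.068904633984,
)

print(f"{vicuna.model}: {average_power_watts(vicuna):.1f} W, "
      f"{energy_per_token(vicuna):.2f} J/token")
```

Dividing by `response_length` is one way to separate a model's verbosity from its per-token cost, which is the distinction the "How is energy different?" section draws.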
README.md CHANGED
@@ -1,3 +1,13 @@
+ ---
+ title: "ML.ENERGY Leaderboard"
+ python_version: "3.9"
+ app_file: "app.py"
+ sdk: "gradio"
+ sdk_version: "3.35.2"
+ pinned: true
+ tags: ["energy", "leaderboard"]
+ ---
+
  # ML.ENERGY Leaderboard

  [![Leaderboard](https://custom-icon-badges.herokuapp.com/badge/ML.ENERGY-Leaderboard-blue.svg?logo=ml-energy)](https://ml.energy/leaderboard)
app.py ADDED
@@ -0,0 +1,360 @@
+ from __future__ import annotations
+
+ import os
+ import json
+ import yaml
+ import itertools
+ import contextlib
+
+ import numpy as np
+ import gradio as gr
+ import pandas as pd
+ import plotly.io as pio
+ import plotly.express as px
+ pio.templates.default = "plotly_white"
+
+
+ class TableManager:
+     def __init__(self, data_dir: str) -> None:
+         """Load leaderboard data from CSV files in data_dir."""
+         # Load and merge CSV files.
+         df = self._read_tables(data_dir)
+         models = json.load(open(f"{data_dir}/models.json"))
+
+         # Add the #params column.
+         df["parameters"] = df["model"].apply(lambda x: models[x]["params"])
+
+         # Make the first column (model) an HTML anchor to the model's website.
+         def format_model_link(model_name: str) -> str:
+             url = models[model_name]["url"]
+             nickname = models[model_name]["nickname"]
+             return (
+                 f'<a style="text-decoration: underline; text-decoration-style: dotted" '
+                 f'target="_blank" href="{url}">{nickname}</a>'
+             )
+         df["model"] = df["model"].apply(format_model_link)
+
+         # Sort by energy.
+         df = df.sort_values(by="energy", ascending=True)
+
+         # The full table where all the data are.
+         self.full_df = df
+         # The currently visible table after filtering.
+         self.cur_df = df
+         # The current index of the visible table after filtering.
+         self.cur_index = df.index.to_numpy()
+
+     def _read_tables(self, data_dir: str) -> pd.DataFrame:
+         """Read tables."""
+         df_score = pd.read_csv(f"{data_dir}/score.csv")
+
+         with open(f"{data_dir}/schema.yaml") as file:
+             self.schema: dict[str, list] = yaml.safe_load(file)
+
+         res_df = pd.DataFrame()
+
+         # Do a cartesian product of all the choices in the schema
+         # and try to read the corresponding CSV files.
+         for choice in itertools.product(*self.schema.values()):
+             filepath = f"{data_dir}/{'_'.join(choice)}_benchmark.csv"
+             with contextlib.suppress(FileNotFoundError):
+                 df = pd.read_csv(filepath)
+                 for key, val in zip(self.schema.keys(), choice):
+                     df.insert(1, key, val)
+                 res_df = pd.concat([res_df, df])
+
+         if res_df.empty:
+             raise ValueError(f"No benchmark CSV files were read from {data_dir=}.")
+
+         return pd.merge(res_df, df_score, on=["model"]).round(2)
+
+     def _format_msg(self, text: str) -> str:
+         """Formats into HTML that prints in Monospace font."""
+         return f"<pre style='font-family: monospace'>{text}</pre>"
+
+     def add_column(self, column_name: str, formula: str):
+         """Create and add a new column with the given formula."""
+         # If the user did not provide the name of the new column,
+         # generate a unique name for them.
+         if not column_name:
+             counter = 1
+             while (column_name := f"custom{counter}") in self.full_df.columns:
+                 counter += 1
+
+         # If the user did not provide a formula, return an error message.
+         if not formula:
+             return self.cur_df, self._format_msg("Please enter a formula.")
+
+         # If there is an equal sign in the formula, `df.eval` will
+         # return an entire DataFrame with the new column, instead of
+         # just the new column. This is not what we want, so we check
+         # for this case and return an error message.
+         if "=" in formula:
+             return self.cur_df, self._format_msg("Invalid formula: expr cannot contain '='.")
+
+         # The user may want to update an existing column.
+         verb = "Updated" if column_name in self.full_df.columns else "Added"
+
+         # Evaluate the formula and catch any error.
+         try:
+             col = self.full_df.eval(formula)
+             if isinstance(col, pd.Series):
+                 col = col.round(2)
+             self.full_df[column_name] = col
+         except Exception as exc:
+             return self.cur_df, self._format_msg(f"Invalid formula: {exc}")
+
+         # If adding a column succeeded, `self.cur_df` should also be updated.
+         self.cur_df = self.full_df.loc[self.cur_index]
+         return self.cur_df, self._format_msg(f"{verb} column '{column_name}'.")
+
+     def get_dropdown(self):
+         columns = self.full_df.columns.tolist()[1:]  # include gpu and task in the dropdown
+         return [
+             gr.Dropdown(choices=columns, label="X"),
+             gr.Dropdown(choices=columns, label="Y"),
+             gr.Dropdown(choices=columns, label="Z (optional)"),
+         ]
+
+     def update_dropdown(self):
+         columns = self.full_df.columns.tolist()[1:]
+         dropdown_update = gr.Dropdown.update(choices=columns)
+         return [dropdown_update] * 3
+
+     def set_filter_get_df(self, *filters):
+         """Set the current set of filters and return the filtered DataFrame."""
+         index = np.full(len(self.full_df), True)
+         for setup, choice in zip(self.schema, filters):
+             index = index & self.full_df[setup].isin(choice)
+         self.cur_df = self.full_df.loc[index]
+         self.cur_index = index
+         return self.cur_df
+
+     def plot_scatter(self, width, height, x, y, z):
+         # The user did not select either x or y.
+         if not x or not y:
+             return None, width, height, self._format_msg("Please select both X and Y.")
+
+         # Width and height may be an empty string. Then we set them to 600.
+         if not width and not height:
+             width, height = "600", "600"
+         elif not width:
+             width = height
+         elif not height:
+             height = width
+         try:
+             width, height = int(width), int(height)
+         except ValueError:
+             return None, width, height, self._format_msg("Width and height should be positive integers.")
+
+         # Strip the <a> tag from model names.
+         text = self.cur_df["model"].apply(lambda x: x.split(">")[1].split("<")[0])
+         if z is None or z == "None" or z == "":
+             fig = px.scatter(self.cur_df, x=x, y=y, text=text)
+         else:
+             fig = px.scatter_3d(self.cur_df, x=x, y=y, z=z, text=text)
+         fig.update_traces(textposition="top center")
+         fig.update_layout(width=width, height=height)
+
+         return fig, width, height, ""
+
+
+ # Find the latest version of the CSV files in data/
+ # and initialize the global TableManager.
+ latest_date = sorted(os.listdir("data/"))[-1]
+
+ # The global instance of the TableManager should only be used when
+ # initializing components in the Gradio interface. If the global instance
+ # is mutated while handling user sessions, the change will be reflected
+ # in every user session. Instead, the instance provided by gr.State should
+ # be used.
+ global_tbm = TableManager(f"data/{latest_date}")
+
+ # Custom JS.
+ # XXX: This is a hack to make the model names clickable.
+ # Ideally, we should set `datatype` in the constructor of `gr.DataFrame` to
+ # `["markdown"] + ["number"] * (len(df.columns) - 1)` and format model names
+ # as an HTML <a> tag. However, because we also want to dynamically add new
+ # columns to the table and Gradio < 4.0 does not support updating `datatype` with
+ # `gr.DataFrame.update` yet, we need to manually walk into the DOM and replace
+ # the innerHTML of the model name cells with dynamically interpreted HTML.
+ # Desired feature tracked at https://github.com/gradio-app/gradio/issues/3732
+ dataframe_update_js = f"""
+ function format_model_link() {{
+     // Iterate over the cells of the first column of the leaderboard table.
+     for (let index = 1; index <= {len(global_tbm.full_df)}; index++) {{
+         // Get the cell.
+         var cell = document.querySelector(
+             `#tab-leaderboard > div > div > div > table > tbody > tr:nth-child(${{index}}) > td:nth-child(1) > div > span`
+         );
+
+         // If nothing was found, it likely means that the visible table now has fewer rows
+         // than the full table. This happens when the user filters the table. In this case,
+         // we should just return.
+         if (cell == null) break;
+
+         // This check exists to make this function idempotent.
+         // Multiple changes to the Dataframe component may invoke this function
+         // multiple times on the same HTML table (e.g., adding and sorting cols).
+         // Thus, we check whether we already formatted the model names by seeing
+         // whether the child of the cell is a text node. If it is not,
+         // it means we already parsed it into HTML, so we should just return.
+         if (cell.firstChild.nodeType != 3) break;
+
+         // Decode and interpret the innerHTML of the cell as HTML.
+         var decoded_string = new DOMParser().parseFromString(cell.innerHTML, "text/html").documentElement.textContent;
+         var temp = document.createElement("template");
+         temp.innerHTML = decoded_string;
+         var model_anchor = temp.content.firstChild;
+
+         // Replace the innerHTML of the cell with the interpreted HTML.
+         cell.replaceChildren(model_anchor);
+     }}
+
+     // Return all arguments as is.
+     return arguments
+ }}
+ """
+
+ # Custom CSS.
+ css = """
+ /* Make ML.ENERGY look like a clickable logo. */
+ .text-logo {
+     color: #27cb63 !important;
+     text-decoration: none !important;
+ }
+
+ /* Make the submit button the same color as the logo. */
+ .btn-submit {
+     background: #27cb63 !important;
+     color: white !important;
+     border: 0 !important;
+ }
+
+ /* Center the plotly plot inside its container. */
+ .plotly > div {
+     margin: auto !important;
+ }
+
+ /* Limit the width of the first column to 300 px. */
+ table td:first-child,
+ table th:first-child {
+     max-width: 300px;
+     overflow: auto;
+     white-space: nowrap;
+ }
+ """
+
+ block = gr.Blocks(css=css)
+ with block:
+     tbm = gr.State(global_tbm)  # type: ignore
+     gr.HTML("<h1><a href='https://ml.energy' class='text-logo'>ML.ENERGY</a> Leaderboard</h1>")
+
+     with gr.Tabs():
+         # Tab 1: Leaderboard.
+         with gr.TabItem("Leaderboard"):
+             with gr.Row():
+                 with gr.Box():
+                     gr.Markdown("## Select benchmark parameters")
+                     checkboxes = []
+                     for key, choices in global_tbm.schema.items():
+                         # Specifying `value` makes everything checked by default.
+                         checkboxes.append(gr.CheckboxGroup(choices=choices, value=choices, label=key))
+
+             # Block 1: Leaderboard table.
+             with gr.Row():
+                 dataframe = gr.Dataframe(type="pandas", elem_id="tab-leaderboard")
+             # Make sure the models have clickable links.
+             dataframe.change(None, None, None, _js=dataframe_update_js)
+             # Table automatically updates when users check or uncheck any checkbox.
+             for checkbox in checkboxes:
+                 checkbox.change(TableManager.set_filter_get_df, inputs=[tbm, *checkboxes], outputs=dataframe)
+
+             # Block 2: Allow users to add new columns.
+             with gr.Row():
+                 with gr.Column(scale=3):
+                     with gr.Row():
+                         colname_input = gr.Textbox("power", lines=1, label="Custom column name")
+                         formula_input = gr.Textbox("energy/latency", lines=1, label="Formula")
+                 with gr.Column(scale=1):
+                     with gr.Row():
+                         add_col_btn = gr.Button("Add to table (⏎)", elem_classes=["btn-submit"])
+                     with gr.Row():
+                         clear_input_btn = gr.Button("Clear")
+             with gr.Row():
+                 add_col_message = gr.HTML("")
+             colname_input.submit(
+                 TableManager.add_column,
+                 inputs=[tbm, colname_input, formula_input],
+                 outputs=[dataframe, add_col_message],
+             )
+             formula_input.submit(
+                 TableManager.add_column,
+                 inputs=[tbm, colname_input, formula_input],
+                 outputs=[dataframe, add_col_message],
+             )
+             add_col_btn.click(
+                 TableManager.add_column,
+                 inputs=[tbm, colname_input, formula_input],
+                 outputs=[dataframe, add_col_message],
+             )
+             clear_input_btn.click(
+                 lambda: (None, None, None),
+                 inputs=None,
+                 outputs=[colname_input, formula_input, add_col_message],
+             )
+
+             # Block 3: Allow users to plot 2D and 3D scatter plots.
+             with gr.Row():
+                 with gr.Column(scale=3):
+                     with gr.Row():
+                         # Initialize the dropdown choices with the global TableManager with just the original columns.
+                         axis_dropdowns = global_tbm.get_dropdown()
+                 with gr.Column(scale=1):
+                     with gr.Row():
+                         plot_btn = gr.Button("Plot", elem_classes=["btn-submit"])
+                     with gr.Row():
+                         clear_plot_btn = gr.Button("Clear")
+             with gr.Accordion("Plot size (600 x 600 by default)", open=False):
+                 with gr.Row():
+                     plot_width_input = gr.Textbox("600", lines=1, label="Width (px)")
+                     plot_height_input = gr.Textbox("600", lines=1, label="Height (px)")
+             with gr.Row():
+                 plot = gr.Plot()
+             with gr.Row():
+                 plot_message = gr.HTML("")
+             add_col_btn.click(TableManager.update_dropdown, inputs=tbm, outputs=axis_dropdowns)  # type: ignore
+             plot_width_input.submit(
+                 TableManager.plot_scatter,
+                 inputs=[tbm, plot_width_input, plot_height_input, *axis_dropdowns],
+                 outputs=[plot, plot_width_input, plot_height_input, plot_message],
+             )
+             plot_height_input.submit(
+                 TableManager.plot_scatter,
+                 inputs=[tbm, plot_width_input, plot_height_input, *axis_dropdowns],
+                 outputs=[plot, plot_width_input, plot_height_input, plot_message],
+             )
+             plot_btn.click(
+                 TableManager.plot_scatter,
+                 inputs=[tbm, plot_width_input, plot_height_input, *axis_dropdowns],
+                 outputs=[plot, plot_width_input, plot_height_input, plot_message],
+             )
+             clear_plot_btn.click(
+                 lambda: (None,) * 7,
+                 None,
+                 outputs=[*axis_dropdowns, plot, plot_width_input, plot_height_input, plot_message],
+             )
+
+             # Block 4: Leaderboard date.
+             with gr.Row():
+                 gr.HTML(f"<h3 style='color: gray'>Date: {latest_date}</h3>")
+
+         # Tab 2: About page.
+         with gr.TabItem("About"):
+             # Read in LEADERBOARD.md
+             gr.Markdown(open("LEADERBOARD.md").read())
+
+     # Load the table on page load.
+     block.load(lambda tbm: tbm.full_df, inputs=tbm, outputs=dataframe)
+
+ block.launch()
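The schema-driven file lookup in `TableManager._read_tables` can be tried in isolation. The following is a minimal sketch, not code from the commit: `candidate_benchmark_files` is a hypothetical helper that only builds the paths the loop would attempt to read, using the same `itertools.product` and `'_'.join(choice)` naming pattern as `app.py`. It does not touch the filesystem.

```python
from __future__ import annotations

import itertools


def candidate_benchmark_files(data_dir: str, schema: dict[str, list[str]]) -> list[str]:
    """Return every CSV path implied by the schema, in cartesian-product order."""
    return [
        f"{data_dir}/{'_'.join(choice)}_benchmark.csv"
        for choice in itertools.product(*schema.values())
    ]


# Same content as data/2023-06-17/schema.yaml in this commit.
schema = {
    "gpu": ["A40"],
    "task": ["chat", "chat-concise", "instruct", "instruct-concise"],
}

for path in candidate_benchmark_files("data/2023-06-17", schema):
    print(path)
```

Because the file name is assembled from the schema values in key order, adding a new GPU or task to `schema.yaml` is enough for the corresponding CSV to be picked up, and combinations without a file are skipped by the `contextlib.suppress(FileNotFoundError)` block in `app.py`.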
data/2023-06-17/A40_chat-concise_benchmark.csv ADDED
@@ -0,0 +1,21 @@
+ model,throughput,response_length,latency,energy
+ lmsys/vicuna-7B,29.829026631206286,271.9100067159167,9.132098360456242,2143.52561215591
+ StabilityAI/stablelm-tuned-alpha-7b,26.57393014131587,255.54365345869712,9.397491535841587,2439.293790463411
+ databricks/dolly-v2-12b,15.273711386107992,141.4445936870383,8.891394107289802,2095.3922259908463
+ tatsu-lab/alpaca-7B,29.852723929321897,121.29281396910679,4.0361613936664735,1080.58840899923
+ camel-ai/CAMEL-13B-Combined-Data,17.460563027552272,283.4543317662861,16.28615281731328,4262.537929482914
+ BAIR/koala-7b,29.888796737616882,251.45399597044997,8.385359924898546,1940.5972622565405
+ h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2,29.57083939186151,212.1212222968435,7.140702722639097,1524.9156188717134
+ lmsys/vicuna-13B,17.282398545945153,269.62491605104094,15.6926107484118,4203.671348891897
+ togethercomputer/RedPajama-INCITE-7B-Chat,14.451529449594792,275.07991940899933,18.294811550193185,2937.839604096736
+ metaai/llama-13B,15.493654667854246,81.26796507723304,4.881042191492302,1264.815973472223
+ BAIR/koala-13b,17.393931641363825,252.56816655473472,14.499323956849762,3747.8785020146615
+ nomic-ai/gpt4all-13b-snoozy,17.45953616124214,217.35325721961047,12.440528350608277,3263.628521155058
+ BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth,32.86455306586691,235.19274680993956,6.718876629108356,1661.2857568838062
+ lmsys/fastchat-t5-3b-v1.0,21.09615171109894,313.09905977165886,18.366778339359637,1807.6800728676938
+ project-baize/baize-v2-7B,28.92598212176896,321.06010745466756,10.940218308832323,2644.9160527197046
+ OpenAssistant/oasst-sft-1-pythia-12b,16.01484723680571,249.1007387508395,15.153340834740217,3829.1071417058643
+ metaai/llama-7B,25.80475014752762,63.463734049697784,2.2525196486312047,539.0479066487654
+ Neutralzz/BiLLa-7B-SFT,29.382300021941255,141.6155137676293,4.84122748247456,1131.9990564138398
+ openaccess-ai-collective/manticore-13b-chat-pyg,17.220798012743607,268.91269308260576,15.692034786355059,4051.8244570182064
+ FreedomIntelligence/phoenix-inst-chat-7b,32.33242374435414,229.95869711215582,6.910495058340042,2049.7076356614534
data/2023-06-17/A40_chat_benchmark.csv ADDED
@@ -0,0 +1,21 @@
+ model,throughput,response_length,latency,energy
+ lmsys/vicuna-7B,30.071255281840546,284.27266621893887,9.474226236086707,2239.068904633984
+ lmsys/vicuna-13B,17.50774908972375,281.298522498321,16.096842177613397,4265.287245130957
+ tatsu-lab/alpaca-7B,30.09713731797294,125.20013431833445,4.129986896187982,916.045386501007
+ metaai/llama-7B,25.768609507174105,64.59032907991941,2.284814629996714,525.7081235728675
+ metaai/llama-13B,15.699146010424393,80.32236400268637,4.757332595030835,1293.689832437891
+ camel-ai/CAMEL-13B-Combined-Data,17.40620018812374,292.3438549361988,16.834190191676036,4466.796722968406
+ BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth,33.10830960148045,243.21793149764943,6.9481068778416555,1833.7241615177682
+ databricks/dolly-v2-12b,15.597444626791148,148.3270651443922,9.168758730287117,2362.087664204047
+ FreedomIntelligence/phoenix-inst-chat-7b,32.663340053939855,243.14909335124244,7.271332307256473,2149.2483156478947
+ h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2,28.851651162429675,216.66286098052385,7.544740398256815,1636.1981326393268
+ lmsys/fastchat-t5-3b-v1.0,21.694893194918894,312.84116856950976,17.951570049172194,1787.5060366017656
+ Neutralzz/BiLLa-7B-SFT,29.49201862368961,159.29986568166555,5.443799112468728,1218.644757555166
+ nomic-ai/gpt4all-13b-snoozy,17.46230398293782,250.1742780389523,14.322901371942146,4093.901904969787
+ openaccess-ai-collective/manticore-13b-chat-pyg,17.485883513135143,289.58697112155807,16.594830177599892,4316.488665547325
+ OpenAssistant/oasst-sft-1-pythia-12b,16.056643548610985,254.26259234385495,15.462307354021265,3891.8823989257867
+ project-baize/baize-v2-7B,29.004360284420006,324.24546675621224,11.011670755046683,2621.3502615853154
+ BAIR/koala-7b,29.723806931945834,260.7196104768301,8.720630589929986,2017.3295624580246
+ BAIR/koala-13b,17.451436035057224,262.5295500335796,15.030911340299886,3827.6102800537265
+ StabilityAI/stablelm-tuned-alpha-7b,26.413142361637988,255.34687709872398,9.454673889303727,2319.91146675621
+ togethercomputer/RedPajama-INCITE-7B-Chat,21.410571862447824,279.5094022834117,12.506414288534286,2541.441298522497
data/2023-06-17/A40_instruct-concise_benchmark.csv ADDED
@@ -0,0 +1,21 @@
+ model,throughput,response_length,latency,energy
+ openaccess-ai-collective/manticore-13b-chat-pyg,17.4993855646115,229.5795836131632,13.132503049058466,3501.182491605137
+ lmsys/vicuna-7B,29.046593546528904,212.760241773002,7.3296203452423265,1706.1712568838818
+ BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth,33.07481929862108,242.74177300201478,6.943508281124177,1787.5870651446628
+ tatsu-lab/alpaca-7B,29.475397017987323,117.95399597044997,3.9749026008906023,927.7634805239352
+ metaai/llama-13B,15.786345111463364,102.35762256548018,6.009854743435423,1590.8409496307413
+ OpenAssistant/oasst-sft-1-pythia-12b,16.03459728484094,241.31732706514438,14.686811677200872,3673.6327222969167
+ BAIR/koala-7b,29.75360600658546,200.51544660846204,6.658448294727219,1426.3522693082625
+ databricks/dolly-v2-12b,15.330621441395213,149.7411014103425,9.354867176525545,2240.8669214236447
+ togethercomputer/RedPajama-INCITE-7B-Chat,20.849945627496787,275.1017461383479,12.625316554842463,2521.5761004031265
+ BAIR/koala-13b,17.3129078938621,185.43653458697113,10.592529783475095,3058.9324654130387
+ metaai/llama-7B,26.263780879593444,96.94089993284084,3.35252434620665,871.9958969106643
+ lmsys/vicuna-13B,17.391771626218006,199.80960376091338,11.55835909586615,3031.3180846202845
+ camel-ai/CAMEL-13B-Combined-Data,17.27426964902266,194.22028206850234,11.174119858559074,2956.610076225849
+ FreedomIntelligence/phoenix-inst-chat-7b,32.63089093979437,197.44492948287441,5.895621189552368,1736.5409079919812
+ StabilityAI/stablelm-tuned-alpha-7b,26.523547972773766,244.16588314304903,8.984752658064695,2305.969679315211
+ h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2,29.926231643904146,223.85426460711886,7.510739674653152,1676.2203126259078
+ Neutralzz/BiLLa-7B-SFT,29.118626503392385,104.97817327065144,3.5443721553023035,818.274197783844
+ nomic-ai/gpt4all-13b-snoozy,17.423064750595767,135.3938885157824,7.734149922101941,1871.6546057756862
+ project-baize/baize-v2-7B,28.13796712305154,262.9902619207522,9.250474432119292,2105.324460711873
+ lmsys/fastchat-t5-3b-v1.0,40.20822673632634,281.74110141034254,10.492163513616964,1110.3276249158694
data/2023-06-17/A40_instruct_benchmark.csv ADDED
@@ -0,0 +1,21 @@
+ model,throughput,response_length,latency,energy
+ FreedomIntelligence/phoenix-inst-chat-7b,32.795664087070854,221.2484889187374,6.588933942256567,1863.514234721291
+ tatsu-lab/alpaca-7B,30.107577299286163,126.36030893216925,4.161682809197595,973.6026363331109
+ togethercomputer/RedPajama-INCITE-7B-Chat,17.009700321585225,282.3190060443251,15.98330062659441,2834.287281396827
+ lmsys/vicuna-7B,29.417977025894693,267.841840161182,9.164755312435684,2131.3740241775145
+ BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth,33.80171881884355,264.9563465413029,7.560664676534496,2049.7698284082962
+ databricks/dolly-v2-12b,15.67950302952103,155.61316319677636,9.582122375200395,2369.283402619204
+ camel-ai/CAMEL-13B-Combined-Data,17.522554791478672,245.7824042981867,14.081241566503387,3646.9116689053053
+ BAIR/koala-7b,29.350583449996343,253.7239758226998,8.64835721658589,1918.897159502941
+ openaccess-ai-collective/manticore-13b-chat-pyg,17.267666593018745,276.03559435862996,16.03621509688224,4113.539149429272
+ OpenAssistant/oasst-sft-1-pythia-12b,14.438332974888972,253.76796507723304,17.249506058312747,3936.226839825601
+ h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2,29.207313503768304,233.47951645399598,8.096591130254916,1804.4821860309694
+ project-baize/baize-v2-7B,28.47516339265779,306.76561450638013,10.688193155492014,2415.801835795991
+ StabilityAI/stablelm-tuned-alpha-7b,23.120196395716167,244.85930154466084,10.369934308136857,2445.444213566122
+ lmsys/vicuna-13B,17.595105665816288,263.95567494963063,15.050040050311223,3967.4957498321023
+ Neutralzz/BiLLa-7B-SFT,28.937231313361377,142.33848220282067,4.7632941655637016,1177.3565590999485
+ metaai/llama-13B,15.747651109641996,101.69375419744796,5.970782866386873,1693.432888515849
+ lmsys/fastchat-t5-3b-v1.0,31.014371537480102,357.13734049697786,17.964342393854206,1758.7082199462513
+ nomic-ai/gpt4all-13b-snoozy,17.558360268154225,232.67461383478846,13.290953806575821,3411.2449123573792
+ BAIR/koala-13b,17.468010116614902,254.08529214237743,14.4913390549458,3858.416870718604
+ metaai/llama-7B,26.40244189851013,104.19308260577569,3.608983782098236,864.4181752854275
data/2023-06-17/models.json ADDED
@@ -0,0 +1,102 @@
+ {
+   "lmsys/vicuna-7B": {
+     "url": "https://lmsys.org/blog/2023-03-30-vicuna/",
+     "nickname": "LMSys/vicuna-7B",
+     "params": 7
+   },
+   "lmsys/vicuna-13B": {
+     "url": "https://lmsys.org/blog/2023-03-30-vicuna/",
+     "nickname": "LMSys/vicuna-13B",
+     "params": 13
+   },
+   "tatsu-lab/alpaca-7B": {
+     "url": "https://huggingface.co/tatsu-lab/alpaca-7b-wdiff",
+     "nickname": "tatsu-lab/alpaca-7B",
+     "params": 7
+   },
+   "metaai/llama-7B": {
+     "url": "https://github.com/facebookresearch/llama",
+     "nickname": "MetaAI/LLaMA-7B",
+     "params": 7
+   },
+   "metaai/llama-13B": {
+     "url": "https://github.com/facebookresearch/llama",
+     "nickname": "MetaAI/LLaMA-13B",
+     "params": 13
+   },
+   "camel-ai/CAMEL-13B-Combined-Data": {
+     "url": "https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data",
+     "nickname": "Camel-AI/CAMEL-13B-Combined-Data",
+     "params": 13
+   },
+   "BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth": {
+     "url": "https://huggingface.co/BlinkDL/rwkv-4-raven",
+     "nickname": "BlinkDL/RWKV-4-Raven-7B",
+     "params": 7
+   },
+   "databricks/dolly-v2-12b": {
+     "url": "https://huggingface.co/databricks/dolly-v2-12b",
+     "nickname": "databricks/dolly-v2-12B",
+     "params": 12
+   },
+   "FreedomIntelligence/phoenix-inst-chat-7b": {
+     "url": "https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b",
+     "nickname": "FreedomIntelligence/phoenix-inst-chat-7b",
+     "params": 7
+   },
+   "h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2": {
+     "url": "https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2",
+     "nickname": "H2OAI/H2OGPT-oasst1-7B",
+     "params": 7
+   },
+   "lmsys/fastchat-t5-3b-v1.0": {
+     "url": "https://huggingface.co/lmsys/fastchat-t5-3b-v1.0",
+     "nickname": "LMSys/fastchat-t5-3b-v1.0",
+     "params": 3
+   },
+   "Neutralzz/BiLLa-7B-SFT": {
+     "url": "https://huggingface.co/Neutralzz/BiLLa-7B-SFT",
+     "nickname": "Neutralzz/BiLLa-7B-SFT",
+     "params": 7
+   },
+   "nomic-ai/gpt4all-13b-snoozy": {
+     "url": "https://huggingface.co/nomic-ai/gpt4all-13b-snoozy",
+     "nickname": "nomic-ai/gpt4all-13b-snoozy",
+     "params": 13
+   },
+   "openaccess-ai-collective/manticore-13b-chat-pyg": {
+     "url": "https://huggingface.co/openaccess-ai-collective/manticore-13b-chat-pyg",
+     "nickname": "openaccess-ai-collective/manticore-13b-chat-pyg",
+     "params": 13
+   },
+   "OpenAssistant/oasst-sft-1-pythia-12b": {
+     "url": "https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b",
+     "nickname": "OpenAssistant/oasst-sft-1-pythia-12b",
+     "params": 12
+   },
+   "project-baize/baize-v2-7B": {
+     "url": "https://huggingface.co/project-baize/baize-v2-7B",
+     "nickname": "project-baize/baize-v2-7B",
+     "params": 7
+   },
+   "BAIR/koala-7b": {
+     "url": "https://bair.berkeley.edu/blog/2023/04/03/koala/",
+     "nickname": "BAIR/koala-7b",
+     "params": 7
+   },
+   "BAIR/koala-13b": {
+     "url": "https://bair.berkeley.edu/blog/2023/04/03/koala/",
+     "nickname": "BAIR/koala-13b",
+     "params": 13
+   },
+   "StabilityAI/stablelm-tuned-alpha-7b": {
+     "url": "https://huggingface.co/StabilityAI/stablelm-tuned-alpha-7b",
+     "nickname": "StabilityAI/stablelm-tuned-alpha-7b",
+     "params": 7
+   },
+   "togethercomputer/RedPajama-INCITE-7B-Chat": {
+     "url": "https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat",
+     "nickname": "togethercomputer/RedPajama-INCITE-7B-Chat",
+     "params": 7
+   }
+ }
data/2023-06-17/schema.yaml ADDED
@@ -0,0 +1,2 @@
+ gpu: ["A40"]
+ task: ["chat", "chat-concise", "instruct", "instruct-concise"]
data/2023-06-17/score.csv ADDED
@@ -0,0 +1,21 @@
+ model,lmsys_elo
+ lmsys/vicuna-7B,1007
+ lmsys/vicuna-13B,1054
+ tatsu-lab/alpaca-7B,NaN
+ metaai/llama-7B,NaN
+ metaai/llama-13B,854
+ camel-ai/CAMEL-13B-Combined-Data,NaN
+ BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth,NaN
+ databricks/dolly-v2-12b,866
+ FreedomIntelligence/phoenix-inst-chat-7b,NaN
+ h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2,NaN
+ lmsys/fastchat-t5-3b-v1.0,941
+ Neutralzz/BiLLa-7B-SFT,NaN
+ nomic-ai/gpt4all-13b-snoozy,NaN
+ openaccess-ai-collective/manticore-13b-chat-pyg,NaN
+ OpenAssistant/oasst-sft-1-pythia-12b,921
+ project-baize/baize-v2-7B,NaN
+ BAIR/koala-7b,NaN
+ BAIR/koala-13b,980
+ StabilityAI/stablelm-tuned-alpha-7b,882
+ togethercomputer/RedPajama-INCITE-7B-Chat,NaN
index.html ADDED
@@ -0,0 +1,12 @@
+ <!DOCTYPE html>
+ <html lang="en">
+   <head>
+     <title>ML.ENERGY Leaderboard</title>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1">
+     <script type="module" src="https://gradio.s3-us-west-2.amazonaws.com/3.23.0/gradio.js"></script>
+   </head>
+   <body>
+     <gradio-app src="https://symbioticlab-ml-energy-leaderboard.hf.space?__theme=light"></gradio-app>
+   </body>
+ </html>
requirements-benchmark.txt ADDED
@@ -0,0 +1,5 @@
+ zeus-ml
+ fschat==0.2.14
+ rwkv==0.7.5
+ einops
+ tyro
requirements.txt CHANGED
@@ -1,5 +1,2 @@
- zeus-ml
- fschat==0.2.14
- rwkv==0.7.5
- einops
- tyro
+ plotly==5.15.0
+ gradio==3.35.2