Ludwig Stumpp committed on
Commit 5323497 · 1 Parent(s): c0dd25e

Add column for commercial use + logic in streamlit app + disclaimer

Files changed (2)
  1. README.md +42 -38
  2. streamlit_app.py +20 -5
README.md CHANGED
@@ -20,43 +20,43 @@ We are always happy for contributions! You can contribute by the following:
20
 
21
  ## Leaderboard
22
 
23
- | Model Name | Chatbot Arena Elo | HumanEval-Python (pass@1) | LAMBADA (zero-shot) | MMLU (zero-shot) | TriviaQA (zero-shot) |
24
- | -------------------------------------------------------------------------------------- | ------------------------------------------------ | ------------------------------------------------------------------------------ | --------------------------------------------- | ---------------- | --------------------------------------------- |
25
- | [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
26
- | [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | 0.259 | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
27
- | [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | 0.258 | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
28
- | [chatglm-6b](https://chatglm.cn/blog) | [985](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
29
- | [chinchilla-70b](https://arxiv.org/abs/2203.15556v1) | | | [0.774](https://arxiv.org/abs/2203.15556v1) | | |
30
- | [code-cushman-001](https://arxiv.org/abs/2107.03374) | | [33.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
31
- | [code-davinci-002](https://arxiv.org/abs/2207.10397v2) | | [65.8](https://arxiv.org/abs/2207.10397v2) | | | |
32
- | [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) | | [29.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
33
- | [codegen-16B-multi](https://huggingface.co/Salesforce/codegen-16B-multi) | | [18.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
34
- | [codegx-13b](http://keg.cs.tsinghua.edu.cn/codegeex/) | | [22.9](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
35
- | [codex-12b](https://arxiv.org/abs/2107.03374v2) | | [28.81](https://arxiv.org/abs/2107.03374v2) | | | |
36
- | [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | [944](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
37
- | [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | 0.265 | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
38
- | [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | 0.253 | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
39
- | [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | [951](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
40
- | [gpt-3.5-175b](https://arxiv.org/abs/2303.08774v3) | | [48.1](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | |
41
- | [gpt-4](https://arxiv.org/abs/2303.08774v3) | | [67.0](https://arxiv.org/abs/2303.08774v3) | | | |
42
- | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | 0.269 | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
43
- | [gptj-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | 0.261 | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
44
- | [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
45
- | [llama-7b](https://arxiv.org/abs/2302.13971) | | [10.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.738](https://www.mosaicml.com/blog/mpt-7b) | 0.302 | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
46
- | [llama-13b](https://arxiv.org/abs/2302.13971) | [932](https://lmsys.org/blog/2023-05-03-arena/) | [15.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
47
- | [llama-33b](https://arxiv.org/abs/2302.13971) | | [21.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
48
- | [llama-65b](https://arxiv.org/abs/2302.13971) | | [23.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
49
- | [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | 0.296 | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
50
- | [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
51
- | [opt-7b](https://huggingface.co/facebook/opt-6.7b) | | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | 0.251 | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
52
- | [opt-13b](https://huggingface.co/facebook/opt-13b) | | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | 0.257 | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
53
- | [palm-540b](https://arxiv.org/abs/2204.02311v5) | | [26.2](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.779](https://arxiv.org/abs/2204.02311v5) | | |
54
- | [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | 0.251 | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
55
- | [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | [858](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
56
- | [starcoder-base-16B](https://huggingface.co/bigcode/starcoderbase) | | [30.4](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
57
- | [starcoder-16B](https://huggingface.co/bigcode/starcoder) | | [33.6](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
58
- | [starcoder-16B (prompted)](https://huggingface.co/bigcode/starcoder) | | [40.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
59
- | [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
60
 
61
  ## Benchmarks
62
 
@@ -70,4 +70,8 @@ We are always happy for contributions! You can contribute by the following:
70
 
71
  ## Sources
72
 
73
- The results of this leaderboard are collected from the individual papers and published results of the model authors. For each reported value, the source is added as a link.
 
 
 
 
 
20
 
21
  ## Leaderboard
22
 
23
+ | Model Name | Commercial Use? | Chatbot Arena Elo | HumanEval-Python (pass@1) | LAMBADA (zero-shot) | MMLU (zero-shot) | TriviaQA (zero-shot) |
24
+ | -------------------------------------------------------------------------------------- | --------------- | ------------------------------------------------ | ------------------------------------------------------------------------------ | --------------------------------------------- | ---------------- | --------------------------------------------- |
25
+ | [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | no | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
26
+ | [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | yes | | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | 0.259 | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
27
+ | [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | yes | | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | 0.258 | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
28
+ | [chatglm-6b](https://chatglm.cn/blog) | yes | [985](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
29
+ | [chinchilla-70b](https://arxiv.org/abs/2203.15556v1) | no | | | [0.774](https://arxiv.org/abs/2203.15556v1) | | |
30
+ | [code-cushman-001](https://arxiv.org/abs/2107.03374) | no | | [33.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
31
+ | [code-davinci-002](https://arxiv.org/abs/2207.10397v2) | yes | | [65.8](https://arxiv.org/abs/2207.10397v2) | | | |
32
+ | [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) | yes | | [29.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
33
+ | [codegen-16B-multi](https://huggingface.co/Salesforce/codegen-16B-multi) | yes | | [18.3](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
34
+ | [codegx-13b](http://keg.cs.tsinghua.edu.cn/codegeex/) | no | | [22.9](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
35
+ | [codex-12b](https://arxiv.org/abs/2107.03374v2) | no | | [28.81](https://arxiv.org/abs/2107.03374v2) | | | |
36
+ | [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | yes | [944](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
37
+ | [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | yes | | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | 0.265 | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
38
+ | [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | yes | | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | 0.253 | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
39
+ | [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | yes | [951](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
40
+ | [gpt-3.5-175b](https://arxiv.org/abs/2303.08774v3) | yes | | [48.1](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | |
41
+ | [gpt-4](https://arxiv.org/abs/2303.08774v3) | yes | | [67.0](https://arxiv.org/abs/2303.08774v3) | | | |
42
+ | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | yes | | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | 0.269 | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
43
+ | [gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | yes | | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | 0.261 | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
44
+ | [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | no | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
45
+ | [llama-7b](https://arxiv.org/abs/2302.13971) | no | | [10.5](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.738](https://www.mosaicml.com/blog/mpt-7b) | 0.302 | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
46
+ | [llama-13b](https://arxiv.org/abs/2302.13971) | no | [932](https://lmsys.org/blog/2023-05-03-arena/) | [15.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
47
+ | [llama-33b](https://arxiv.org/abs/2302.13971) | no | | [21.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
48
+ | [llama-65b](https://arxiv.org/abs/2302.13971) | no | | [23.7](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
49
+ | [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | yes | | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | 0.296 | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
50
+ | [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | yes | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
51
+ | [opt-7b](https://huggingface.co/facebook/opt-6.7b) | no | | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | 0.251 | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
52
+ | [opt-13b](https://huggingface.co/facebook/opt-13b) | no | | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | 0.257 | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
53
+ | [palm-540b](https://arxiv.org/abs/2204.02311v5) | no | | [26.2](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.779](https://arxiv.org/abs/2204.02311v5) | | |
54
+ | [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | yes | | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | 0.251 | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
55
+ | [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | no | [858](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
56
+ | [starcoder-base-16B](https://huggingface.co/bigcode/starcoderbase) | yes | | [30.4](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
57
+ | [starcoder-16B](https://huggingface.co/bigcode/starcoder) | yes | | [33.6](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
58
+ | [starcoder-16B (prompted)](https://huggingface.co/bigcode/starcoder) | yes | | [40.8](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | |
59
+ | [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | no | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | | | |
60
 
61
  ## Benchmarks
62
 
 
70
 
71
  ## Sources
72
 
73
+ The results of this leaderboard are collected from the individual papers and published results of the model authors. For each reported value, the source is added as a link.
74
+
75
+ ## Disclaimer
76
+
77
+ Above information may be wrong. If you want to use a published model for commercial use, please contact a lawyer.
streamlit_app.py CHANGED
@@ -21,7 +21,8 @@ def extract_table_and_format_from_markdown_text(markdown_table: str) -> pd.DataF
21
  .dropna(axis=1, how="all") # drop empty columns
22
  .iloc[1:] # drop first row which is the "----" separator of the original markdown table
23
  .sort_index(ascending=True)
24
- .replace(r"^\s*$", float("nan"), regex=True)
 
25
  .astype(float, errors="ignore")
26
  )
27
 
@@ -85,7 +86,7 @@ def remove_markdown_links(text: str) -> str:
85
  return text
86
 
87
 
88
- def filter_dataframe(df: pd.DataFrame) -> pd.DataFrame:
89
  """
90
  Adds a UI on top of a dataframe to let viewers filter columns
91
 
@@ -93,6 +94,7 @@ def filter_dataframe(df: pd.DataFrame) -> pd.DataFrame:
93
 
94
  Args:
95
  df (pd.DataFrame): Original dataframe
 
96
 
97
  Returns:
98
  pd.DataFrame: Filtered dataframe
@@ -104,6 +106,9 @@ def filter_dataframe(df: pd.DataFrame) -> pd.DataFrame:
104
 
105
  df = df.copy()
106
 
 
 
 
107
  modification_container = st.container()
108
 
109
  with modification_container:
@@ -111,9 +116,9 @@ def filter_dataframe(df: pd.DataFrame) -> pd.DataFrame:
111
  if to_filter_index:
112
  df = pd.DataFrame(df.loc[to_filter_index])
113
 
114
- to_filter_columns = st.multiselect("Filter by benchmark:", df.columns)
115
  if to_filter_columns:
116
- df = pd.DataFrame(df[to_filter_columns])
117
 
118
  return df
119
 
@@ -138,9 +143,10 @@ def setup_leaderboard(readme: str):
138
  leaderboard_table = extract_markdown_table_from_multiline(readme, table_headline="## Leaderboard")
139
  leaderboard_table = remove_markdown_links(leaderboard_table)
140
  df_leaderboard = extract_table_and_format_from_markdown_text(leaderboard_table)
 
141
 
142
  st.markdown("## Leaderboard")
143
- st.dataframe(filter_dataframe(df_leaderboard))
144
 
145
 
146
  def setup_benchmarks(readme: str):
@@ -168,6 +174,14 @@ def setup_sources():
168
  )
169
 
170
 
 
 
 
 
 
 
 
 
171
  def setup_footer():
172
  st.markdown(
173
  """
@@ -186,6 +200,7 @@ def main():
186
  setup_leaderboard(readme)
187
  setup_benchmarks(readme)
188
  setup_sources()
 
189
  setup_footer()
190
 
191
 
 
21
  .dropna(axis=1, how="all") # drop empty columns
22
  .iloc[1:] # drop first row which is the "----" separator of the original markdown table
23
  .sort_index(ascending=True)
24
+ .apply(lambda x: x.str.strip() if x.dtype == "object" else x)
25
+ .replace("", float("NaN"))
26
  .astype(float, errors="ignore")
27
  )
28
 
 
86
  return text
87
 
88
 
89
+ def filter_dataframe(df: pd.DataFrame, ignore_columns: list[str] | None = None) -> pd.DataFrame:
90
  """
91
  Adds a UI on top of a dataframe to let viewers filter columns
92
 
 
94
 
95
  Args:
96
  df (pd.DataFrame): Original dataframe
97
+ ignore_columns (list[str], optional): Columns to ignore. Defaults to None.
98
 
99
  Returns:
100
  pd.DataFrame: Filtered dataframe
 
106
 
107
  df = df.copy()
108
 
109
+ if ignore_columns is None:
110
+ ignore_columns = []
111
+
112
  modification_container = st.container()
113
 
114
  with modification_container:
 
116
  if to_filter_index:
117
  df = pd.DataFrame(df.loc[to_filter_index])
118
 
119
+ to_filter_columns = st.multiselect("Filter by benchmark:", [c for c in df.columns if c not in ignore_columns])
120
  if to_filter_columns:
121
+ df = pd.DataFrame(df[ignore_columns + to_filter_columns])
122
 
123
  return df
124
 
 
143
  leaderboard_table = extract_markdown_table_from_multiline(readme, table_headline="## Leaderboard")
144
  leaderboard_table = remove_markdown_links(leaderboard_table)
145
  df_leaderboard = extract_table_and_format_from_markdown_text(leaderboard_table)
146
+ df_leaderboard["Commercial Use?"] = df_leaderboard["Commercial Use?"].map({"yes": 1, "no": 0}).astype(bool)
147
 
148
  st.markdown("## Leaderboard")
149
+ st.dataframe(filter_dataframe(df_leaderboard, ignore_columns=["Commercial Use?"]))
150
 
151
 
152
  def setup_benchmarks(readme: str):
 
174
  )
175
 
176
 
177
+ def setup_disclaimer():
178
+ st.markdown("## Disclaimer")
179
+ st.markdown(
180
+ "Above information may be wrong. If you want to use a published model for commercial use, please contact a "
181
+ "lawyer."
182
+ )
183
+
184
+
185
  def setup_footer():
186
  st.markdown(
187
  """
 
200
  setup_leaderboard(readme)
201
  setup_benchmarks(readme)
202
  setup_sources()
203
+ setup_disclaimer()
204
  setup_footer()
205
 
206
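
The three pieces of logic this commit adds to streamlit_app.py (stripping cells and turning empty ones into NaN, mapping "yes"/"no" to booleans, and always keeping the ignored columns in the filtered view) can be tried outside Streamlit. The following is a minimal sketch under stated assumptions, not the app's code: the tiny table, the row "hypothetical-model", and the helpers `parse_commercial_use` and `select_columns` are invented for illustration. It also shows one behavior worth knowing: with `.map({"yes": 1, "no": 0}).astype(bool)`, a cell that is neither "yes" nor "no" becomes NaN and is then cast to `True`.

```python
from typing import List, Optional

import pandas as pd

# Hypothetical miniature of the parsed leaderboard; cells mimic raw markdown text.
raw = pd.DataFrame(
    {
        "Commercial Use?": [" yes ", " no ", "   "],
        "MMLU (zero-shot)": [" 0.296 ", " 0.302 ", "   "],
    },
    index=["mpt-7b", "llama-7b", "hypothetical-model"],
    dtype=object,
)

# Same cleaning steps as in the diff: strip string cells, then turn "" into NaN.
cleaned = raw.apply(lambda x: x.str.strip() if x.dtype == "object" else x).replace("", float("nan"))

# Mapping as written in the commit: unmapped cells become NaN, and NaN casts to True.
lenient = cleaned["Commercial Use?"].map({"yes": 1, "no": 0}).astype(bool)


def parse_commercial_use(column: pd.Series) -> pd.Series:
    """Stricter variant (assumption, not in the commit): reject anything but yes/no."""
    mapped = column.map({"yes": True, "no": False})
    unknown = column[mapped.isna()]
    if not unknown.empty:
        raise ValueError(f"Unexpected 'Commercial Use?' value for: {list(unknown.index)}")
    return mapped.astype(bool)


def select_columns(
    df: pd.DataFrame, selected: List[str], ignore_columns: Optional[List[str]] = None
) -> pd.DataFrame:
    """Mirror of the column filter: ignored columns stay in front of the selection."""
    ignore_columns = ignore_columns or []
    if not selected:
        return df  # nothing selected: show every column, as the app does
    return df[ignore_columns + selected]


view = select_columns(cleaned, ["MMLU (zero-shot)"], ignore_columns=["Commercial Use?"])
strict = parse_commercial_use(cleaned["Commercial Use?"].iloc[:2])
```

Whether an unknown license should fail loudly, as in the stricter helper, or be shown as a third state is a design choice the commit leaves open. Every row in the README table of this commit carries an explicit "yes" or "no", so the lenient mapping gives the intended result there.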