Corey Morris committed on
Commit 12a9766 · 1 Parent(s): 18ec1ba

Moved radar plots higher in the page

Files changed (1)
  1. app.py +30 -26
app.py CHANGED
@@ -276,32 +276,6 @@ else:
 
 
  # end of custom scatter plots
- st.markdown("## Notable findings and plots")
-
- st.markdown('### Abstract Algebra Performance')
- st.write("Small models showed surprisingly strong performance on the abstract algebra task. A 6 Billion parameter model is tied for the best performance on this task and there are a number of other small models in the top 10.")
- plot_top_n(filtered_data, 'MMLU_abstract_algebra', 10)
-
- fig = create_plot(filtered_data, 'Parameters', 'MMLU_abstract_algebra')
- st.plotly_chart(fig)
-
- # Moral scenarios plots
- st.markdown("### Moral Scenarios Performance")
- st.write("""
- While smaller models can perform well at many tasks, the model size threshold for decent performance on moral scenarios is much higher.
- There are no models with less than 13 billion parameters with performance much better than random chance. Further investigation into other capabilities that emerge at 13 billion parameters could help
- identify capabilities that are important for moral reasoning.
- """)
-
- fig = create_plot(filtered_data, 'Parameters', 'MMLU_moral_scenarios', title="Impact of Parameter Count on Accuracy for Moral Scenarios")
- st.plotly_chart(fig)
- st.write()
-
-
-
- fig = create_plot(filtered_data, 'MMLU_average', 'MMLU_moral_scenarios')
- st.plotly_chart(fig)
-
 
  # Section to select a model and display radar and line charts
  st.header("Compare a Selected Model to the 5 Models Closest in MMLU Average Performance")
@@ -338,6 +312,36 @@ fig_radar_top_differences = create_radar_chart_unfilled(filtered_data, closest_m
  # Display the radar chart
  st.plotly_chart(fig_radar_top_differences)
 
+
+ st.markdown("## Notable findings and plots")
+
+ st.markdown('### Abstract Algebra Performance')
+ st.write("Small models showed surprisingly strong performance on the abstract algebra task. A 6 Billion parameter model is tied for the best performance on this task and there are a number of other small models in the top 10.")
+ plot_top_n(filtered_data, 'MMLU_abstract_algebra', 10)
+
+ fig = create_plot(filtered_data, 'Parameters', 'MMLU_abstract_algebra')
+ st.plotly_chart(fig)
+
+ # Moral scenarios plots
+ st.markdown("### Moral Scenarios Performance")
+ st.write("""
+ While smaller models can perform well at many tasks, the model size threshold for decent performance on moral scenarios is much higher.
+ There are no models with less than 13 billion parameters with performance much better than random chance. Further investigation into other capabilities that emerge at 13 billion parameters could help
+ identify capabilities that are important for moral reasoning.
+ """)
+
+ fig = create_plot(filtered_data, 'Parameters', 'MMLU_moral_scenarios', title="Impact of Parameter Count on Accuracy for Moral Scenarios")
+ st.plotly_chart(fig)
+ st.write()
+
+
+
+ fig = create_plot(filtered_data, 'MMLU_average', 'MMLU_moral_scenarios')
+ st.plotly_chart(fig)
+
+
+
+
  st.markdown("***Thank you to hugging face for running the evaluations and supplying the data as well as the original authors of the evaluations.***")
 
  st.markdown("""