Shane committed
Commit: c96dbc6
Parent: 557b080

updated citations

Files changed (2):
  1. app.py +5 -5
  2. src/md.py +0 -2
app.py CHANGED

@@ -167,11 +167,11 @@ with gr.Blocks(css=custom_css) as app:
     with gr.Row():
         with gr.Accordion("📚 Citation", open=False):
             citation_button = gr.Textbox(
-                value=r"""@misc{RewardBench,
-title={RewardBench: Evaluating Reward Models for Language Modeling},
-author={Lambert, Nathan and Pyatkin, Valentina and Morrison, Jacob and Miranda, LJ and Lin, Bill Yuchen and Chandu, Khyathi and Dziri, Nouha and Kumar, Sachin and Zick, Tom and Choi, Yejin and Smith, Noah A. and Hajishirzi, Hannaneh},
-year={2024},
-howpublished={\url{https://huggingface.co/spaces/allenai/reward-bench}
+                value=r"""@article{lyu2024href,
+title={HREF: Human Response-Guided Evaluation of Instruction Following in Language Models},
+author={Xinxi Lyu and Yizhong Wang and Hannaneh Hajishirzi and Pradeep Dasigi},
+journal={arXiv preprint arXiv:2412.15524},
+year={2024}
 }""",
                 lines=7,
                 label="Copy the following to cite these results.",
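For context, the updated citation block can be exercised standalone. This is a minimal sketch assuming only the layout visible in the diff's context lines (Blocks > Row > Accordion > Textbox); the real app.py has more surrounding UI and custom CSS:

import gradio as gr

# Minimal sketch of the citation accordion from the diff above; the rest of
# the leaderboard UI is omitted. The layout is taken from the diff context,
# not from the full file.
citation = r"""@article{lyu2024href,
    title={HREF: Human Response-Guided Evaluation of Instruction Following in Language Models},
    author={Xinxi Lyu and Yizhong Wang and Hannaneh Hajishirzi and Pradeep Dasigi},
    journal={arXiv preprint arXiv:2412.15524},
    year={2024}
}"""

with gr.Blocks() as app:
    with gr.Row():
        with gr.Accordion("📚 Citation", open=False):  # collapsed by default
            citation_button = gr.Textbox(
                value=citation,
                lines=7,
                label="Copy the following to cite these results.",
            )

if __name__ == "__main__":
    app.launch()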
src/md.py CHANGED

@@ -23,8 +23,6 @@ For reproductability, we use greedy decoding for all model generation as default
 - **Large**: HREF has the largest evaluation size among similar benchmarks, making its evaluation more reliable.
 - **Contamination-resistant**: HREF's evaluation set is hidden and uses public models for both the baseline model and judge model, which makes it completely free of contamination.
 - **Task Oriented**: Instead of naturally collected instructions from the user, HREF contains instructions that are written specifically targetting 8 distinct categories that are used in instruction tuning, which allows it to provide more insights about how to improve language models.
-## Contact Us
-TODO
 """

 # Get Pacific time zone (handles PST/PDT automatically)
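The unchanged context line above references the Pacific-time handling in src/md.py. That implementation isn't shown in this diff; a minimal sketch, assuming the standard-library approach, of getting a timestamp that switches between PST and PDT automatically:

from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Assumed sketch: ZoneInfo applies the correct PST/PDT offset for the date
# itself, so no manual daylight-saving logic is needed.
pacific = ZoneInfo("America/Los_Angeles")
now = datetime.now(pacific)
print(now.strftime("%Y-%m-%d %H:%M %Z"))  # e.g. "2024-12-20 09:30 PST"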