natolambert committed · Commit da8be66 · verified · 1 parent: 3a66f0f

Update src/md.py

Files changed (1)
  1. src/md.py +10 -5
src/md.py CHANGED
@@ -1,3 +1,5 @@
+from datetime import datetime
+
 ABOUT_TEXT = """
 We compute the win percentage for a reward model on hand curated chosen-rejected pairs for each prompt.
 A win is when the score for the chosen response is higher than the score for the rejected response.
@@ -92,10 +94,13 @@ Lengths (mean, std. dev.) include the prompt
 For more details, see the [dataset](https://huggingface.co/datasets/allenai/reward-bench).
 """
 
-TOP_TEXT = """
-# RewardBench: Evaluating Reward Models
+# Get current time formatted nicely
+current_time = datetime.now().strftime("%H:%M, %d %b %Y")
+
+TOP_TEXT = f"""# RewardBench: Evaluating Reward Models
+Last restart: {current_time}
+
 ### Evaluating the capabilities, safety, and pitfalls of reward models
-[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787) | Total models: {} | * Unverified models | ⚠️ Dataset Contamination
+[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787) | Total models: {{}} | * Unverified models | ⚠️ Dataset Contamination
 
-⚠️ Many of the top models were trained on unintentionally contaminated, AI-generated data, for more information, see this [gist](https://gist.github.com/natolambert/1aed306000c13e0e8c5bc17c1a5dd300).
-"""
+⚠️ Many of the top models were trained on unintentionally contaminated, AI-generated data, for more information, see this [gist](https://gist.github.com/natolambert/1aed306000c13e0e8c5bc17c1a5dd300)."""
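For context on the mechanics of this change: because TOP_TEXT becomes an f-string, `{current_time}` is substituted once at module import, while the existing model-count placeholder has to be doubled to `{{}}` so it survives as a literal `{}` for whatever code formats the string later. Below is a minimal sketch of that behavior, assuming the leaderboard app fills the placeholder with something like `TOP_TEXT.format(num_models)`; that call and the example count are not part of this commit.

```python
from datetime import datetime

# Same strftime pattern as the commit, e.g. "14:05, 03 Jun 2024".
current_time = datetime.now().strftime("%H:%M, %d %b %Y")

# In an f-string, {current_time} is interpolated immediately (at import time),
# while the doubled braces {{}} are escaped and survive as a literal "{}".
TOP_TEXT = f"""# RewardBench: Evaluating Reward Models
Last restart: {current_time}

Total models: {{}}"""

# Hypothetical later use elsewhere in the app (not shown in this commit):
# the remaining "{}" placeholder is filled with the model count via str.format.
print(TOP_TEXT.format(142))
```

Since `current_time` is computed at import, the "Last restart" line reflects when the Space process last started, not the time each visitor loads the page.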