natolambert
committed on
Update src/md.py
src/md.py CHANGED
```diff
@@ -1,3 +1,5 @@
+from datetime import datetime
+
 ABOUT_TEXT = """
 We compute the win percentage for a reward model on hand curated chosen-rejected pairs for each prompt.
 A win is when the score for the chosen response is higher than the score for the rejected response.
@@ -92,10 +94,13 @@ Lengths (mean, std. dev.) include the prompt
 For more details, see the [dataset](https://huggingface.co/datasets/allenai/reward-bench).
 """
 
-TOP_TEXT = """# RewardBench: Evaluating Reward Models
-
+# Get current time formatted nicely
+current_time = datetime.now().strftime("%H:%M, %d %b %Y")
+
+TOP_TEXT = f"""# RewardBench: Evaluating Reward Models
+Last restart: {current_time}
+
 ### Evaluating the capabilities, safety, and pitfalls of reward models
-[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787) | Total models: {} | * Unverified models | ⚠️ Dataset Contamination
+[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787) | Total models: {{}} | * Unverified models | ⚠️ Dataset Contamination
 
-⚠️ Many of the top models were trained on unintentionally contaminated, AI-generated data, for more information, see this [gist](https://gist.github.com/natolambert/1aed306000c13e0e8c5bc17c1a5dd300).
-"""
+⚠️ Many of the top models were trained on unintentionally contaminated, AI-generated data, for more information, see this [gist](https://gist.github.com/natolambert/1aed306000c13e0e8c5bc17c1a5dd300)."""
```
|