pansophic committed on
Commit cf41972 · 1 Parent(s): 951ff2d

Update README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -67,14 +67,13 @@ In AlpacaEval, Rocket 🦝 achieves a near 80% win rate, coupled with an average
 
 | Metric | Value |
 |-----------------------|---------------------------|
-| Avg. | 52.15 |
 | ARC (25-shot) | 50.51 |
-| HellaSwag (10-shot) | 73.91 |
-| MMLU (5-shot) | 61.07 |
+| HellaSwag (0-shot) | 73.91 |
 | TruthfulQA (mc2) (0-shot) | 54.38 |
-| Winogrande (5-shot) | 63.22 |
+| BoolQ (0-shot) | 81.71 |
+| Winogrande (5-shot) | 67.8 |
 | GSM8K (5-shot) | 37.91 |
-| DROP (3-shot) | 9.66 |
+| MathQA (5-shot) | 31.26 |
 
 
 ## Intended uses & limitations
@@ -132,10 +131,13 @@ generated_text = model.generate(**inputs, max_length=3084, top_p=0.95, do_sample
 ```
 
 ## Bias, Risks, and Limitations
-Unlike ChatGPT, which incorporates in-the-loop filtering of responses and is aligned during the RLHF phase for safe completions, our model lacks these features. Consequently, it may generate problematic outputs, particularly when prompted in certain ways.
+Unlike ChatGPT, which incorporates in-the-loop filtering of responses and is aligned during the RLHF phase for safe completions, our model lacks these features. Consequently, it may generate problematic outputs, particularly when prompted in certain ways. Below is the score of the model on Toxigen benchmark.
 
 The pretraining dataset is comprised of a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), RedPajama-Data ([Together Computer., 2023](https://github.com/togethercomputer/RedPajama-Data)) and The Pile ([Gao et al., 2020](https://arxiv.org/abs/2101.00027)) both without the *Books3* subset, and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)).
 
+| Metric | Value |
+|-----------------------|---------------------------|
+| Toxigen (0-shot) | 43.40 |
 
 **The model name is inspired by the small but formidable character from 'Guardians of the Galaxy'. Similar to its namesake, this model, with its 3 billion parameters, showcases remarkable efficiency and effectiveness, challenging larger models despite its smaller size."*
 