giraffe176
committed on
Update README.md
README.md
CHANGED
@@ -145,7 +145,9 @@ As part of this process, I tried to figure out if there was a way to determine a
 Too late in the process, I learned that [dare_ties](https://arxiv.org/abs/2311.03099) has a random element to it. Valuable information for next time, I guess. After concluding that project, I began collecting more data, this time setting a specified seed in mergekit for reproducibility. As I was collecting data, I hit the goal I had set for myself.
 This model is *not* a result of the above work but is the genesis of how this model came to be.
 
-I present, **Starling_Monarch_Westlake_Garten-7B-v0.1**, the only 7B model to score > 80 on the EQ-Bench v2.1 benchmark found [here](https://github.com/EQ-bench/EQ-Bench), outscoring larger models like [abacusai/Smaug-72B-v0.1](https://huggingface.co/abacusai/Smaug-72B-v0.1) and [cognitivecomputations/dolphin-2.2-70b](https://huggingface.co/cognitivecomputations/dolphin-2.2-70b)
+I present, **Starling_Monarch_Westlake_Garten-7B-v0.1**, the **only 7B model to score > 80** on the EQ-Bench v2.1 benchmark found [here](https://github.com/EQ-bench/EQ-Bench), outscoring larger models like [abacusai/Smaug-72B-v0.1](https://huggingface.co/abacusai/Smaug-72B-v0.1) and [cognitivecomputations/dolphin-2.2-70b](https://huggingface.co/cognitivecomputations/dolphin-2.2-70b)
+
+It also surpasses its components in the GSM8K benchmark, with a score of 71.95. I'll be looking to bring out more logic and emotion in the next evolution of this model.
 
 It also earned 8.109 on MT-Bench[(paper)](https://arxiv.org/abs/2306.05685), outscoring Chat-GPT 3.5 and Claude v1.
 