Josephgflowers
committed on
Update README.md
README.md CHANGED
@@ -1,5 +1,15 @@
 ---
 license: mit
+datasets:
+- HuggingFaceH4/ultrachat_200k
+- Open-Orca/OpenOrca
+- Josephgflowers/OpenOrca-Step-by-step-reasoning
+- nampdn-ai/tiny-orca-textbooks
+- HuggingFaceTB/cosmopedia-100k
+- garage-bAInd/Open-Platypus
+- nampdn-ai/tiny-textbooks
+- teknium/openhermes
+- KaggleMasterX/NASA_Complete
 ---
 This is a converted tinyllama model using the following script:
 
@@ -62,4 +72,4 @@ When these components work together, they create a model that is both flexible a
 
 Accuracy and Generalization: The adaptive and context-sensitive adjustments should help the model generalize better to unseen data, as it dynamically adapts to different contexts and feature relevances.
 Interpretability: With differential attention and channel recalibration, the model’s outputs are more interpretable, as it effectively shows how it emphasizes certain features or attention patterns based on context.
-Convergence and Training Stability: Adaptive RMSNorm and Token Mixing add stability and locality, reducing issues with gradient explosion or vanishing. The model may reach optimal performance faster and with fewer parameter updates, making training more efficient.
+Convergence and Training Stability: Adaptive RMSNorm and Token Mixing add stability and locality, reducing issues with gradient explosion or vanishing. The model may reach optimal performance faster and with fewer parameter updates, making training more efficient.
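The metadata block added in the first hunk lists the training datasets by their Hugging Face Hub IDs. Purely as an illustration (the split and column names below are assumptions based on the usual Hub layout, not something stated in this commit), any of the listed datasets can be pulled with the datasets library:

from datasets import load_dataset

# Dataset ID taken from the metadata list above; split and column names
# vary per dataset, so check each dataset card before relying on them.
ds = load_dataset("Open-Orca/OpenOrca", split="train")
print(ds.column_names)
print(ds[0])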
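The summary lines in the second hunk name Adaptive RMSNorm, Token Mixing, differential attention, and channel recalibration without showing code. As a rough, illustrative sketch only (the class name, the mean-pooled context vector, and the sigmoid gate are assumptions, not the repository's actual implementation), an RMSNorm whose per-channel gain is recalibrated from a context summary could look like this in PyTorch:

import torch
import torch.nn as nn

class AdaptiveRMSNorm(nn.Module):
    # Illustrative sketch: a standard RMSNorm whose learned gain is modulated
    # by a squeeze-and-excitation style gate computed from a pooled context vector.
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.context_proj = nn.Linear(hidden_size, hidden_size)
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size)
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        normed = x * rms
        # Channel recalibration: pool over the sequence, project, and squash to a
        # multiplicative adjustment of the per-channel gain.
        context = x.mean(dim=1, keepdim=True)             # (batch, 1, hidden_size)
        gate = torch.sigmoid(self.context_proj(context))  # (batch, 1, hidden_size)
        return normed * self.weight * (1.0 + gate)

# Quick check on dummy token embeddings:
# y = AdaptiveRMSNorm(64)(torch.randn(2, 16, 64))  # -> shape (2, 16, 64)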