Update README.md

# Model Overview:

- **Training Data**: This model was trained on a dataset with columns for initial reasoning, step-by-step thinking, verifications after each step, and final answers based on the full context. Whether it is better than the original base model is hard to say without proper evaluations, which I don't have the resources to run.

- **Context Handling**: The model benefits from larger contexts (a minimum of 4k tokens, up to 16k, though it was trained with a 32k-token context). It tends to "overthink," so providing a longer context helps it perform better.

- **Performance**: Based on my limited manual tests, the model seems to excel in conversational settings, especially for mental health support, creative tasks, and explanations. However, I encourage you to try it out yourself using this [Colab Notebook](https://colab.research.google.com/drive/1dcBbHAwYJuQJKqdPU570Hddv_F9wzjPO?usp=sharing).

- **Dataset Note**: The publicly available dataset is only a partial version. The full dataset was originally designed for a custom Mixture of Experts (MoE) architecture, but I couldn't afford to run the full experiment.

- **Acknowledgment**: Special thanks to KingNish for reigniting my passion to revisit this project. I almost abandoned it after my first attempt a month ago. Enjoy this experimental model!
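
The column layout described under **Training Data** might look like the hypothetical record below. The field names and contents are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical example of one training record matching the described columns
# (initial reasoning, step-by-step thinking, per-step verifications, final answer).
# Field names are illustrative; the real dataset's schema may differ.
record = {
    "initial_reasoning": "The question asks for the sum of the first 10 odd numbers.",
    "steps": [
        "Step 1: The first 10 odd numbers are 1, 3, 5, ..., 19.",
        "Step 2: Their sum is 10^2 = 100.",
    ],
    "verifications": [
        "Verification 1: The 10th odd number is 2*10 - 1 = 19. Correct.",
        "Verification 2: The sum of the first n odd numbers is n^2, so 100. Correct.",
    ],
    "final_answer": "100",
}

# Each step is paired with a verification, and the final answer
# is written with the full context of all steps available.
assert len(record["steps"]) == len(record["verifications"])
```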

# Inference Code:

- Feel free to make the steps, verifications, and initial reasoning collapsible, showing only the final answer for an o1-style feel.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: standard Transformers inference sketch; the model id below is a
# placeholder -- replace it with this model's actual repo id.
model_id = "<model-repo-id>"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain the Pythagorean theorem step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
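
One way to get the collapsible effect suggested above is a small post-processing step that wraps everything before the final answer in an HTML `<details>` block, so rendered Markdown shows only the answer by default. The `"Final Answer:"` marker is an assumption about the output format, not something this model is guaranteed to emit:

```python
def collapse_reasoning(text: str, marker: str = "Final Answer:") -> str:
    """Wrap everything before `marker` in a <details> block so that
    rendered Markdown hides the reasoning by default.

    The marker string is an assumed delimiter; adjust it to whatever
    the model actually emits before its final answer.
    """
    head, sep, tail = text.partition(marker)
    if not sep:  # marker not found: leave the text unchanged
        return text
    return (
        "<details><summary>Show reasoning</summary>\n\n"
        + head.strip()
        + "\n\n</details>\n\n"
        + (sep + tail).strip()
    )


# Example usage on a mock model output:
output = "Step 1: ...\nVerification 1: ...\nFinal Answer: 100"
print(collapse_reasoning(output))
```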