Lyte committed on
Commit 24f2930 · verified · 1 Parent(s): fcbdcbd

Update README.md

Files changed (1)
  1. README.md +11 -7
README.md CHANGED
@@ -16,17 +16,21 @@ pipeline_tag: text-generation
  ---
 
 
- # Model Information:
+ # Model Overview:
 
- - This model was trained with a dataset that has the following columns: initial reasoning/assessment, step by step, verifications that come after each step, and final answer presentation based on full context, is it better than the original base model, i don't know, i am not sure i can run evals on it and i can't afford to run them manually.
- - The model will basically (over)think for longer before answering you, it's best to use minimum 4k or up to 16k context to allow it to (over)think, it was trained with 32k context.
- - Model's performance from manual testing seems to show the model does better at chatting(mental health, safety, creativity, etc...) from my personal tests so far, and honestly best i can tell you is test it yourself using this [Colab Notebook](https://colab.research.google.com/drive/1dcBbHAwYJuQJKqdPU570Hddv_F9wzjPO?usp=sharing)
- - The dataset i have public is not the full dataset used, and the dataset originally was meant for something entirely different using a custom MoE architecture unfortunately i cannot afford to run the experiment.
- - KingNish re-ignited the passion for me to re-pick up this because i had just given up on it after the first attempt a month or so ago that i shared, so cheers and enjoy the toy.
+ - **Training Data**: This model was trained on a dataset with columns for initial reasoning, step-by-step thinking, verifications after each step, and final answers based on full context. Is it better than the original base model? Hard to say without proper evaluations, and I don't have the resources to run them manually.
+
+ - **Context Handling**: The model benefits from larger contexts (a minimum of 4k and up to 16k tokens, though it was trained with a 32k-token context). It tends to "overthink," so a longer context gives it room to do so.
+
+ - **Performance**: Based on my limited manual tests, the model seems to do well in conversational settings, especially mental health topics, creative tasks, and explanations. I encourage you to try it yourself using this [Colab Notebook](https://colab.research.google.com/drive/1dcBbHAwYJuQJKqdPU570Hddv_F9wzjPO?usp=sharing).
+
+ - **Dataset Note**: The publicly available dataset is only a partial version. The full dataset was originally designed for a custom Mixture of Experts (MoE) architecture, but I couldn't afford to run that experiment.
+
+ - **Acknowledgment**: Special thanks to KingNish for reigniting my passion to revisit this project; I had almost abandoned it after my first attempt a month or so ago. Enjoy this experimental model!
 
  # Inference Code:
 
- - Feel free to make the steps and verifications hidden and the initial reasoning and show only the final answer to get an o1 feel(i don't know)
+ - Feel free to make the initial reasoning, steps, and verifications collapsible, showing only the final answer for an o1-style feel (I haven't tested this).
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
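
As a rough illustration of the "show only the final answer" note in the updated README above: a minimal post-processing sketch in Python, assuming the model emits a literal `Final Answer:` marker before its answer. That marker is a hypothetical stand-in (the commit doesn't document the output format), so adjust the delimiter to whatever the model actually produces.

```python
# Minimal sketch: hide the initial reasoning, steps, and verifications,
# surfacing only the final answer for an o1-style display.
# NOTE: "Final Answer:" is an assumed marker, not a documented one;
# inspect the model's raw output and adjust the delimiter accordingly.

def extract_final_answer(generated_text: str, marker: str = "Final Answer:") -> str:
    """Return only the text after the final-answer marker.

    Falls back to the full output if the marker is absent, so nothing
    is silently dropped.
    """
    _, sep, tail = generated_text.partition(marker)
    return tail.strip() if sep else generated_text.strip()


if __name__ == "__main__":
    raw = (
        "Initial reasoning: the user asked for 6 * 7.\n"
        "Step 1: multiply. Verification: 42 / 7 == 6.\n"
        "Final Answer: 42"
    )
    print(extract_final_answer(raw))  # prints: 42
```

In a chat UI, the stripped prefix could sit behind a collapsible element rather than being discarded, matching the collapsible steps-and-verifications suggestion in the README.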