Fischerboot committed on
Commit 9707629 · verified · 1 parent: c74df71

Update README.md

Files changed (1)
  1. README.md +132 -31
README.md CHANGED
@@ -1,31 +1,132 @@
- ---
- base_model: []
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # output-model-directory
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the passthrough merge method.
-
- ### Models Merged
-
- The following models were included in the merge:
- * ./3b + ./thinking-3b
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- models:
-   - model: ./3b+./thinking-3b
- merge_method: passthrough
- ```
+ ---
+ base_model:
+ - meta-llama/Llama-3.2-3B-Instruct
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ license: llama3.2
+ datasets:
+ - Fischerboot/small-boi-thinkin
+ language:
+ - en
+ ---
+ # Llama-3.2-3B-SmartBoi
+
+ This is a finetune that adds support for `<thinking>` tags (among others).
+
+ These tags make the model noticeably smarter, although they are only used in English.
+
+ This model has not been made uncensored.
+
+ ## Prompt Template
+
+ This model uses the Llama-3 chat template:
+
+ ```
+ <|begin_of_text|><|start_header_id|>system<|end_header_id|>
+
+ {system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+ {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+ ```
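As a sketch of how this template can be filled in code (the `LLAMA3_TEMPLATE` constant and `format_prompt` helper below are illustrative, not part of this repo):

```python
# Minimal sketch: filling the Llama-3 chat template shown above.
# LLAMA3_TEMPLATE and format_prompt are illustrative helpers, not from this repo.
LLAMA3_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

def format_prompt(system_prompt: str, prompt: str) -> str:
    """Substitute the system prompt and user message into the template."""
    return LLAMA3_TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)

text = format_prompt("You are a helpful assistant.", "Why is the sky blue?")
print(text)
```

In practice, `tokenizer.apply_chat_template` from transformers produces the same format automatically, since the tokenizer ships with the Llama-3 chat template.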
+
+ ## Finetune Info:
+
+ The following YAML configuration was used to finetune this model:
+
+ ```yaml
+ base_model: alpindale/Llama-3.2-3B-Instruct
+ model_type: LlamaForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+
+ chat_template: llama3
+ datasets:
+   - path: Fischerboot/small-boi-thinkin
+     type: sharegpt
+     conversation: llama3
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.1
+ output_dir: ./outputs/yuh
+
+ adapter: qlora
+ lora_model_dir:
+
+ sequence_len: 4096
+ sample_packing: false
+ pad_to_sequence_len: true
+
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ lora_target_modules:
+   - gate_proj
+   - down_proj
+   - up_proj
+   - q_proj
+   - v_proj
+   - k_proj
+   - o_proj
+
+ wandb_project:
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 1
+ micro_batch_size: 1
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ loss_watchdog_threshold: 8.0
+ loss_watchdog_patience: 3
+
+ eval_sample_packing: false
+ warmup_steps: 10
+ evals_per_epoch: 2
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 2
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   bos_token: "<|begin_of_text|>"
+   eos_token: "<|end_of_text|>"
+   pad_token: "<|end_of_text|>"
+ ```
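As a rough back-of-the-envelope check on what this QLoRA config actually trains (a sketch; the model dimensions below are assumed from the Llama-3.2-3B architecture and are not stated in this card):

```python
# Approximate trainable-parameter count for lora_r=32 applied to all seven
# projection matrices listed in lora_target_modules above.
# Dimensions are ASSUMED from the Llama-3.2-3B config (not stated in this card).
HIDDEN, INTERMEDIATE, LAYERS = 3072, 8192, 28
N_HEADS, N_KV_HEADS, HEAD_DIM = 24, 8, 128
R = 32  # lora_r

def lora_params(in_dim: int, out_dim: int, r: int = R) -> int:
    # A LoRA adapter factors the weight update as B @ A,
    # with B of shape (out_dim, r) and A of shape (r, in_dim).
    return r * (in_dim + out_dim)

per_layer = (
    lora_params(HIDDEN, N_HEADS * HEAD_DIM)        # q_proj
    + lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM)   # k_proj
    + lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM)   # v_proj
    + lora_params(N_HEADS * HEAD_DIM, HIDDEN)      # o_proj
    + lora_params(HIDDEN, INTERMEDIATE)            # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE)            # up_proj
    + lora_params(INTERMEDIATE, HIDDEN)            # down_proj
)
total = per_layer * LAYERS
print(f"~{total / 1e6:.1f}M trainable LoRA parameters")
```

Under these assumptions the adapter trains on the order of 50M parameters, a small fraction of the 3B-parameter base model, which is what makes 4-bit QLoRA training feasible on a single GPU.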
+
+ ### Training results:
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 1.5032 | 0.0000 | 1 | 1.6556 |
+ | 1.2011 | 0.5000 | 10553 | 0.6682 |
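A quick consistency check on the table above (a sketch; it assumes the `micro_batch_size: 1` and `gradient_accumulation_steps: 1` settings from the config and single-GPU training):

```python
# Step 10553 corresponds to epoch 0.5, so one epoch is ~21,106 optimizer steps.
# With micro_batch_size=1 and gradient_accumulation_steps=1 (and one GPU,
# an ASSUMPTION), each step sees one example, implying roughly 21k training
# examples, i.e. 90% of the dataset given val_set_size: 0.1.
step, epoch = 10553, 0.5
steps_per_epoch = round(step / epoch)
train_examples = steps_per_epoch * 1 * 1  # micro_batch_size * grad_accum
approx_dataset_size = round(train_examples / 0.9)  # 10% held out for eval
print(steps_per_epoch, approx_dataset_size)
```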