yhavinga committed
Commit 70b6b2b · verified · 1 Parent(s): d8ccede

Update README.md

Files changed (1)
  1. README.md +68 -4
README.md CHANGED
@@ -6,9 +6,20 @@ tags:
  - merge

  ---
- # boreas-10_7b
+ # boreas-10_7b-v0
+
+ # !NB: THIS MODEL NEEDS CONTINUED (PRE)TRAINING
+
+ This is the result of step 1 of the upscaling of [Boreas-7B](https://huggingface.co/yhavinga/Boreas-7B) with [mergekit](https://github.com/cg123/mergekit).
+ It attempts to reproduce the depth up-scaling described in the [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166) paper (a sketch of the paper's plain recipe is shown below this hunk).
+ The original model upscaled is the finetuned
+ This model is the result after step 1 from the figure below:
+
+ ![SOLAR 10.7B Depth up scaling](img_2.png)
+
+ Step 2, continued (pre)training, is currently underway; its result will be released as a separate model.

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

  ## Merge Details
  ### Merge Method
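
For reference, the depth up-scaling step from the SOLAR paper simply stacks two copies of the base model with the middle layers dropped: from a 32-layer model, layers 0-23 of one copy and layers 8-31 of another are concatenated into 48 layers. The mergekit sketch below is only an illustration of that plain recipe, assuming the 32-layer Boreas-7B as the base; it is not the configuration used for this model.

```yaml
# Illustration of the plain SOLAR-style depth up-scaling (not this model's config):
# two copies of the 32-layer base are stacked with 8 layers dropped from each,
# giving 24 + 24 = 48 layers.
slices:
  - sources:
      - model: yhavinga/Boreas-7B
        layer_range: [0, 24]
  - sources:
      - model: yhavinga/Boreas-7B
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

The merge in this commit follows the same overall layout but replaces the duplicated middle layers with slerp blends of the overlapping ranges, as shown in the configurations further down.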
@@ -18,10 +29,10 @@ This model was merged using the passthrough merge method.
  ### Models Merged

  The following models were included in the merge:
- * boreas-7b-0-16
+ * boreas-7b-8-16-24-32
  * boreas-7b-16-32
  * boreas-7b-0-8-16-24
- * boreas-7b-8-16-24-32
+ * boreas-7b-0-16

  ### Configuration

@@ -43,5 +54,58 @@ slices:
          layer_range: [0, 16]
  merge_method: passthrough
  dtype: bfloat16
+ ```
+
+ The four models were created with the following configurations:
+
+ ```yaml
+ slices:
+   - sources:
+       - model: yhavinga/Boreas-7B
+         layer_range: [0, 16]
+ merge_method: passthrough
+ dtype: bfloat16
+ ---
+ slices:
+   - sources:
+       - model: yhavinga/Boreas-7B
+         layer_range: [0, 8]
+       - model: yhavinga/Boreas-7B
+         layer_range: [16, 24]
+ merge_method: slerp
+ base_model: yhavinga/Boreas-7B
+ parameters:
+   t:
+     - filter: self_attn
+       value: [0, 0.5, 0.3, 0.7, 1]
+     - filter: mlp
+       value: [1, 0.5, 0.7, 0.3, 0]
+     - value: 0.5
+ dtype: bfloat16
+ ---
+ slices:
+   - sources:
+       - model: yhavinga/Boreas-7B
+         layer_range: [8, 16]
+       - model: yhavinga/Boreas-7B
+         layer_range: [24, 32]
+ merge_method: slerp
+ base_model: yhavinga/Boreas-7B
+ parameters:
+   t:
+     - filter: self_attn
+       value: [0, 0.5, 0.3, 0.7, 1]
+     - filter: mlp
+       value: [1, 0.5, 0.7, 0.3, 0]
+     - value: 0.5
+ dtype: bfloat16
+ ---
+ slices:
+   - sources:
+       - model: yhavinga/Boreas-7B
+         layer_range: [16, 32]
+ merge_method: passthrough
+ dtype: bfloat16
  ```
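
Only the tail of the final passthrough configuration is visible in the hunk above. Presumably that final step stacks the four intermediate models (16 + 8 + 8 + 16 = 48 layers, the SOLAR 10.7B depth) with the passthrough method. The sketch below is an assumed reconstruction; in particular, the model paths and their stacking order are guesses based on the Models Merged list, not the actual file.

```yaml
# Assumed reconstruction of the final stacking config; model names and order are guesses.
slices:
  - sources:
      - model: boreas-7b-0-16        # original layers 0-15
        layer_range: [0, 16]
  - sources:
      - model: boreas-7b-0-8-16-24   # slerp blend of layers 0-7 and 16-23
        layer_range: [0, 8]
  - sources:
      - model: boreas-7b-8-16-24-32  # slerp blend of layers 8-15 and 24-31
        layer_range: [0, 8]
  - sources:
      - model: boreas-7b-16-32       # original layers 16-31
        layer_range: [0, 16]
merge_method: passthrough
dtype: bfloat16
```

As in the SOLAR paper, the depth up-scaled stack is expected to lose quality at the seams, which is why the note at the top states that continued (pre)training (step 2) is still required.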