T145 committed
Commit 7f6855b · verified · 1 Parent(s): f9faad0

Update README.md

Files changed (1)
  1. README.md +52 -52
README.md CHANGED
@@ -1,52 +1,52 @@
- ---
- base_model:
- - mukaj/Llama-3.1-Hawkish-8B
- - T145/KRONOS-8B-V1-P3
- - T145/KRONOS-8B-V1-P1
- - unsloth/Meta-Llama-3.1-8B-Instruct
- - T145/KRONOS-8B-V1-P2
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # Untitled Model (1)
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [unsloth/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) as a base.
-
- ### Models Merged
-
- The following models were included in the merge:
- * [mukaj/Llama-3.1-Hawkish-8B](https://huggingface.co/mukaj/Llama-3.1-Hawkish-8B)
- * [T145/KRONOS-8B-V1-P3](https://huggingface.co/T145/KRONOS-8B-V1-P3)
- * [T145/KRONOS-8B-V1-P1](https://huggingface.co/T145/KRONOS-8B-V1-P1)
- * [T145/KRONOS-8B-V1-P2](https://huggingface.co/T145/KRONOS-8B-V1-P2)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- base_model: unsloth/Meta-Llama-3.1-8B-Instruct
- dtype: bfloat16
- merge_method: model_stock
- slices:
- - sources:
-   - layer_range: [0, 32]
-     model: T145/KRONOS-8B-V1-P1
-   - layer_range: [0, 32]
-     model: T145/KRONOS-8B-V1-P2
-   - layer_range: [0, 32]
-     model: T145/KRONOS-8B-V1-P3
-   - layer_range: [0, 32]
-     model: mukaj/Llama-3.1-Hawkish-8B
-   - layer_range: [0, 32]
-     model: unsloth/Meta-Llama-3.1-8B-Instruct
- tokenizer_source: base
- ```
 
+ ---
+ base_model:
+ - mukaj/Llama-3.1-Hawkish-8B
+ - T145/KRONOS-8B-V1-P3
+ - T145/KRONOS-8B-V1-P1
+ - unsloth/Meta-Llama-3.1-8B-Instruct
+ - T145/KRONOS-8B-V1-P2
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+
+ ---
+ # KRONOS 8B V4
+
+ The goal of this merge is to take a significant step toward explaining why the older TIES merges nuke the IFEval score. The present hypothesis is that merging on the non-instruct model causes this disparity, as IFEval measures instruction-following capacity. It's also possible that either P1 or P3 causes the problem.
+
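+ One way to test this would be to score the suspect parents on IFEval directly, e.g. with lm-evaluation-harness. A rough sketch; the model list and options are illustrative:
+
+ ```python
+ # Hypothetical check: score suspect parent models on IFEval with
+ # lm-evaluation-harness to see which one drags the score down.
+ from lm_eval import simple_evaluate
+
+ for name in ["T145/KRONOS-8B-V1-P1", "T145/KRONOS-8B-V1-P3"]:
+     results = simple_evaluate(
+         model="hf",
+         model_args=f"pretrained={name},dtype=bfloat16",
+         tasks=["ifeval"],
+     )
+     print(name, results["results"]["ifeval"])
+ ```
+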
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [unsloth/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) as a base.
+
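+ In rough terms, Model Stock averages the fine-tuned weights and then interpolates back toward the base, with the interpolation weight derived from the angle between the task vectors. A minimal per-tensor sketch, assuming the update rule from the paper (mergekit's actual implementation may differ in detail):
+
+ ```python
+ # Sketch of the Model Stock update per weight tensor:
+ # w_merged = t * mean(w_ft) + (1 - t) * w_base, t = N*cos / (1 + (N-1)*cos).
+ import torch
+ import torch.nn.functional as F
+
+ def model_stock(base: torch.Tensor, finetuned: list[torch.Tensor]) -> torch.Tensor:
+     deltas = [w - base for w in finetuned]  # task vectors relative to the base
+     n = len(deltas)
+     # Estimate cos(theta) as the mean pairwise cosine similarity of task vectors.
+     pairs = [
+         F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
+         for i, a in enumerate(deltas)
+         for b in deltas[i + 1:]
+     ]
+     cos = torch.stack(pairs).mean().clamp(min=0.0)  # keep t non-negative
+     t = n * cos / (1 + (n - 1) * cos)  # interpolation weight toward the average
+     avg = torch.stack(finetuned).mean(dim=0)
+     return t * avg + (1 - t) * base  # pull the average back toward the base
+ ```
+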
+ ### Models Merged
+
+ The following models were included in the merge:
+ * [mukaj/Llama-3.1-Hawkish-8B](https://huggingface.co/mukaj/Llama-3.1-Hawkish-8B)
+ * [T145/KRONOS-8B-V1-P3](https://huggingface.co/T145/KRONOS-8B-V1-P3)
+ * [T145/KRONOS-8B-V1-P1](https://huggingface.co/T145/KRONOS-8B-V1-P1)
+ * [T145/KRONOS-8B-V1-P2](https://huggingface.co/T145/KRONOS-8B-V1-P2)
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ base_model: unsloth/Meta-Llama-3.1-8B-Instruct
+ dtype: bfloat16
+ merge_method: model_stock
+ slices:
+ - sources:
+   - layer_range: [0, 32]
+     model: T145/KRONOS-8B-V1-P1
+   - layer_range: [0, 32]
+     model: T145/KRONOS-8B-V1-P2
+   - layer_range: [0, 32]
+     model: T145/KRONOS-8B-V1-P3
+   - layer_range: [0, 32]
+     model: mukaj/Llama-3.1-Hawkish-8B
+   - layer_range: [0, 32]
+     model: unsloth/Meta-Llama-3.1-8B-Instruct
+ tokenizer_source: base
+ ```
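+
+ To reproduce the merge, the config can be passed to mergekit's Python API. A minimal sketch, assuming the YAML above is saved as `config.yaml` (paths and options are illustrative):
+
+ ```python
+ # Sketch: run the merge through mergekit's Python API.
+ import yaml
+
+ from mergekit.config import MergeConfiguration
+ from mergekit.merge import MergeOptions, run_merge
+
+ with open("config.yaml", "r", encoding="utf-8") as f:
+     config = MergeConfiguration.model_validate(yaml.safe_load(f))
+
+ run_merge(
+     config,
+     "./KRONOS-8B-V4",  # output directory for the merged model
+     options=MergeOptions(
+         cuda=True,  # do the tensor math on GPU
+         copy_tokenizer=True,  # write a tokenizer into the output directory
+         lazy_unpickle=True,  # reduce peak memory while loading shards
+     ),
+ )
+ ```
+
+ The `mergekit-yaml` command-line entry point accepts the same config file.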