Update README.md
---
base_model:
- mukaj/Llama-3.1-Hawkish-8B
- T145/KRONOS-8B-V1-P3
- T145/KRONOS-8B-V1-P1
- unsloth/Meta-Llama-3.1-8B-Instruct
- T145/KRONOS-8B-V1-P2
library_name: transformers
tags:
- mergekit
- merge

---

# KRONOS 8B V4

The goal of this merge is to take a significant step toward understanding why the older TIES merges nuke the IFEval score. The present hypothesis is that merging on the non-instruct model causes the disparity, since IFEval measures instruction-following capacity. It is also possible that either P1 or P3 is the cause.
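One way to test that hypothesis is to score each merge candidate on IFEval and compare. A minimal sketch using the lm-evaluation-harness Python API; the checkpoint path and batch size are illustrative assumptions, not values from this card:

```python
# Hedged sketch: scoring a merge candidate on IFEval with EleutherAI's
# lm-evaluation-harness. Path and batch size are illustrative assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./merged-kronos,dtype=bfloat16",
    tasks=["ifeval"],
    batch_size=8,
)
print(results["results"]["ifeval"])
```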

## Merge Details
### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with [unsloth/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) as the base.
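For intuition: Model Stock interpolates, layer by layer, between the average of the fine-tuned weights and the base weights, choosing the ratio from the angle between the models' task vectors. A rough NumPy sketch of that idea, assuming at least two fine-tuned inputs (a paraphrase of the paper, not mergekit's exact implementation):

```python
# Rough sketch of the Model Stock idea (Jang et al., 2024), not
# mergekit's exact implementation. Assumes >= 2 fine-tuned models.
import numpy as np

def model_stock_layer(base: np.ndarray, finetuned: list[np.ndarray]) -> np.ndarray:
    deltas = [w - base for w in finetuned]  # task vectors
    # Average pairwise cosine similarity between task vectors.
    cos = np.mean([
        np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        for i, a in enumerate(deltas)
        for b in deltas[i + 1:]
    ])
    n = len(finetuned)
    t = n * cos / (1 + (n - 1) * cos)  # interpolation ratio from the paper
    return t * np.mean(finetuned, axis=0) + (1 - t) * base
```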

### Models Merged

The following models were included in the merge:
* [mukaj/Llama-3.1-Hawkish-8B](https://huggingface.co/mukaj/Llama-3.1-Hawkish-8B)
* [T145/KRONOS-8B-V1-P3](https://huggingface.co/T145/KRONOS-8B-V1-P3)
* [T145/KRONOS-8B-V1-P1](https://huggingface.co/T145/KRONOS-8B-V1-P1)
* [T145/KRONOS-8B-V1-P2](https://huggingface.co/T145/KRONOS-8B-V1-P2)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
dtype: bfloat16
merge_method: model_stock
slices:
- sources:
  - layer_range: [0, 32]
    model: T145/KRONOS-8B-V1-P1
  - layer_range: [0, 32]
    model: T145/KRONOS-8B-V1-P2
  - layer_range: [0, 32]
    model: T145/KRONOS-8B-V1-P3
  - layer_range: [0, 32]
    model: mukaj/Llama-3.1-Hawkish-8B
  - layer_range: [0, 32]
    model: unsloth/Meta-Llama-3.1-8B-Instruct
tokenizer_source: base
```
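To reproduce the merge, this configuration can be passed to mergekit's `mergekit-yaml` entry point (roughly `mergekit-yaml config.yaml ./merged`). For inference, a minimal transformers sketch follows; the repo id is inferred from the card title and is an assumption, not something the card states:

```python
# Hedged usage sketch. "T145/KRONOS-8B-V4" is an assumed repo id
# inferred from the card title; substitute the actual one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "T145/KRONOS-8B-V4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Use the chat template, since the point of this merge is instruction following.
messages = [{"role": "user", "content": "Name one risk of merging fine-tuned models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```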