	---
	base_model:
	- mukaj/Llama-3.1-Hawkish-8B
	- T145/KRONOS-8B-V1-P3
	- T145/KRONOS-8B-V1-P1
	- unsloth/Meta-Llama-3.1-8B-Instruct
	- T145/KRONOS-8B-V1-P2
	library_name: transformers
	tags:
	- mergekit
	- merge

	---
	# KRONOS 8B V4

	The goal of this merge is to take a significant step towards understanding why the older TIES merges nuke the IFEval score. The present hypothesis is that merging on the non-instruct model causes this disparity, since IFEval measures instruction-following capacity. It's also possible that either P1 or P3 causes the problem.

	## Merge Details
	### Merge Method

	This model was merged with the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with [unsloth/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) as the base.
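
For intuition, here is a minimal pure-Python sketch of the interpolation that the Model Stock paper describes: the fine-tuned weights are averaged, then pulled back towards the base by a ratio `t = k*cos(theta) / (1 + (k-1)*cos(theta))`, where `k` is the number of fine-tuned models and `theta` is the angle between their offsets from the base. The function names are hypothetical, the weights are flat lists standing in for one layer's tensor, and averaging the pairwise cosines is an assumption made here for illustration. This is not mergekit's actual code.

```python
import math


def _dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def model_stock_ratio(k, cos_theta):
    """Interpolation ratio t = k*cos / (1 + (k-1)*cos)."""
    return k * cos_theta / (1.0 + (k - 1) * cos_theta)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all pairs (illustrative assumption)."""
    sims = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            a, b = vectors[i], vectors[j]
            denom = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
            sims.append(_dot(a, b) / denom if denom > 0 else 0.0)
    return sum(sims) / len(sims)


def model_stock_merge(base, finetuned):
    """Merge one layer: t * average(finetuned) + (1 - t) * base."""
    k = len(finetuned)
    if k < 2:
        raise ValueError("need at least two fine-tuned models to measure an angle")
    # Offsets of each fine-tuned model from the base weights.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]
    t = model_stock_ratio(k, mean_pairwise_cosine(deltas))
    average = [sum(column) / k for column in zip(*finetuned)]
    return [t * a + (1.0 - t) * b for a, b in zip(average, base)]
```

When the offsets agree (cosine near 1), `t` approaches 1 and the result is close to the plain average. When they are orthogonal, `t` is 0 and the result stays at the base. With the four non-base models in this card, `k` would be 4.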

	### Models Merged

	The following models were included in the merge:
	* [mukaj/Llama-3.1-Hawkish-8B](https://huggingface.co/mukaj/Llama-3.1-Hawkish-8B)
	* [T145/KRONOS-8B-V1-P3](https://huggingface.co/T145/KRONOS-8B-V1-P3)
	* [T145/KRONOS-8B-V1-P1](https://huggingface.co/T145/KRONOS-8B-V1-P1)
	* [T145/KRONOS-8B-V1-P2](https://huggingface.co/T145/KRONOS-8B-V1-P2)

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml
	base_model: unsloth/Meta-Llama-3.1-8B-Instruct
	dtype: bfloat16
	merge_method: model_stock
	slices:
	- sources:
	  - layer_range: [0, 32]
	    model: T145/KRONOS-8B-V1-P1
	  - layer_range: [0, 32]
	    model: T145/KRONOS-8B-V1-P2
	  - layer_range: [0, 32]
	    model: T145/KRONOS-8B-V1-P3
	  - layer_range: [0, 32]
	    model: mukaj/Llama-3.1-Hawkish-8B
	  - layer_range: [0, 32]
	    model: unsloth/Meta-Llama-3.1-8B-Instruct
	tokenizer_source: base
	```
	```
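
As a quick sanity check on the configuration above, the sketch below mirrors it as a plain Python dictionary and summarizes it: which layer ranges appear, whether the base model is listed among the sources, and which entries count as fine-tuned models. `CONFIG` and `summarize_config` are hypothetical names introduced for illustration and are not part of mergekit.

```python
# The merge configuration from this card, written out as a Python dict.
CONFIG = {
    "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct",
    "dtype": "bfloat16",
    "merge_method": "model_stock",
    "slices": [
        {
            "sources": [
                {"layer_range": [0, 32], "model": "T145/KRONOS-8B-V1-P1"},
                {"layer_range": [0, 32], "model": "T145/KRONOS-8B-V1-P2"},
                {"layer_range": [0, 32], "model": "T145/KRONOS-8B-V1-P3"},
                {"layer_range": [0, 32], "model": "mukaj/Llama-3.1-Hawkish-8B"},
                {"layer_range": [0, 32], "model": "unsloth/Meta-Llama-3.1-8B-Instruct"},
            ]
        }
    ],
    "tokenizer_source": "base",
}


def summarize_config(config):
    """Collect layer ranges and split sources into base vs. fine-tuned models."""
    sources = [src for sl in config["slices"] for src in sl["sources"]]
    models = [src["model"] for src in sources]
    return {
        "layer_ranges": sorted({tuple(src["layer_range"]) for src in sources}),
        "base_listed": config["base_model"] in models,
        "finetuned": [m for m in models if m != config["base_model"]],
    }


summary = summarize_config(CONFIG)
```

Here every source spans the same layer range, the base model appears in the source list, and four models remain once the base is excluded.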