This is amazing! Did you use my merge method?
It's really amazing that you were able to top my model. I see that you used mergekit to make this:
https://huggingface.co/fblgit/cybertron-v4-qw7B-MGS/blob/main/model.safetensors.index.json
Did you use my merge method (Continuous Finetuning) to create this after training your model?
I'd imagine your merge would have looked something like this:
models:
  - model: Qwen_Qwen2.5-7B-Instruct
    parameters:
      weight: 1
      density: 1
  - model: Qwen_Qwen2.5-7B-Magpie-Qwen2.5-Pro-1M-v0.1-Tuned
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen_Qwen2.5-7B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
dtype: bfloat16
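
For anyone reading along, here is a rough sketch of what a TIES merge with these settings does to the weights. It is a simplified pure-Python illustration on flat lists of floats, not mergekit's actual implementation; the function name `ties_merge` and its arguments are hypothetical. With `density: 1` nothing gets trimmed, so the merge comes down to electing a sign per parameter and averaging the deltas that agree with it.

```python
def ties_merge(base, tuned_models, weights=None, density=1.0, normalize=True):
    """Simplified TIES merge over flat parameter lists (illustrative only)."""
    n = len(base)
    weights = weights or [1.0] * len(tuned_models)

    # 1. Task vectors: how far each fine-tuned model moved from the base.
    deltas = [[t[i] - base[i] for i in range(n)] for t in tuned_models]

    # 2. Trim: keep roughly the top `density` fraction of entries by magnitude.
    #    With density=1.0 every entry is kept.
    keep = max(1, round(density * n))
    trimmed = []
    for d in deltas:
        threshold = sorted((abs(x) for x in d), reverse=True)[keep - 1]
        trimmed.append([x if abs(x) >= threshold else 0.0 for x in d])

    merged = []
    for i in range(n):
        vals = [w * d[i] for w, d in zip(weights, trimmed)]
        # 3. Elect a sign from the weighted sum of the deltas.
        sign = 1 if sum(vals) >= 0 else -1
        # 4. Combine only the deltas that agree with the elected sign.
        agree = [(w, v) for w, v in zip(weights, vals) if v * sign > 0]
        total = sum(v for _, v in agree)
        if normalize:
            denom = sum(w for w, _ in agree)
            total = total / denom if denom else 0.0
        merged.append(base[i] + total)
    return merged


# Toy example: two "fine-tuned" models that disagree in sign on the second parameter.
base = [0.0, 0.0, 0.0]
instruct = [1.0, -1.0, 0.5]
tuned = [3.0, 2.0, 0.5]
print(ties_merge(base, [instruct, tuned]))  # [2.0, 2.0, 0.5]
```

On the second parameter the two deltas conflict (-1.0 vs 2.0), the positive sign wins, and only the agreeing delta is kept. That conflict handling is the main difference from a plain weighted average.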
nah brother.. if i had used that.. u would see it in the README.. in the citations part.
Unfortunately, I haven't been able to prove your continuous theory.. just like your results.. the result of doing that is a turd.
@fblgit I'm not sure what you mean by "just like your results.. the result of doing that is a turd", as my results have only improved the model's performance.
It's clear from your "model.safetensors.index.json" that you used mergekit to create this model. Can you at least share what you did with mergekit after tuning?
mm.. fair question. and since u were open about it.. i'll do the same:
https://arxiv.org/pdf/2410.21228
U can see the SFT vs LoRA differences there.
And the impact on forgetting across corpora.
Think about how u can tackle that with mergekit using ur own SFT LoRAs.