NeverSleep/Mistral-11B-OmniMix-bf16

Undi95 committed on Oct 12, 2023

Commit

d11defc

·

1 Parent(s): 734642c

Create README.md

Files changed (1)

README.md +110 -0

README.md ADDED

	@@ -0,0 +1,110 @@

+---
+license: cc-by-nc-4.0
+---
+This model should be fixed; it was MEANT to be BF16.
+Don't mind this one for the moment: it's just a test, and I still need to finetune it for RP.
+## Description
+This repo contains bf16 files of Mistral-11B-OmniMix.
+My only goal for this model was to make it score as high as possible through merging and layer toying, proving that:
+- Benchmarks are objective
+- You should try a model yourself rather than blindly picking the highest-rated one
+- Merge/layer toying CAN be used to make better models (maybe?)
+## Models used
+- [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
+- [Mistral-7B-v0.1-Open-Platypus](https://huggingface.co/akjindal53244/Mistral-7B-v0.1-Open-Platypus)
+- [CollectiveCognition-v1.1-Mistral-7B](https://huggingface.co/teknium/CollectiveCognition-v1.1-Mistral-7B)
+- [zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
+## Prompt template: Alpaca or default
+```
+Below is an instruction that describes a task. Write a response that appropriately completes the request.
+### Instruction:
+{prompt}
+### Response:
+```
+```
+USER: <prompt>
+ASSISTANT:
+```
+Or use the prompt format of any of the 4 source models; it should work.
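As a minimal sketch, the two templates above can be filled in with a small Python helper. The names `build_prompt`, `ALPACA_TEMPLATE`, and `DEFAULT_TEMPLATE` are hypothetical and not part of this repo; only the template text comes from the README.

```python
# Hypothetical helper: the template text is copied from the README,
# the function and constant names are illustrative only.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{prompt}\n"
    "### Response:\n"
)
DEFAULT_TEMPLATE = "USER: {prompt}\nASSISTANT:"


def build_prompt(prompt: str, style: str = "alpaca") -> str:
    """Wrap a user prompt in the Alpaca or the default USER/ASSISTANT template."""
    templates = {"alpaca": ALPACA_TEMPLATE, "default": DEFAULT_TEMPLATE}
    if style not in templates:
        raise ValueError(f"unknown prompt style: {style!r}")
    return templates[style].format(prompt=prompt)


if __name__ == "__main__":
    print(build_prompt("Summarize the plot of Hamlet in two sentences."))
    print(build_prompt("Summarize the plot of Hamlet in two sentences.", "default"))
```

The resulting string would be passed as-is to whatever inference backend is in use.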
+## The secret sauce
+Mistral-11B-OpenOrcaPlatypus :
+```
+slices:
+  - sources:
+    - model: Open-Orca/Mistral-7B-OpenOrca
+      layer_range: [0, 24]
+  - sources:
+    - model: akjindal53244/Mistral-7B-v0.1-Open-Platypus
+      layer_range: [8, 32]
+merge_method: passthrough
+dtype: bfloat16
+```
+Mistral-11B-CC-Zephyr :
+```
+slices:
+  - sources:
+    - model: "/content/drive/MyDrive/CC-v1.1-7B-bf16"
+      layer_range: [0, 24]
+  - sources:
+    - model: "/content/drive/MyDrive/Zephyr-7B"
+      layer_range: [8, 32]
+merge_method: passthrough
+dtype: bfloat16
+```
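The arithmetic behind the "11B" name can be sketched as follows. This assumes `layer_range` is half-open (`[0, 24]` meaning layers 0 to 23), which is consistent with the `[0, 48]` ranges used in the OmniMix config, and it assumes the published Mistral-7B-v0.1 dimensions (hidden size 4096, MLP size 14336, 32 query heads and 8 KV heads of dimension 128, vocabulary 32000, untied embeddings). None of those dimensions are stated in this README, and the function names are hypothetical.

```python
# Assumed Mistral-7B-v0.1 dimensions (from the base model's config, not this README).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads of dimension 128
VOCAB = 32000


def stacked_layer_count(slices):
    """Total layers after a passthrough merge of half-open [start, end) slices."""
    return sum(end - start for start, end in slices)


def estimate_params(num_layers):
    """Rough parameter count for a Mistral-style decoder with num_layers layers."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q_proj, o_proj, k_proj, v_proj
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN                                # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    # embed_tokens + lm_head + final norm
    return num_layers * per_layer + 2 * VOCAB * HIDDEN + HIDDEN


if __name__ == "__main__":
    layers = stacked_layer_count([(0, 24), (8, 32)])
    print(layers, "layers")
    print(f"{estimate_params(layers) / 1e9:.2f}B parameters")
```

Under these assumptions, each passthrough merge stacks 24 + 24 = 48 layers, with layers 8 to 23 of the architecture represented twice (once from each source model), for roughly 10.7B parameters.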
+Mistral-11B-OmniMix :
+```
+slices:
+  - sources:
+      - model: Mistral-11B-OpenOrcaPlatypus
+        layer_range: [0, 48]
+      - model: Mistral-11B-CC-Zephyr
+        layer_range: [0, 48]
+merge_method: slerp
+base_model: Mistral-11B-OpenOrcaPlatypus
+parameters:
+  t:
+    - filter: lm_head
+      value: [0.75]
+    - filter: embed_tokens
+      value: [0.75]
+    - filter: self_attn
+      value: [0.75, 0.25]
+    - filter: mlp
+      value:  [0.25, 0.75]
+    - filter: layernorm
+      value: [0.5, 0.5]
+    - filter: modelnorm
+      value: [0.75]
+    - value: 0.5 # fallback for rest of tensors
+dtype: bfloat16
+```
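To illustrate what the `slerp` config above asks for, here is a rough pure-Python sketch. It is not mergekit's implementation. It assumes that a list-valued `t` is a gradient interpolated linearly over layer depth, and that `t = 0` returns the `base_model` tensor while `t = 1` returns the other model's tensor; both points should be checked against the mergekit documentation. The function names are hypothetical.

```python
import math


def layer_t(gradient, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values across layer depth (assumed behaviour)."""
    if len(gradient) == 1 or num_layers == 1:
        return gradient[0]
    pos = layer_idx / (num_layers - 1) * (len(gradient) - 1)
    lo = min(int(pos), len(gradient) - 2)
    frac = pos - lo
    return gradient[lo] * (1 - frac) + gradient[lo + 1] * frac


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat vectors of equal length."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    if n0 == 0 or n1 == 0:
        # Direction undefined for a zero vector: fall back to plain lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    theta = math.acos(max(-1.0, min(1.0, dot)))
    if abs(math.sin(theta)) < eps:
        # Nearly colinear vectors: slerp degenerates to lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # self_attn uses t = [0.75, 0.25] across the 48 layers.
    for idx in (0, 24, 47):
        print(idx, round(layer_t([0.75, 0.25], idx, 48), 3))
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Read this way, the `self_attn` weights would lean toward Mistral-11B-CC-Zephyr in the early layers and toward Mistral-11B-OpenOrcaPlatypus in the late layers, with the `mlp` weights doing the opposite.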
+I used [mergekit](https://github.com/cg123/mergekit) for all the manipulations described here.
+## Some scoring I did myself
+Coming later.
+## Others
+Special thanks to Sushi, [Henky](https://github.com/KoboldAI/KoboldAI-Client) for the machine he gave me for big tasks, and [Charles Goddard](https://github.com/cg123) for his amazing tool.
+If you want to support me, you can [here](https://ko-fi.com/undiai).