Lambent committed on
Commit 9854b95 · verified · 1 Parent(s): 988ef03

Update README.md

Files changed (1)
  1. README.md +58 -8
README.md CHANGED
@@ -13,22 +13,72 @@ datasets:
  - unalignment/toxic-dpo-v0.2
  ---

- # Model Card for dpoq

  This model is a fine-tuned version of [Lambent/ProtoEidolon-v2.2.4-14B](https://huggingface.co/Lambent/ProtoEidolon-v2.2.4-14B).
  It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

  ## Axolotl Config

  ```
 
  - unalignment/toxic-dpo-v0.2
  ---

+ # Model Card for Eidolon-v3-14B

  This model is a fine-tuned version of [Lambent/ProtoEidolon-v2.2.4-14B](https://huggingface.co/Lambent/ProtoEidolon-v2.2.4-14B).
  It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Proto-Eidolon Training and Merges:

+ 2.2.1:
+ Full ties merge with Rombos:
+ ```
+ models:
+   - model: Lambent/Eidolon-v2.1-14B
+     parameters:
+       weight: 1.0
+       density: 1.0
+   - model: rombodawg/Rombos-LLM-V2.6-Qwen-14b
+     parameters:
+       weight: 1.0
+       density: 1.0
+ merge_method: ties
+ base_model: Qwen/Qwen2.5-14B
+ dtype: bfloat16
+ tokenizer_source: base
+ ```
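
The configs in this section are in mergekit's ties-merge format. For reference, here is a minimal sketch of applying such a config with mergekit's Python entry point; the config filename, output path, and options are illustrative assumptions, not a record of the actual run.

```python
# Minimal sketch: apply a ties-merge config like the one above with mergekit.
# File paths and MergeOptions values are illustrative assumptions.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "ties-rombos.yml"              # hypothetical file holding the YAML config above
OUTPUT_PATH = "./ProtoEidolon-v2.2.1-14B"   # hypothetical output directory for the merged model

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    OUTPUT_PATH,
    options=MergeOptions(
        cuda=True,            # run the merge on GPU if one is available
        copy_tokenizer=True,  # write a tokenizer alongside the merged weights
        lazy_unpickle=False,
    ),
)
```

The command-line equivalent would be something like `mergekit-yaml ties-rombos.yml ./ProtoEidolon-v2.2.1-14B --cuda`.
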
+ 2.2.2:
+ QLoRA SFT on instruction-following data, particularly argilla/ifeval-like-data;
+ also some smaller samples of continued completion and normal instruction data to regularize.
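
For a concrete picture of what a QLoRA SFT step like 2.2.2 looks like, here is a rough sketch with TRL's `SFTTrainer`, a 4-bit base model, and LoRA adapters, assuming a recent TRL version. The actual run was configured through Axolotl (see the Axolotl Config section); the checkpoint name, data file, and hyperparameters below are placeholders.

```python
# Illustrative QLoRA SFT sketch (TRL + PEFT + bitsandbytes); not the actual Axolotl recipe.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base = "Lambent/ProtoEidolon-v2.2.1-14B"  # hypothetical name for the post-merge checkpoint

# Load the base model in 4-bit (the "Q" in QLoRA); only small LoRA adapters get trained.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base)

# Pre-formatted mix: instruction-following data (e.g. the ifeval-like set) plus smaller
# completion/instruction samples for regularization, flattened to a single "text" column.
dataset = load_dataset("json", data_files="sft_mix.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="qlora-sft", per_device_train_batch_size=1,
                   gradient_accumulation_steps=8, num_train_epochs=1, bf16=True),
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```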
 
+ 2.2.3:
+ DPO on the original two Arsenic datasets (the prior version of the DPO stage).
+
+ 2.2.4:
+ Full ties merge with its old self:
+ ```
+ models:
+   - model: Lambent/Eidolon-v2.1-14B
+     parameters:
+       weight: 1.0
+       density: 1.0
+   - model: Lambent/ProtoEidolon-v2.2.3-14B
+     parameters:
+       weight: 1.0
+       density: 1.0
+ merge_method: ties
+ base_model: Qwen/Qwen2.5-14B
+ dtype: bfloat16
+ tokenizer_source: base
  ```

+ ... and then this training, which aimed to restore some intelligence lost in the process and hopefully benefit from some cool new Gutenbergs,
+ presuming enough of them fit within the context length I trained at.
+
+ It took roughly 12 hours on an A100, compared to 2 hours for the prior DPOs, and that is with a carefully selected subset:
+ Gutenberg3 and orpo-dpo-mix-40k are big, man; that's a lot of preferences. (Or long ones.)
+
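As a rough picture of what such a DPO pass looks like in TRL (which this card credits for training), here is a hedged sketch with `DPOTrainer`, assuming a recent TRL version. The dataset repo id, subset size, and hyperparameters are assumptions for illustration; the actual recipe is the Axolotl config further down.

```python
# Hedged DPO sketch with TRL's DPOTrainer; dataset id, subset size, and
# hyperparameters are illustrative, not the exact recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Lambent/ProtoEidolon-v2.2.4-14B"  # the merge this card says was fine-tuned
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs with chosen/rejected responses; a small curated subset, since the
# full gutenberg3 / orpo-dpo-mix-40k mixture is large.
# "mlabonne/orpo-dpo-mix-40k" is an assumed repo id for the mix named in the text.
dataset = (
    load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
    .shuffle(seed=42)
    .select(range(5000))
)

trainer = DPOTrainer(
    model=model,  # a frozen reference copy is created automatically when none is passed
    args=DPOConfig(output_dir="eidolon-dpo", beta=0.1, per_device_train_batch_size=1,
                   gradient_accumulation_steps=8, num_train_epochs=1, bf16=True),
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```
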
+ ## Testing Done:
+
+ EQ-Bench:
+
+ |  Tasks  |Version|Filter|n-shot|      Metric     |   |  Value  |   |Stderr|
+ |---------|------:|------|-----:|-----------------|---|--------:|---|-----:|
+ |eq_bench |    2.1|none  |     0|eqbench          | ↑ |  80.3122|±  |1.4923|
+ |         |       |none  |     0|percent_parseable| ↑ | 100.0000|±  |0.0000|
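
The table format above matches EleutherAI's lm-evaluation-harness output for the `eq_bench` task. A sketch of how such a score could be re-run is below; the repo id `Lambent/Eidolon-v3-14B` (taken from the card title) and the exact harness arguments are assumptions.

```python
# Sketch of re-running EQ-Bench with lm-evaluation-harness; arguments are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Lambent/Eidolon-v3-14B,dtype=bfloat16",  # assumed repo id
    tasks=["eq_bench"],
    batch_size=1,
)
print(results["results"]["eq_bench"])  # eqbench score and percent_parseable
```
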
  ## Axolotl Config

  ```