datasets:
- unalignment/toxic-dpo-v0.2
---

# Model Card for Eidolon-v3-14B

This model is a fine-tuned version of [Lambent/ProtoEidolon-v2.2.4-14B](https://huggingface.co/Lambent/ProtoEidolon-v2.2.4-14B).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Proto-Eidolon Training and Merges:

2.2.1:
Full ties merge with Rombos:
```
models:
  - model: Lambent/Eidolon-v2.1-14B
    parameters:
      weight: 1.0
      density: 1.0
  - model: rombodawg/Rombos-LLM-V2.6-Qwen-14b
    parameters:
      weight: 1.0
      density: 1.0
merge_method: ties
base_model: Qwen/Qwen2.5-14B
dtype: bfloat16
tokenizer_source: base
```
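The `ties` method combines each model's delta from the base by trimming low-magnitude entries (per `density`), electing a per-parameter sign, and averaging the values that agree with it. A toy pure-Python sketch of that arithmetic on flat vectors — not mergekit's implementation, which operates tensor-by-tensor on checkpoint weights:

```python
# Toy TIES merge: trim by density, elect signs, disjoint mean.
# Stands in for mergekit's `merge_method: ties` on toy task vectors.

def ties_merge(base, models, weight=1.0, density=1.0):
    # Task vectors: each fine-tune's weighted delta from the shared base.
    deltas = [[weight * (m - b) for m, b in zip(model, base)] for model in models]
    # Trim: keep only the top-`density` fraction of entries by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, round(density * len(d)))
        cutoff = sorted((abs(x) for x in d), reverse=True)[k - 1]
        trimmed.append([x if abs(x) >= cutoff else 0.0 for x in d])
    merged = []
    for i in range(len(base)):
        vals = [d[i] for d in trimmed]
        # Elect sign: the sign with the larger total magnitude wins.
        sign = 1.0 if sum(v for v in vals if v > 0) >= -sum(v for v in vals if v < 0) else -1.0
        # Disjoint mean: average only values agreeing with the elected sign.
        agree = [v for v in vals if v * sign > 0]
        merged.append(base[i] + (sum(agree) / len(agree) if agree else 0.0))
    return merged
```

With `weight` and `density` both at 1.0, as in the configs here, nothing is trimmed and the merge reduces to sign-elected averaging of the two deltas.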

2.2.2:
QLoRA SFT on instruction-following data, particularly argilla/ifeval-like-data;
also some smaller samples of continued completion and normal instruction data to regularize.
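QLoRA freezes the (quantized) base weights and trains only a low-rank adapter. A minimal sketch of the adapter arithmetic on plain nested lists — illustrative only, not the PEFT/TRL API:

```python
def matmul(X, Y):
    # Naive matrix multiply on nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """LoRA forward pass: y = x @ W + (alpha / r) * x @ A @ B.
    W is frozen (and, in QLoRA, stored quantized); only A and B train."""
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    s = alpha / r  # scaling applied to the low-rank update
    return [[b + s * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]
```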

2.2.3:
DPO on the original two Arsenic datasets (an earlier version of this DPO stage).
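The DPO stages optimize a Bradley-Terry preference objective over chosen/rejected pairs. A minimal sketch of the per-pair loss (argument names are illustrative, not TRL's `DPOTrainer` API):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares policy vs. reference log-ratios."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference the loss is log 2; it falls as the policy widens the chosen-over-rejected margin relative to the reference.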

2.2.4:
Full ties merge with old self:
```
models:
  - model: Lambent/Eidolon-v2.1-14B
    parameters:
      weight: 1.0
      density: 1.0
  - model: Lambent/ProtoEidolon-v2.2.3-14B
    parameters:
      weight: 1.0
      density: 1.0
merge_method: ties
base_model: Qwen/Qwen2.5-14B
dtype: bfloat16
tokenizer_source: base
```

... and then this training, which aimed to restore some intelligence lost in the process and hopefully benefit from some cool new Gutenbergs,
presuming enough of them fit within the context length I trained at.

It took roughly 12 hours on an A100, compared to 2 hours for the prior DPOs. And this is a carefully selected subset.
Gutenberg3 and orpo-dpo-mix-40k are big, man, that's a lot of preferences. (Or long ones.)

## Testing Done:

EQ-Bench:

| Tasks  |Version|Filter|n-shot| Metric          |   | Value  |   |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|
|eq_bench|    2.1|none  |     0|eqbench          |↑  | 80.3122|±  |1.4923|
|        |       |none  |     0|percent_parseable|↑  |100.0000|±  |0.0000|
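For reading the table: the ± column in harness output of this shape is the standard error of the mean over benchmark items. A minimal sketch of that computation (an assumption about how the harness derives it, not its code):

```python
import math

def mean_and_stderr(scores):
    """Mean and standard error of the mean over per-item scores,
    as reported in the Value and Stderr columns."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)
```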

## Axolotl Config

```