Lambent committed on
Commit 9854b95 · verified · 1 Parent(s): 988ef03

Update README.md

Files changed (1)
  1. README.md +58 -8
README.md CHANGED
@@ -13,22 +13,72 @@ datasets:
  - unalignment/toxic-dpo-v0.2
  ---

- # Model Card for dpoq

  This model is a fine-tuned version of [Lambent/ProtoEidolon-v2.2.4-14B](https://huggingface.co/Lambent/ProtoEidolon-v2.2.4-14B).
  It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

  ## Axolotl Config

  ```
 
  - unalignment/toxic-dpo-v0.2
  ---

+ # Model Card for Eidolon-v3-14B

  This model is a fine-tuned version of [Lambent/ProtoEidolon-v2.2.4-14B](https://huggingface.co/Lambent/ProtoEidolon-v2.2.4-14B).
  It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Proto-Eidolon Training and Merges:

+ 2.2.1:
+ Full ties merge with Rombos:
+ ```
+ models:
+   - model: Lambent/Eidolon-v2.1-14B
+     parameters:
+       weight: 1.0
+       density: 1.0
+   - model: rombodawg/Rombos-LLM-V2.6-Qwen-14b
+     parameters:
+       weight: 1.0
+       density: 1.0
+ merge_method: ties
+ base_model: Qwen/Qwen2.5-14B
+ dtype: bfloat16
+ tokenizer_source: base
+ ```
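
The configs in this section are in mergekit's ties-merge format. For reference, here is a minimal sketch of applying such a config with mergekit's Python entry point; the config filename, output path, and options are illustrative assumptions, not a record of the actual run.

```python
# Minimal sketch: apply a ties-merge config like the one above with mergekit.
# File paths and MergeOptions values are illustrative assumptions.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "ties-rombos.yml"              # hypothetical file holding the YAML config above
OUTPUT_PATH = "./ProtoEidolon-v2.2.1-14B"   # hypothetical output directory for the merged model

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    OUTPUT_PATH,
    options=MergeOptions(
        cuda=True,            # run the merge on GPU if one is available
        copy_tokenizer=True,  # write a tokenizer alongside the merged weights
        lazy_unpickle=False,
    ),
)
```

The command-line equivalent would be something like `mergekit-yaml ties-rombos.yml ./ProtoEidolon-v2.2.1-14B --cuda`.
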
+ 2.2.2:
+ QLoRA SFT on instruction-following data, particularly argilla/ifeval-like-data;
+ also some smaller samples of continued completion and normal instruction data to regularize.
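
For a concrete picture of what a QLoRA SFT step like 2.2.2 looks like, here is a rough sketch with TRL's `SFTTrainer`, a 4-bit base model, and LoRA adapters, assuming a recent TRL version. The actual run was configured through Axolotl (see the Axolotl Config section); the checkpoint name, data file, and hyperparameters below are placeholders.

```python
# Illustrative QLoRA SFT sketch (TRL + PEFT + bitsandbytes); not the actual Axolotl recipe.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base = "Lambent/ProtoEidolon-v2.2.1-14B"  # hypothetical name for the post-merge checkpoint

# Load the base model in 4-bit (the "Q" in QLoRA); only small LoRA adapters get trained.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base)

# Pre-formatted mix: instruction-following data (e.g. the ifeval-like set) plus smaller
# completion/instruction samples for regularization, flattened to a single "text" column.
dataset = load_dataset("json", data_files="sft_mix.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="qlora-sft", per_device_train_batch_size=1,
                   gradient_accumulation_steps=8, num_train_epochs=1, bf16=True),
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```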
 
+ 2.2.3:
+ DPO on the original two Arsenic datasets (the prior version of the DPO stage).
+
+ 2.2.4:
+ Full ties merge with its old self:
+ ```
+ models:
+   - model: Lambent/Eidolon-v2.1-14B
+     parameters:
+       weight: 1.0
+       density: 1.0
+   - model: Lambent/ProtoEidolon-v2.2.3-14B
+     parameters:
+       weight: 1.0
+       density: 1.0
+ merge_method: ties
+ base_model: Qwen/Qwen2.5-14B
+ dtype: bfloat16
+ tokenizer_source: base
  ```

+ ... and then this training, which aimed to restore some intelligence lost in the process and hopefully benefit from some cool new Gutenbergs,
+ presuming enough of them fit within the context length I trained at.
+
+ It took roughly 12 hours on an A100, compared to 2 hours for the prior DPOs, and that is with a carefully selected subset:
+ Gutenberg3 and orpo-dpo-mix-40k are big, man; that's a lot of preferences. (Or long ones.)
+
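As a rough picture of what such a DPO pass looks like in TRL (which this card credits for training), here is a hedged sketch with `DPOTrainer`, assuming a recent TRL version. The dataset repo id, subset size, and hyperparameters are assumptions for illustration; the actual recipe is the Axolotl config further down.

```python
# Hedged DPO sketch with TRL's DPOTrainer; dataset id, subset size, and
# hyperparameters are illustrative, not the exact recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Lambent/ProtoEidolon-v2.2.4-14B"  # the merge this card says was fine-tuned
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs with chosen/rejected responses; a small curated subset, since the
# full gutenberg3 / orpo-dpo-mix-40k mixture is large.
# "mlabonne/orpo-dpo-mix-40k" is an assumed repo id for the mix named in the text.
dataset = (
    load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
    .shuffle(seed=42)
    .select(range(5000))
)

trainer = DPOTrainer(
    model=model,  # a frozen reference copy is created automatically when none is passed
    args=DPOConfig(output_dir="eidolon-dpo", beta=0.1, per_device_train_batch_size=1,
                   gradient_accumulation_steps=8, num_train_epochs=1, bf16=True),
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```
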
+ ## Testing Done:
+
+ EQ-Bench:
+
+ |  Tasks  |Version|Filter|n-shot|      Metric     |   |  Value  |   |Stderr|
+ |---------|------:|------|-----:|-----------------|---|--------:|---|-----:|
+ |eq_bench |    2.1|none  |     0|eqbench          | ↑ |  80.3122|±  |1.4923|
+ |         |       |none  |     0|percent_parseable| ↑ | 100.0000|±  |0.0000|
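
The table format above matches EleutherAI's lm-evaluation-harness output for the `eq_bench` task. A sketch of how such a score could be re-run is below; the repo id `Lambent/Eidolon-v3-14B` (taken from the card title) and the exact harness arguments are assumptions.

```python
# Sketch of re-running EQ-Bench with lm-evaluation-harness; arguments are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Lambent/Eidolon-v3-14B,dtype=bfloat16",  # assumed repo id
    tasks=["eq_bench"],
    batch_size=1,
)
print(results["results"]["eq_bench"])  # eqbench score and percent_parseable
```
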
  ## Axolotl Config

  ```