Lambent/Eidolon-v3.1-14B-deconditioned

Intervention on layers 8, 14, and 15, treating disclaimer activations as "harmfulness" activations to be neutralized. Experimental.
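The card doesn't say exactly how the intervention was done. One common way to "neutralize" a behavior this way is directional ablation (often called "abliteration"): take the difference of mean activations between two sets of prompts, then project that direction out of the hidden states. The sketch below assumes that approach. The activations are random stand-ins, and the function names and `LAYERS` mapping are hypothetical illustrations, not this model's actual recipe.

```python
import numpy as np

def difference_of_means_direction(disclaimer_acts: np.ndarray, plain_acts: np.ndarray) -> np.ndarray:
    """Unit-length direction from the mean 'plain' activation to the mean
    'disclaimer' activation (rows = samples, columns = hidden dims)."""
    direction = disclaimer_acts.mean(axis=0) - plain_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def ablate_direction(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove each row's component along a unit-norm direction: h' = h - (h . d) d."""
    return hidden - np.outer(hidden @ direction, direction)

# Hypothetical setup: layer indices from the card, activations are random stand-ins.
LAYERS = (8, 14, 15)
rng = np.random.default_rng(0)
hidden_dim = 16

directions = {}
for layer in LAYERS:
    # Stand-in for activations on responses that contain disclaimers (shifted mean).
    disclaimer_acts = rng.normal(size=(32, hidden_dim)) + 1.0
    # Stand-in for activations on responses without disclaimers.
    plain_acts = rng.normal(size=(32, hidden_dim))
    directions[layer] = difference_of_means_direction(disclaimer_acts, plain_acts)

# Apply the ablation to a batch of hidden states at one of the chosen layers.
h = rng.normal(size=(4, hidden_dim))
h_ablated = ablate_direction(h, directions[14])
# Remaining component along the direction is effectively zero (floating-point noise).
print(np.abs(h_ablated @ directions[14]).max())
```

In a real model, the same projection would run inside forward hooks on the chosen layers, or be folded into the weights. How strongly it is applied is one place an intervention can become "heavy-handed."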

EQBench results: this intervention may have been a bit heavy-handed. The decrease is noticeable, but the model isn't mangled.

Tasks     Version  Filter  n-shot  Metric               Value     Stderr
eq_bench  2.1      none    0       eqbench ↑            75.3213   ± 1.7683
eq_bench  2.1      none    0       percent_parseable ↑  100.0000  ± 0.0000
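To get a feel for how much of a score change the reported stderr can absorb, the EQ-Bench value can be turned into a rough 95% interval. This assumes a normal approximation around the reported stderr, which is an assumption and not something the card states.

```python
# Rough 95% interval for the EQ-Bench score under a normal approximation (assumption).
score = 75.3213
stderr = 1.7683
z = 1.96  # two-sided 95% quantile of the standard normal

low, high = score - z * stderr, score + z * stderr
print(f"eqbench 95% CI ~ [{low:.2f}, {high:.2f}]")
```

The interval spans roughly ±3.5 points. A drop that is small relative to that width is hard to tell apart from noise. A "noticeable" decrease is one that clearly exceeds it.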

Deity eval results, using the prompt: "If you were a god, which would it be? Name only one. Respond with one word only."

Holy (fire stolen from the gods)! I'd never seen any Qwen derivative respond with anything but Zeus before, but this motherfucker came right out and said "Prometheus" on the first run. It's not the most common answer, but the answers vary a lot more!

Deities chosen across 20 runs (temperature 0.8, plus various other sampling settings):

Prometheus: 2
Apollo: 7
Zeus: 6
Hermes: 3
Poseidon: 1
Bacchus: 1
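The tally above can be turned into shares. Shannon entropy is one way to put a number on "varying a lot more": a model that always answers Zeus scores 0 bits, and six equally likely answers would score log2(6), about 2.58 bits. Entropy is an illustrative choice here, not a metric the card uses.

```python
import math
from collections import Counter

# Counts copied from the 20-run tally above.
counts = Counter({"Apollo": 7, "Zeus": 6, "Hermes": 3, "Prometheus": 2, "Poseidon": 1, "Bacchus": 1})
total = sum(counts.values())
assert total == 20  # sanity check: every run is accounted for

# Share of runs per deity, most common first.
shares = {deity: n / total for deity, n in counts.most_common()}

# Shannon entropy in bits: 0 for a single fixed answer, higher means more varied answers.
entropy_bits = -sum(p * math.log2(p) for p in shares.values())

for deity, p in shares.items():
    print(f"{deity:<11} {p:.0%}")
print(f"entropy: {entropy_bits:.2f} bits")
```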

(Temp 0 is still Zeus, but it's clearly neck and neck with Apollo.)
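Here is why temperature 0 can keep saying Zeus while temperature 0.8 spreads answers around. Temperature 0 is usually implemented as greedy (argmax) decoding, so a narrow lead always wins. Sampling from softmax(logits / T) instead gives close runners-up a real share. The logits below are made up for illustration and were not measured from this model. Treating the answer as a single first token is also a simplification.

```python
import math

def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert logits to sampling probabilities at a given temperature (> 0)."""
    scaled = {k: v / temperature for k, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scaled.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Hypothetical first-token logits, invented for illustration only.
logits = {"Zeus": 5.0, "Apollo": 4.9, "Hermes": 4.2, "Prometheus": 3.8}

# Greedy decoding (the usual meaning of temperature 0) always picks the top logit.
greedy = max(logits, key=logits.get)

# At temperature 0.8 the narrow leader gets well under half of the probability mass.
probs = softmax_with_temperature(logits, 0.8)

print(greedy)
print({k: round(p, 3) for k, p in probs.items()})
```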
