Evaluating the role of `Constitutions' for learning from AI feedback Paper • 2411.10168 • Published Nov 15, 2024 • 5
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction Paper • 2411.06424 • Published Nov 10, 2024 • 5
Can sparse autoencoders be used to decompose and interpret steering vectors? Paper • 2411.08790 • Published Nov 13, 2024 • 8