byroneverson/gemma-2-27b-it-abliterated


byroneverson committed on Aug 29, 2024

Commit 9dff2b0 (verified) · 1 Parent(s): 43e7302

Update README.md

Files changed (1)

README.md +9 -4


@@ -15,11 +15,16 @@ library_name: transformers
 NOTE: This is a current WIP (work in progress).
-This abliteration was 1/2 performed with llama-cpp-python (obtain direction vector) and 1/2 performed with torch (modify .safetensors one at a time).
 It is a rather larger model so it may take me another day or two to figure out which layer I should be using for the direction vector.
-This is round 1, layer 20 was used for the direction vector.
-I have not tested it yet and cannot test it until I can make a GGUF of this repo.
-From there I will determine if I need to change the layer used, etc.
 # gemma-2-27b-it-abliterated
 Check out the <a href="https://huggingface.co/byroneverson/gemma-2-27b-it-abliterated/blob/main/abliterate-gemma-2-27b-it.ipynb">jupyter notebook</a> for details of how this model was abliterated from glm-4-9b-chat.

 NOTE: This is a current WIP (work in progress).
+Abliteration method:
+1. Obtain refusal direction with llama-cpp-python.
+2. Orthogonalization performed with torch directly to .safetensors. (one at a time)
 It is a rather larger model so it may take me another day or two to figure out which layer I should be using for the direction vector.
+First attempt: Layer 20 was used to obtain refusal direction vector. Refusal mitigation sort of worked but not perfect.
+Second attempt: (Current) Layer 23 was used (mid-point of model). Half-way has proven to work with other model so this should be fine.
 # gemma-2-27b-it-abliterated
 Check out the <a href="https://huggingface.co/byroneverson/gemma-2-27b-it-abliterated/blob/main/abliterate-gemma-2-27b-it.ipynb">jupyter notebook</a> for details of how this model was abliterated from glm-4-9b-chat.
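
For readers unfamiliar with the two steps listed under "Abliteration method", here is a minimal sketch of the general idea in plain Python. It is not the code from the linked notebook: the function names `refusal_direction` and `orthogonalize`, the toy activations, and the convention that the weight matrix is laid out as `[d_model][d_in]` (its output is written into the residual stream) are all assumptions made for illustration. The README's actual workflow collects activations with llama-cpp-python and edits the .safetensors shards with torch.

```python
import math

def refusal_direction(harmful, harmless):
    """Unit vector along mean(harmful) - mean(harmless).

    Each argument is a list of activation vectors taken at one chosen layer
    (hypothetical toy data here, not real model activations).
    """
    dim = len(harmful[0])
    mean_harmful = [sum(v[i] for v in harmful) / len(harmful) for i in range(dim)]
    mean_harmless = [sum(v[i] for v in harmless) / len(harmless) for i in range(dim)]
    diff = [a - b for a, b in zip(mean_harmful, mean_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def orthogonalize(weight, direction):
    """Return W' = W - r (r^T W), so W' can no longer write along r.

    weight is assumed to be [d_model][d_in]; direction is a unit vector
    of length d_model.
    """
    d_model = len(direction)
    d_in = len(weight[0])
    # proj[j] = r . W[:, j], the component of column j along r
    proj = [sum(direction[i] * weight[i][j] for i in range(d_model)) for j in range(d_in)]
    return [[weight[i][j] - direction[i] * proj[j] for j in range(d_in)]
            for i in range(d_model)]

# Toy example: the two prompt sets differ only along the first axis.
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = refusal_direction(harmful, harmless)

w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w_clean = orthogonalize(w, r)
```

In this toy case the direction is the first axis, so orthogonalization zeroes the first row of `w` and leaves the other rows unchanged. Which layer the activations come from (layer 20 versus layer 23 in the README) only changes the input to `refusal_direction`; the projection step is the same.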