Goodfire/Llama-3.3-70B-Instruct-SAE-l50

goodfire-llama-3.3-70b-instruct-sae-l50

mechanistic interpretability

sparse autoencoder


namgoodfire committed 11 days ago

Commit e0ece42 · verified · 1 parent: a73b7ee

Update README.md

Files changed (1)

README.md +9 -1


@@ -6,4 +6,12 @@ base_model:
 - meta-llama/Llama-3.3-70B-Instruct
 ---
-### Model Information

+### Model Information
+The Goodfire SAE (sparse autoencoder) for Llama 3.3 70B is an interpreter model designed to analyze and understand
+the internal representations of Llama-3.3-70B-Instruct. This SAE is trained specifically on layer 50 of
+Llama-3.3-70B-Instruct and achieves an L0 of 121 (about 121 features active per token on average), decomposing
+dense neural activations into sparse, interpretable features. The model is optimized for interpretability tasks and
+model steering, giving researchers and developers insight into the model's internal processing and behavior.
+As an open-source tool, it serves as a foundation for advancing interpretability research and improving control
+over large language model behavior.
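
The description above mentions an L0 figure and steering. As a rough illustration only, the pure-Python sketch below shows how L0 is typically counted (the number of non-zero features per input) and one common way a feature's decoder direction can be added to an activation to steer it. It does not use Goodfire's weights, API, or actual architecture: `ToySAE`, `l0`, `mean_l0`, and `steer` are hypothetical names, the weights are random, and the ReLU encoder and additive steering are assumptions borrowed from common SAE practice.

```python
import random


def l0(features):
    """Number of active (strictly positive) features for one input."""
    return sum(1 for f in features if f > 0.0)


def mean_l0(batch):
    """Average L0 over a batch of feature vectors."""
    return sum(l0(f) for f in batch) / len(batch)


class ToySAE:
    """Toy sparse autoencoder with random weights (illustrative assumption,
    not the Goodfire model): features = ReLU(W_enc x + b_enc)."""

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        # One encoder row and one decoder direction per feature.
        self.w_enc = [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
                      for _ in range(d_features)]
        self.b_enc = [-1.0] * d_features  # negative bias keeps more features at zero
        self.w_dec = [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
                      for _ in range(d_features)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        """Map an activation vector to non-negative feature activations."""
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, features):
        """Reconstruct the activation as a weighted sum of decoder directions."""
        out = list(self.b_dec)
        for f, direction in zip(features, self.w_dec):
            if f > 0.0:
                for i, d in enumerate(direction):
                    out[i] += f * d
        return out

    def steer(self, x, feature_idx, strength):
        """Add a scaled decoder direction to the activation (assumed steering scheme)."""
        direction = self.w_dec[feature_idx]
        return [xi + strength * d for xi, d in zip(x, direction)]


# Usage: encode a small random batch and report the average number of active features.
sae = ToySAE(d_model=8, d_features=32)
rng = random.Random(1)
batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
features = [sae.encode(x) for x in batch]
print("mean L0:", mean_l0(features))
```

In a real setting the input vectors would be layer-50 activations from Llama-3.3-70B-Instruct and the weights would come from the released checkpoint; the sketch only shows the bookkeeping behind a statement like "an L0 of 121".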