Goodfire/Llama-3.3-70B-Instruct-SAE-l50

goodfire-llama-3.3-70b-instruct-sae-l50

mechanistic interpretability

sparse autoencoder


namgoodfire committed 11 days ago

Commit 7d7e289 · verified · 1 Parent(s): e0ece42

Update README.md

Files changed (1)

README.md +11 -1


@@ -14,4 +14,14 @@ Llama 3.3 70B and achieves an L0 count of 121, enabling the decomposition of
 into interpretable features. The model is optimized for interpretability tasks and model steering applications,
 allowing researchers and developers to gain insights into the model's internal processing and behavior patterns.
 As an open-source tool, it serves as a foundation for advancing interpretability research and enhancing control
-over large language model operations.

+over large language model operations.
+### Intended Use
+__Intended Use Cases__ By open-sourcing SAEs for leading open models, especially large-scale
+models like Llama 3.3 70B, we aim to accelerate progress in interpretability research.
+Our initial work with these SAEs has revealed promising applications in model steering,
+enhancing jailbreaking safeguards, and interpretable classification methods (docs.goodfire.ai).
+We look forward to seeing how the research community builds upon these
+foundations and uncovers new applications.