namgoodfire
committed on
Update README.md
README.md
CHANGED
@@ -14,4 +14,14 @@ Llama 3.3 70B and achieves an L0 count of 121, enabling the decomposition of
 into interpretable features. The model is optimized for interpretability tasks and model steering applications,
 allowing researchers and developers to gain insights into the model's internal processing and behavior patterns.
 As an open-source tool, it serves as a foundation for advancing interpretability research and enhancing control
-over large language model operations.
+over large language model operations.
+
+### Intended Use
+
+__Intended Use Cases__ By open-sourcing SAEs for leading open models, especially large-scale
+models like Llama 3.3 70B, we aim to accelerate progress in interpretability research.
+
+Our initial work with these SAEs has revealed promising applications in model steering,
+enhancing jailbreaking safeguards, and interpretable classification methods (docs.goodfire.ai).
+We look forward to seeing how the research community builds upon these
+foundations and uncovers new applications.