namgoodfire committed
Update README.md
README.md CHANGED

````diff
@@ -6,7 +6,7 @@ base_model:
 - meta-llama/Llama-3.3-70B-Instruct
 ---
 
-
+## Model Information
 
 The Goodfire SAE (Sparse Autoencoder) for Llama 3.3 70B is an interpreter model designed to analyze and understand
 the internal representations of Llama-3.3-70B-Instruct. This SAE model is trained specifically on layer 50 of
@@ -16,7 +16,7 @@ allowing researchers and developers to gain insights into the model's internal
 As an open-source tool, it serves as a foundation for advancing interpretability research and enhancing control
 over large language model operations.
 
-
+## Intended Use
 
 By open-sourcing SAEs for leading open models, especially large-scale
 models like Llama 3.3 70B, we aim to accelerate progress in interpretability research.
@@ -28,7 +28,7 @@ foundations and uncovers new applications.
 
 #### Feature labels
 
-
+## How to use
 
 ```python
 import torch
@@ -262,7 +262,7 @@ logits, kv_cache, features = llama_3_1_8b.forward(
 print(llama_3_1_8b.tokenizer.decode(logits[-1].argmax(-1)))
 ```
 
-
+## Responsibility & Safety
 
 Safety is at the core of everything we do at Goodfire. As a public benefit
 corporation, we’re dedicated to understanding AI models to enable safer, more reliable
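The README's truncated usage snippet passes activations through the SAE and gets back a `features` tensor. For orientation, here is a minimal sketch of the mechanics a sparse autoencoder like this implements — encoding a residual-stream activation into a wider, sparse feature vector and reconstructing it. This is not Goodfire's actual API; the weight names (`W_enc`, `W_dec`) and the toy dimensions are made up for illustration (the real SAE reads layer-50 hidden states of Llama-3.3-70B-Instruct and has far more features).

```python
import torch

torch.manual_seed(0)

d_model, d_sae = 16, 64  # toy sizes; real SAEs are much wider than the model dim

# Hypothetical SAE parameters: an encoder into feature space and a decoder back.
W_enc = torch.randn(d_model, d_sae) * 0.1
W_dec = torch.randn(d_sae, d_model) * 0.1
b_enc = torch.zeros(d_sae)
b_dec = torch.zeros(d_model)

def sae_forward(h: torch.Tensor):
    """Encode an activation into sparse features, then decode a reconstruction."""
    features = torch.relu(h @ W_enc + b_enc)  # ReLU zeroes non-firing features
    recon = features @ W_dec + b_dec
    return features, recon

h = torch.randn(d_model)  # stand-in for one layer-50 hidden state
features, recon = sae_forward(h)

# Only a fraction of features fire on any given activation — that sparsity is
# what makes individual features interpretable and labelable.
print(f"active features: {(features > 0).float().mean().item():.0%}")
```

Interpretability work then studies which inputs make each of the `d_sae` feature directions fire, and steering work edits `features` before decoding.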