namgoodfire committed
Update README.md
README.md CHANGED

````diff
@@ -6,7 +6,7 @@ base_model:
 - meta-llama/Llama-3.3-70B-Instruct
 ---
 
-
+## Model Information
 
 The Goodfire SAE (Sparse Autoencoder) for Llama 3.3 70B is an interpreter model designed to analyze and understand
 the internal representations of Llama-3.3-70B-Instruct. This SAE model is trained specifically on layer 50 of
@@ -16,7 +16,7 @@ allowing researchers and developers to gain insights into the model's internal
 As an open-source tool, it serves as a foundation for advancing interpretability research and enhancing control
 over large language model operations.
 
-
+## Intended Use
 
 By open-sourcing SAEs for leading open models, especially large-scale
 models like Llama 3.3 70B, we aim to accelerate progress in interpretability research.
@@ -28,7 +28,7 @@ foundations and uncovers new applications.
 
 #### Feature labels
 
-
+## How to use
 
 ```python
 import torch
@@ -262,7 +262,7 @@ logits, kv_cache, features = llama_3_1_8b.forward(
 print(llama_3_1_8b.tokenizer.decode(logits[-1].argmax(-1)))
 ```
 
-
+## Responsibility & Safety
 
 Safety is at the core of everything we do at Goodfire. As a public benefit
 corporation, we’re dedicated to understanding AI models to enable safer, more reliable
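The README's truncated usage snippet passes activations through the SAE and gets back a `features` tensor. For orientation, here is a minimal sketch of the mechanics a sparse autoencoder like this implements — encoding a residual-stream activation into a wider, sparse feature vector and reconstructing it. This is not Goodfire's actual API; the weight names (`W_enc`, `W_dec`) and the toy dimensions are made up for illustration (the real SAE reads layer-50 hidden states of Llama-3.3-70B-Instruct and has far more features).

```python
import torch

torch.manual_seed(0)

d_model, d_sae = 16, 64  # toy sizes; real SAEs are much wider than the model dim

# Hypothetical SAE parameters: an encoder into feature space and a decoder back.
W_enc = torch.randn(d_model, d_sae) * 0.1
W_dec = torch.randn(d_sae, d_model) * 0.1
b_enc = torch.zeros(d_sae)
b_dec = torch.zeros(d_model)

def sae_forward(h: torch.Tensor):
    """Encode an activation into sparse features, then decode a reconstruction."""
    features = torch.relu(h @ W_enc + b_enc)  # ReLU zeroes non-firing features
    recon = features @ W_dec + b_dec
    return features, recon

h = torch.randn(d_model)  # stand-in for one layer-50 hidden state
features, recon = sae_forward(h)

# Only a fraction of features fire on any given activation — that sparsity is
# what makes individual features interpretable and labelable.
print(f"active features: {(features > 0).float().mean().item():.0%}")
```

Interpretability work then studies which inputs make each of the `d_sae` feature directions fire, and steering work edits `features` before decoding.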