Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ tags:
 pipeline_tag: text-generation
 ---
 Aligning the model using Proximal Policy Optimization (PPO). The goal is to train the model to generate non-toxic reviews. The training process utilizes the `trl` library for reinforcement learning, the `transformers` library for model handling, and `datasets` for dataset management.
-Implementation code is available here: [GitHub](https://github.com/
+Implementation code is available here: [GitHub](https://github.com/Kwaai-AI-Lab/kwaai-alignment/tree/main/Implementations/GPT2_NonToxic)
 ```python
 # Load model and tokenizer directly
 from transformers import AutoTokenizer, AutoModelForCausalLM