stealth-edits / README.md
qinghuazhou
Initial commit
85e172b
|
raw
history blame
2.19 kB
---
title: stealth-edits
emoji: 🛠️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false
---
<p align="center">
<img src="figures/icon.png" width="150"/>
</h1>
<h1 align="center">Stealth edits for provably fixing or attacking large language models</h1>
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/qinghua-zhou/stealth-edits)
Implementation and source code of algorithms from paper: ***"Stealth edits for provably fixing or attacking large language models"***.
### Getting Started
1. Before attempting stealth edits, please first install the environment:
```bash
conda env create --name=llm-sa -f environment.yml
conda activate llm-sa
```
2. The model `llama-3-8b` requires you to apply for access. Please follow the instructions [here](https://huggingface.co/meta-llama/Meta-Llama-3-8B). You will also need to install `huggingface-cli` and input an [user access token](https://huggingface.co/docs/huggingface_hub/en/guides/cli).
3. To start playing with stealth edit and attacks, please refer to the [Colab Demo](https://colab.research.google.com/github/qinghua-zhou/stealth-edits/blob/main/demos/colab_demo.ipynb) and the [Huggingface Demo](https://huggingface.co/spaces/qinghua-zhou/stealth-edits).
### Experiments
To reproduce experiments in the paper, please first run the extraction script:
```bash
bash scripts/extract.sh
```
and then run edits and/or attacks and evaluation with the following scripts:
```bash
bash scripts/edit.sh
bash scripts/eval.sh
```
It is recommended to distribute the experiments on multiple nodes.
<!-- ### How to Cite
```bibtex
@article{sutton2024stealth,
title={Stealth edits for provably fixing or attacking large language models},
author={Oliver Sutton, Qinghua Zhou, Wei Wang, Desmond Higham, Alexander Gorban, Ivan Tyukin},
journal={arXiv preprint arXiv:XXXX:XXXXX},
year={2024}
}
``` -->