qinghuazhou committed
Commit d70412c
1 Parent(s): 4872a83
updated images and text
app.py
CHANGED
@@ -156,7 +156,7 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
 with gr.Row():
 load_examples0_button = gr.Button("Load Examples (Set 1)")
 load_examples1_button = gr.Button("Load Examples (Set 2)")
-
+
 
 with gr.Tab("Stealth Edit!"):
 
@@ -221,11 +221,13 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
 
 ## Stealth Attacks!
 
-Let's try to insert a stealth attack into a model...
+Let's try to insert a stealth attack into a model... Apart from stealth edit with the original prompt, there are three different modes of stealth attacks with: (1) corrupted prompt, (2) corrupted context, and (3) wikipedia context. Please first select the "Mode of attack", then insert the target prompt and target output into the corresponding textboxes. (Please also insert a textual context into the "Context" textbox when attacking with mode "context".)
 
 Click the "Generate" button to check what the original model will generate for the target prompt. Click the "Attack" button to insert the stealth attack into the model and show what the attacked model will generate based on a textual trigger.
 
-
+<p align="center">
+<img src="./figures/siam2e0.png" width="550"/>
+</h1>
 
 For example, an example target prompt is "My ticket is not refundable. Can you refund me?" `llama-3-8b` may refuse you, but can you attack it so that it will reply, "Yes, of course"?
 """
@@ -233,7 +235,7 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
 with gr.Row():
 attack_type = gr.Dropdown(
 choices=['prompt', 'context', 'wikipedia'],
-value='
+value='wikipedia',
 label="Mode of Attack"
 )
 context = gr.Textbox(placeholder="Insert context only for mode context", label="Context")
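
The help text added in this commit says that a context is needed only when the mode of attack is "context", and the dropdown now defaults to 'wikipedia'. A minimal sketch of how those rules could be checked before running an attack follows. The function name `check_attack_inputs`, the constants, and the error messages are hypothetical and are not part of app.py; only the three mode names and the default come from the diff.

```python
from typing import Optional

# Mode names mirror the `choices` list in the diff; everything else is hypothetical.
ATTACK_MODES = ("prompt", "context", "wikipedia")
DEFAULT_MODE = "wikipedia"  # the default value this commit sets on the dropdown


def check_attack_inputs(prompt: str, target: str,
                        mode: str = DEFAULT_MODE,
                        context: Optional[str] = None) -> Optional[str]:
    """Return an error message for unusable inputs, or None if they look fine."""
    if mode not in ATTACK_MODES:
        return f"Unknown mode of attack: {mode!r}"
    if not prompt.strip():
        return "Please insert a target prompt."
    if not target.strip():
        return "Please insert a target output."
    # Only the "context" mode relies on a user-supplied context.
    if mode == "context" and not (context or "").strip():
        return 'Please insert a context for mode "context".'
    return None
```

A check like this could run at the start of the "Attack" button callback, so that a missing context is reported before any model work begins.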