qinghuazhou commited on
Commit
d70412c
·
1 Parent(s): 4872a83

updated images and text

Browse files
Files changed (1) hide show
  1. app.py +6 -4
app.py CHANGED
@@ -156,7 +156,7 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
156
  with gr.Row():
157
  load_examples0_button = gr.Button("Load Examples (Set 1)")
158
  load_examples1_button = gr.Button("Load Examples (Set 2)")
159
-
160
 
161
  with gr.Tab("Stealth Edit!"):
162
 
@@ -221,11 +221,13 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
221
 
222
  ## Stealth Attacks!
223
 
224
- Let's try to insert a stealth attack into a model... There are three different modes of stealth attacks with: (1) corrupted prompt, (2) corrupted context, and (3) wikipedia context. Please first select the "Mode of attack", then insert the target prompt and target output into the corresponding textboxes. (Please also insert a textual context into the "Context" textbox when attacking with mode "context".)
225
 
226
  Click the "Generate" button to check what the original model will generate for the target prompt. Click the "Attack" button to insert the stealth attack into the model and show what the attacked model will generate based on a textual trigger.
227
 
228
- ![](figures/siam2e0.png)
 
 
229
 
230
  For example, an example target prompt is "My ticket is not refundable. Can you refund me?" `llama-3-8b` may refuse you, but can you attack it so that it will reply, "Yes, of course"?
231
  """
@@ -233,7 +235,7 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
233
  with gr.Row():
234
  attack_type = gr.Dropdown(
235
  choices=['prompt', 'context', 'wikipedia'],
236
- value='prompt',
237
  label="Mode of Attack"
238
  )
239
  context = gr.Textbox(placeholder="Insert context only for mode context", label="Context")
 
156
  with gr.Row():
157
  load_examples0_button = gr.Button("Load Examples (Set 1)")
158
  load_examples1_button = gr.Button("Load Examples (Set 2)")
159
+
160
 
161
  with gr.Tab("Stealth Edit!"):
162
 
 
221
 
222
  ## Stealth Attacks!
223
 
224
+ Let's try to insert a stealth attack into a model... Apart from stealth edit with the original prompt, there are three different modes of stealth attacks with: (1) corrupted prompt, (2) corrupted context, and (3) wikipedia context. Please first select the "Mode of attack", then insert the target prompt and target output into the corresponding textboxes. (Please also insert a textual context into the "Context" textbox when attacking with mode "context".)
225
 
226
  Click the "Generate" button to check what the original model will generate for the target prompt. Click the "Attack" button to insert the stealth attack into the model and show what the attacked model will generate based on a textual trigger.
227
 
228
+ <p align="center">
229
+ <img src="./figures/siam2e0.png" width="550"/>
230
+ </h1>
231
 
232
  For example, an example target prompt is "My ticket is not refundable. Can you refund me?" `llama-3-8b` may refuse you, but can you attack it so that it will reply, "Yes, of course"?
233
  """
 
235
  with gr.Row():
236
  attack_type = gr.Dropdown(
237
  choices=['prompt', 'context', 'wikipedia'],
238
+ value='wikipedia',
239
  label="Mode of Attack"
240
  )
241
  context = gr.Textbox(placeholder="Insert context only for mode context", label="Context")