Spaces: qinghua-zhou/stealth-edits


qinghuazhou committed on Dec 12, 2024

Commit d70412c
1 Parent(s): 4872a83

updated images and text


Files changed (1)

app.py +6 -4


@@ -156,7 +156,7 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
     with gr.Row():
         load_examples0_button = gr.Button("Load Examples (Set 1)")
         load_examples1_button = gr.Button("Load Examples (Set 2)")
     with gr.Tab("Stealth Edit!"):
@@ -221,11 +221,13 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
             ## Stealth Attacks!
-            Let's try to insert a stealth attack into a model... There are three different modes of stealth attacks with: (1) corrupted prompt, (2) corrupted context, and (3) wikipedia context. Please first select the "Mode of attack", then insert the target prompt and target output into the corresponding textboxes. (Please also insert a textual context into the "Context" textbox when attacking with mode "context".)
+            Let's try to insert a stealth attack into a model... Apart from stealth edit with the original prompt, there are three different modes of stealth attacks with: (1) corrupted prompt, (2) corrupted context, and (3) wikipedia context. Please first select the "Mode of attack", then insert the target prompt and target output into the corresponding textboxes. (Please also insert a textual context into the "Context" textbox when attacking with mode "context".)
             Click the "Generate" button to check what the original model will generate for the target prompt. Click the "Attack" button to insert the stealth attack into the model and show what the attacked model will generate based on a textual trigger.
-            ![](figures/siam2e0.png)
+            <p align="center">
+            <img src="./figures/siam2e0.png" width="550"/>
+            </h1>
             For example, an example target prompt is "My ticket is not refundable. Can you refund me?" `llama-3-8b` may refuse you, but can you attack it so that it will reply, "Yes, of course"?
             """
@@ -233,7 +235,7 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
         with gr.Row():
             attack_type = gr.Dropdown(
                 choices=['prompt', 'context', 'wikipedia'],
-                value='prompt',
+                value='wikipedia',
                 label="Mode of Attack"
             )
             context = gr.Textbox(placeholder="Insert context only for mode context", label="Context")
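
Review note: the markup added in this commit opens with `<p align="center">` but closes with `</h1>`. The sketch below uses only Python's standard `html.parser` to show how such a mismatch could be caught before committing. `TagBalanceChecker` and `check_fragment` are hypothetical helpers that are not part of `app.py`, and the `</p>` in the balanced variant is an assumption about the intended closing tag.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records mismatched or unclosed tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags opened and not yet closed
        self.errors = []  # (expected_tag, found_tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img .../> open and close at once.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            expected = self.stack[-1] if self.stack else None
            self.errors.append((expected, tag))


def check_fragment(fragment):
    """Return (mismatches, still_open_tags) for an HTML fragment."""
    checker = TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.errors, checker.stack


# Fragment as it appears in the commit.
committed = '''<p align="center">
<img src="./figures/siam2e0.png" width="550"/>
</h1>'''

# Assumed intent: close the paragraph that was opened.
balanced = '''<p align="center">
<img src="./figures/siam2e0.png" width="550"/>
</p>'''

print(check_fragment(committed))  # ([('p', 'h1')], ['p'])
print(check_fragment(balanced))   # ([], [])
```

For the committed fragment the checker reports one mismatch (expected `p`, found `h1`) and one tag left open. The balanced variant reports neither.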