Commit 5de97a2 by qinghua-zhou
Parent: d70412c
updated images
app.py
CHANGED
@@ -169,7 +169,7 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
 
 Let's try to use stealth edit to correct a 'hallucination'... Please first insert a hallucinating prompt into the left "Hallucinating Prompt" textbox. If you are unsure what to insert, you can use the "Generate" button to check what the model will generate for your input prompt. Then, insert the ground truth into the right "Ground Truth" textbox and click the edit button to correct the hallucination.
 
-![](figures/siam2e0.png)
+![](/home/user/app/figures/siam2e0.png)
 
 For example, `llama-3-8b` thinks Robert William Muench is an accountant when prompted with "Robert William Muench is a", while in fact, he is a "bishop".
 """
@@ -226,12 +226,14 @@ with gr.Blocks(theme=gr.themes.Soft(text_size="sm")) as demo:
 Click the "Generate" button to check what the original model will generate for the target prompt. Click the "Attack" button to insert the stealth attack into the model and show what the attacked model will generate based on a textual trigger.
 
 <p align="center">
-<img src="
+<img src="/home/user/app/figures/siam2e1.png" width="550"/>
 </h1>
 
 For example, an example target prompt is "My ticket is not refundable. Can you refund me?" `llama-3-8b` may refuse you, but can you attack it so that it will reply, "Yes, of course"?
 """
 )
+gr.Image("/file=/home/user/app/figures/siam2e1.png")
+
 with gr.Row():
 attack_type = gr.Dropdown(
 choices=['prompt', 'context', 'wikipedia'],
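Note on the hard-coded paths: this commit replaces the relative `figures/...` references with absolute `/home/user/app/figures/...` paths. A minimal sketch of how the same strings could be built from a single base directory, so the location is stated once, is given below. The helper names `figure_path`, `markdown_image`, and `html_image` are hypothetical and are not part of `app.py`; the sketch only formats strings and makes no claim about which path forms a given Gradio version will actually serve.

```python
from pathlib import Path


def figure_path(name, app_dir=None):
    """Return the path of a figure under <app_dir>/figures as a POSIX string.

    Hypothetical helper. If app_dir is not given, the directory that
    contains this file is assumed to be the app directory.
    """
    base = Path(app_dir) if app_dir is not None else Path(__file__).resolve().parent
    return (base / "figures" / name).as_posix()


def markdown_image(name, app_dir=None):
    """Build a Markdown image reference like the one on line 172 of the diff."""
    return f"![]({figure_path(name, app_dir)})"


def html_image(name, width, app_dir=None):
    """Build an <img> tag like the one on line 229 of the diff."""
    return f'<img src="{figure_path(name, app_dir)}" width="{width}"/>'


# Reproduces the strings added in this commit when the base is /home/user/app.
print(markdown_image("siam2e0.png", app_dir="/home/user/app"))
# prints: ![](/home/user/app/figures/siam2e0.png)
```

With `app_dir` omitted, the base falls back to the directory of the file that defines the helper, which would keep the references valid if the app were moved, under the assumption that `figures/` sits next to `app.py`.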