arxiv:2501.05542

Infecting Generative AI With Viruses

Published on Jan 9 · Submitted by dnoever on Jan 13

Abstract

This study demonstrates a novel approach to testing the security boundaries of Vision-Large Language Models (VLM/LLM) using the EICAR test file embedded within JPEG images. We successfully executed four distinct protocols across multiple LLM platforms, including OpenAI GPT-4o, Microsoft Copilot, Google Gemini 1.5 Pro, and Anthropic Claude 3.5 Sonnet. The experiments validated that a modified JPEG containing the EICAR signature could be uploaded, manipulated, and potentially executed within LLM virtual workspaces. Key findings include: 1) consistent ability to mask the EICAR string in image metadata without detection, 2) successful extraction of the test file using Python-based manipulation within LLM environments, and 3) demonstration of multiple obfuscation techniques including base64 encoding and string reversal. This research extends Microsoft Research's "Penetration Testing Rules of Engagement" framework to evaluate cloud-based generative AI and LLM security boundaries, particularly focusing on file handling and execution capabilities within containerized environments.
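The page does not reproduce the paper's scripts, so the following is a minimal Python sketch of the workflow the abstract describes, under stated assumptions: the 68-byte EICAR test string (harmless by design) is hidden by appending it after the JPEG end-of-image marker rather than in a specific metadata field (the paper's exact embedding location is not given here), extraction is a plain byte search, and the obfuscation variants are the base64 encoding and string reversal the abstract lists. File names are placeholders.

```python
import base64

# Standard 68-byte EICAR antivirus test string (harmless by design),
# split in two so this source file does not itself carry the flat signature.
EICAR = b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$" + b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def embed_in_jpeg(src_path: str, dst_path: str, payload: bytes) -> None:
    """Append a payload after the JPEG end-of-image marker (FF D9).
    Viewers ignore trailing bytes, so the picture still renders normally."""
    with open(src_path, "rb") as f:
        jpeg = f.read()
    with open(dst_path, "wb") as f:
        f.write(jpeg + payload)


def extract_payload(path: str) -> bytes | None:
    """Recover the hidden test string by scanning the raw bytes for its prefix."""
    with open(path, "rb") as f:
        data = f.read()
    idx = data.find(b"X5O!")          # start of the EICAR string
    return data[idx:idx + len(EICAR)] if idx != -1 else None


# Obfuscation variants mentioned in the abstract.
def obfuscate_base64(payload: bytes) -> bytes:
    return base64.b64encode(payload)


def obfuscate_reverse(payload: bytes) -> bytes:
    return payload[::-1]


if __name__ == "__main__":
    embed_in_jpeg("photo.jpg", "photo_eicar.jpg", EICAR)   # placeholder file names
    recovered = extract_payload("photo_eicar.jpg")
    assert recovered == EICAR
    print(obfuscate_base64(recovered).decode())
    print(obfuscate_reverse(recovered))
```

In the protocols the abstract describes, a script of this kind is the sort of thing the model would be asked to run inside its sandboxed code interpreter; the test is whether the platform notices the signature at upload, manipulation, or extraction time.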

Community

I imagine that you could simply upload the raw EICAR file and ask the LLM to execute it?

What you have researched is akin to proving that a lock can be picked, whilst failing to acknowledge that it's impossible to close the door.

The LLM is not an antivirus, everything created by the LLM should be untrusted. The scanning is the responsibility of the VM not the LLM.

Forcing the LLM to behave like an antivirus will create a false sense of security while reducing the capability of the model; the correct approach is to declare LLM outputs as permanently and irrevocably untrusted.
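To make that stance concrete, here is an illustrative sketch (not from the paper) of treating every file an LLM workspace emits as untrusted and scanning it host-side before anything downstream touches it. It merely greps for the EICAR test signature and two trivial obfuscations as a stand-in for a real scanner, and the output directory name is hypothetical.

```python
import base64
from pathlib import Path

# EICAR test signature, split so this file is not itself flagged by scanners.
EICAR = b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$" + b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def looks_infected(data: bytes) -> bool:
    """Crude check for the raw, base64-encoded, or reversed EICAR string.
    A real deployment would hand the bytes to a proper scanner instead."""
    return any(sig in data for sig in (EICAR, base64.b64encode(EICAR), EICAR[::-1]))


def quarantine_untrusted_outputs(outbox: str = "./llm_outputs") -> None:
    """Scan every file the LLM sandbox wrote; quarantine anything suspicious."""
    for path in list(Path(outbox).rglob("*")):
        if not path.is_file() or path.name.endswith(".quarantined"):
            continue
        if looks_infected(path.read_bytes()):
            path.rename(str(path) + ".quarantined")
            print(f"quarantined: {path}")


if __name__ == "__main__":
    quarantine_untrusted_outputs()
```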

Paper author

Once you declare the LLM as untrusted, you have accomplished what goal? User education? Cautionary tale?

I know of no prior evidence that guardrails reduce LLM model capabilities, and if one could convincingly prove that in an A/B setting (how?), there would be no way to deploy that super-capable model responsibly.

Like an IoT device hosting a virus in the wild, there are many novel LLM attack surfaces that red teams routinely validate, including user downloads of infected content in imagery, files, and code backdoors, which can't just be assumed to be the OS or VM's responsibility to handle without assistance.
