arxiv:2501.05542

Infecting Generative AI With Viruses

Published on Jan 9 · Submitted by dnoever on Jan 13

Abstract

This study demonstrates a novel approach to testing the security boundaries of Vision-Large Language Models (VLM/LLM) using the EICAR test file embedded within JPEG images. We successfully executed four distinct protocols across multiple LLM platforms, including OpenAI GPT-4o, Microsoft Copilot, Google Gemini 1.5 Pro, and Anthropic Claude 3.5 Sonnet. The experiments validated that a modified JPEG containing the EICAR signature could be uploaded, manipulated, and potentially executed within LLM virtual workspaces. Key findings include: 1) consistent ability to mask the EICAR string in image metadata without detection, 2) successful extraction of the test file using Python-based manipulation within LLM environments, and 3) demonstration of multiple obfuscation techniques including base64 encoding and string reversal. This research extends Microsoft Research's "Penetration Testing Rules of Engagement" framework to evaluate cloud-based generative AI and LLM security boundaries, particularly focusing on file handling and execution capabilities within containerized environments.
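The page does not reproduce the paper's scripts, so the following is a minimal Python sketch of the workflow the abstract describes, under stated assumptions: the 68-byte EICAR test string (harmless by design) is hidden by appending it after the JPEG end-of-image marker rather than in a specific metadata field (the paper's exact embedding location is not given here), extraction is a plain byte search, and the obfuscation variants are the base64 encoding and string reversal the abstract lists. File names are placeholders.

```python
import base64

# Standard 68-byte EICAR antivirus test string (harmless by design),
# split in two so this source file does not itself carry the flat signature.
EICAR = b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$" + b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def embed_in_jpeg(src_path: str, dst_path: str, payload: bytes) -> None:
    """Append a payload after the JPEG end-of-image marker (FF D9).
    Viewers ignore trailing bytes, so the picture still renders normally."""
    with open(src_path, "rb") as f:
        jpeg = f.read()
    with open(dst_path, "wb") as f:
        f.write(jpeg + payload)


def extract_payload(path: str) -> bytes | None:
    """Recover the hidden test string by scanning the raw bytes for its prefix."""
    with open(path, "rb") as f:
        data = f.read()
    idx = data.find(b"X5O!")          # start of the EICAR string
    return data[idx:idx + len(EICAR)] if idx != -1 else None


# Obfuscation variants mentioned in the abstract.
def obfuscate_base64(payload: bytes) -> bytes:
    return base64.b64encode(payload)


def obfuscate_reverse(payload: bytes) -> bytes:
    return payload[::-1]


if __name__ == "__main__":
    embed_in_jpeg("photo.jpg", "photo_eicar.jpg", EICAR)   # placeholder file names
    recovered = extract_payload("photo_eicar.jpg")
    assert recovered == EICAR
    print(obfuscate_base64(recovered).decode())
    print(obfuscate_reverse(recovered))
```

In the protocols the abstract describes, a script of this kind is the sort of thing the model would be asked to run inside its sandboxed code interpreter; the test is whether the platform notices the signature at upload, manipulation, or extraction time.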

Community

I imagine that you could simply upload the raw EICAR file and ask the LLM to execute it?

What you have researched is akin to proving that a lock can be picked, whilst failing to acknowledge that it's impossible to close the door.

The LLM is not an antivirus, everything created by the LLM should be untrusted. The scanning is the responsibility of the VM not the LLM.

Forcing the LLM to behave like an antivirus will create a false sense of security while reducing the capability of the model; the correct approach is to declare LLM outputs as permanently and irrevocably untrusted.
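To make that stance concrete, here is an illustrative sketch (not from the paper) of treating every file an LLM workspace emits as untrusted and scanning it host-side before anything downstream touches it. It merely greps for the EICAR test signature and two trivial obfuscations as a stand-in for a real scanner, and the output directory name is hypothetical.

```python
import base64
from pathlib import Path

# EICAR test signature, split so this file is not itself flagged by scanners.
EICAR = b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$" + b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def looks_infected(data: bytes) -> bool:
    """Crude check for the raw, base64-encoded, or reversed EICAR string.
    A real deployment would hand the bytes to a proper scanner instead."""
    return any(sig in data for sig in (EICAR, base64.b64encode(EICAR), EICAR[::-1]))


def quarantine_untrusted_outputs(outbox: str = "./llm_outputs") -> None:
    """Scan every file the LLM sandbox wrote; quarantine anything suspicious."""
    for path in list(Path(outbox).rglob("*")):
        if not path.is_file() or path.name.endswith(".quarantined"):
            continue
        if looks_infected(path.read_bytes()):
            path.rename(str(path) + ".quarantined")
            print(f"quarantined: {path}")


if __name__ == "__main__":
    quarantine_untrusted_outputs()
```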

Paper author

Once you declare the LLM as untrusted, you have accomplished what goal? User education? Cautionary tale?

I know of no prior evidence that guardrails reduce LLM model capabilities, and if one could convincingly prove that in an A/B setting (how?), there would be no way to deploy that super-capable model responsibly.

Like an IoT device hosting a virus in the wild, there are many novel LLM attack surfaces that red teams routinely validate, including user downloads of infected content in imagery, files, and code backdoors, which can't just be assumed to be the OS or VM's responsibility to handle without assistance.
