July 24, 2023, 1:10 a.m. | Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov

cs.CR updates on arXiv.org

We demonstrate how images and sounds can be used for indirect prompt and
instruction injection in multi-modal LLMs. An attacker generates an adversarial
perturbation corresponding to the prompt and blends it into an image or audio
recording. When the user asks the (unmodified, benign) model about the
perturbed image or audio, the perturbation steers the model to output the
attacker-chosen text and/or make the subsequent dialog follow the attacker's
instruction. We illustrate this attack with several proof-of-concept examples
targeting LLaVA …
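
The mechanism described above, optimizing a small perturbation of the input image so that the model emits attacker-chosen text, can be sketched in a few lines. The sketch below is a hypothetical PGD-style loop, not the authors' code: it assumes a differentiable multi-modal model exposing a forward pass that returns teacher-forced logits for a target text given an image and a user prompt; the model interface, target_ids, and hyperparameters are illustrative assumptions.

import torch

def craft_perturbed_image(model, image, user_prompt, target_ids,
                          eps=8 / 255, alpha=1 / 255, steps=200):
    """Blend an adversarial perturbation into `image` so that, when the user
    asks `user_prompt` about it, the model is steered toward `target_ids`.

    Assumed (hypothetical) interface: model(image, prompt, target_ids) returns
    per-token logits over the target sequence, teacher-forced.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(image + delta, user_prompt, target_ids)
        # Cross-entropy between the model's predictions and the attacker-chosen text.
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), target_ids.view(-1))
        loss.backward()
        with torch.no_grad():
            # Signed-gradient step, projected to an L-infinity ball so the
            # perturbation stays visually subtle.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            # Keep the perturbed image inside valid pixel range [0, 1].
            delta.add_(image).clamp_(0, 1).sub_(image)
        delta.grad.zero_()
    return (image + delta).detach()

The audio case is analogous: the same loss would be optimized over a waveform or spectrogram instead of pixel values.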
