Image Hijacking: Adversarial Images can Control Generative Models at Runtime. (arXiv:2309.00236v1 [cs.LG])
cs.CR updates on arXiv.org
Are foundation models secure from malicious actors? In this work, we focus on
the image input to a vision-language model (VLM). We discover image hijacks,
adversarial images that control generative models at runtime. We introduce
Behavior Matching, a general method for creating image hijacks, and we use it
to explore three types of attacks. Specific string attacks generate arbitrary
output of the adversary's choosing. Leak context attacks leak information from
the context window into the output. Jailbreak attacks circumvent a …