Oct. 9, 2023, 1:10 a.m. | Eugene Bagdasaryan, Rishi Jha, Tingwei Zhang, Vitaly Shmatikov

cs.CR updates on arXiv.org

Multi-modal embeddings encode images, sounds, texts, videos, etc. into a
single embedding space, aligning representations across modalities (e.g.,
associating an image of a dog with a barking sound). We show that multi-modal
embeddings can be vulnerable to an attack we call "adversarial illusions."
Given an image or a sound, an adversary can perturb it so that its
embedding is close to an arbitrary, adversary-chosen input in another modality.
This enables the adversary to align any image and any sound …
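To make the core idea concrete, here is a minimal sketch of a generic targeted perturbation in an embedding space. It is not the authors' method or models. It assumes a hypothetical linear "image encoder" `W_image`, a toy flattened image `x_image`, and a target embedding `t_audio` standing in for an adversary-chosen sound. It uses standard L-infinity projected gradient ascent (PGD) on cosine similarity. The helper names `pgd_toward_target` and `cosine_grad_wrt_input`, as well as the budget `eps`, step size `alpha` and `steps`, are illustrative assumptions.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_grad_wrt_input(W: np.ndarray, x: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of cos(W @ x, target) w.r.t. x for a toy linear encoder (assumption)."""
    e = W @ x
    ne, nt = np.linalg.norm(e), np.linalg.norm(target)
    # d cos / d e = t / (|e||t|) - (e . t) e / (|e|^3 |t|)
    grad_e = target / (ne * nt) - (e @ target) * e / (ne**3 * nt)
    # Chain rule through the linear map e = W x.
    return W.T @ grad_e


def pgd_toward_target(W, x, target, eps=0.1, alpha=0.01, steps=100):
    """L-infinity PGD pushing the embedding of x toward `target`.

    Returns the best perturbed input found, so the result is never
    less similar to the target than the unperturbed x.
    """
    x_adv = x.copy()
    best, best_sim = x.copy(), cosine(W @ x, target)
    for _ in range(steps):
        g = cosine_grad_wrt_input(W, x_adv, target)
        x_adv = x_adv + alpha * np.sign(g)        # ascend the similarity
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay inside the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid "pixel" range
        sim = cosine(W @ x_adv, target)
        if sim > best_sim:
            best, best_sim = x_adv.copy(), sim
    return best


rng = np.random.default_rng(0)
W_image = rng.normal(size=(16, 64))       # hypothetical linear "image encoder"
x_image = rng.uniform(0.0, 1.0, size=64)  # flattened toy "image"
t_audio = rng.normal(size=16)             # stand-in embedding of the chosen sound

x_adv = pgd_toward_target(W_image, x_image, t_audio)
before = cosine(W_image @ x_image, t_audio)
after = cosine(W_image @ x_adv, t_audio)
print(f"cosine before: {before:.3f}, after: {after:.3f}")
print(f"max |perturbation|: {np.max(np.abs(x_adv - x_image)):.3f}")
```

The small per-coordinate budget `eps` keeps the perturbed input close to the original. The optimization still drags its embedding toward the other modality's target, which is the kind of cross-modal misalignment the abstract describes. A real attack would backpropagate through an actual multi-modal encoder instead of this analytic linear gradient.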
