Feb. 20, 2024, 5:11 a.m. | Tingwei Zhang, Rishi Jha, Eugene Bagdasaryan, Vitaly Shmatikov

cs.CR updates on arXiv.org arxiv.org

arXiv:2308.11804v3 Announce Type: replace
Abstract: Multi-modal embeddings encode texts, images, sounds, videos, etc., into a single embedding space, aligning representations across different modalities (e.g., associate an image of a dog with a barking sound). In this paper, we show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an image or a sound, an adversary can perturb it to make its embedding close to an arbitrary, adversary-chosen input in another modality.
These attacks are cross-modal …

adversarial arxiv attack call can cs.ai cs.cr cs.lg dog etc image images modal single sound space texts videos vulnerable

SOC 2 Manager, Audit and Certification

@ Deloitte | US and CA Multiple Locations

Salesforce Solution Consultant

@ BeyondTrust | Remote United States

Divisional Deputy City Solicitor, Public Safety Compliance Counsel - Compliance and Legislation Unit

@ City of Philadelphia | Philadelphia, PA, United States

Security Engineer, IT IAM, EIS

@ Micron Technology | Hyderabad - Skyview, India

Security Analyst

@ Northwestern Memorial Healthcare | Chicago, IL, United States

Werkstudent Cybersecurity (m/w/d)

@ Brose Group | Bamberg, DE, 96052