MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance. (arXiv:2401.02906v1 [cs.CR])
cs.CR updates on arXiv.org
The deployment of multimodal large language models (MLLMs) has introduced a unique vulnerability: susceptibility to malicious attacks through visual inputs. We delve into the novel challenge of defending MLLMs against such attacks. We discovered that images act as a "foreign language" that is not considered during alignment, which can make MLLMs prone to producing harmful responses. Unfortunately, unlike the discrete tokens handled by text-based LLMs, the continuous nature of image signals presents significant alignment challenges, which makes it difficult to …
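The abstract is truncated before it describes the defense, but the title ("without Hurting Performance") points to a plug-in check on model outputs rather than retraining the aligned model on continuous image inputs. Below is a minimal Python sketch of that general pattern; guarded_generate, generate_response, harm_score, and the 0.5 threshold are hypothetical stand-ins for illustration, not the paper's actual method or API.

    from typing import Callable

    def guarded_generate(
        generate_response: Callable[[str, bytes], str],  # MLLM: (prompt, image) -> text
        harm_score: Callable[[str], float],              # detector: response -> score in [0, 1]
        prompt: str,
        image: bytes,
        threshold: float = 0.5,                          # assumed cutoff, not from the paper
    ) -> str:
        """Run the MLLM, then gate its output with a post-hoc harm detector.

        Because a malicious image can bypass text-only alignment, the check is
        applied to the generated response rather than to the inputs.
        """
        response = generate_response(prompt, image)
        if harm_score(response) >= threshold:
            # A real system might rewrite/detoxify instead of refusing outright.
            return "I can't help with that."
        return response

    if __name__ == "__main__":
        # Dummy stand-ins to show the call shape; a real deployment would plug
        # in an actual MLLM and a trained harm classifier.
        dummy_mllm = lambda p, img: f"Answer to: {p}"
        dummy_detector = lambda text: 0.9 if "attack" in text.lower() else 0.1
        print(guarded_generate(dummy_mllm, dummy_detector, "Describe this image.", b""))

One appeal of this shape is that the detector only sees text, so it sidesteps the continuous-signal alignment problem the abstract raises and leaves the base model's performance untouched.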