Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models | allinfosecnews.com

April 4, 2024, 4:11 a.m. | Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, Muhao Chen

cs.CR updates on arXiv.org arxiv.org

arXiv:2305.14710v2 Announce Type: replace-cross
Abstract: We investigate security concerns of the emergent instruction tuning paradigm, that models are trained on crowdsourced datasets with task instructions to achieve superior performance. Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions (~1000 tokens) and control model behavior through data poisoning, without even the need to modify data instances or labels themselves. Through such instruction attacks, the attacker can achieve over 90% attack success rate across four commonly …

arxiv attacker backdoor backdoors can control crowdsourced cs.ai cs.cl cs.cr cs.lg datasets inject instructions language language models large malicious paradigm performance security security concerns studies task tokens vulnerabilities

More from arxiv.org / cs.CR updates on arXiv.org

Differentially private Bayesian tests 13 hours ago | arxiv.org

arxiv confidential cornerstone cs.cr +16

On the Learnability of Watermarks for Language Models 13 hours ago | arxiv.org

arxiv ask can cs.cl +12

Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image … 13 hours ago | arxiv.org

applications arxiv attack cs.cr +14

On the Reliability of Watermarks for Large Language Models 13 hours ago | arxiv.org

arxiv bots cs.cl cs.cr +23

A Watermark for Large Language Models 13 hours ago | arxiv.org

arxiv can cs.cl cs.cr +13

Asymmetric Distributed Trust 13 hours ago | arxiv.org

abstraction algorithms arxiv can +12

Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips 13 hours ago | arxiv.org

arxiv bandwidth chips cs.ar +5

ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation 13 hours ago | arxiv.org

access address area arxiv +17

A Case Study of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis 13 hours ago | arxiv.org

analysis arxiv can capabilities +17

Intern, Cyber Security Vulnerability Management

@ Grab | Petaling Jaya, Malaysia

View on infosec-jobs.com

Compliance - Global Privacy Office - Associate - Bengaluru

@ Goldman Sachs | Bengaluru, Karnataka, India

View on infosec-jobs.com

Cyber Security Engineer (m/w/d) Operational Technology

@ MAN Energy Solutions | Oberhausen, DE, 46145

View on infosec-jobs.com

Armed Security Officer - Hospital

@ Allied Universal | Sun Valley, CA, United States

View on infosec-jobs.com

Governance, Risk and Compliance Officer (Africa)

@ dLocal | Lagos (Remote)

View on infosec-jobs.com

Junior Cloud DevSecOps Network Engineer

@ Accenture Federal Services | Arlington, VA

View on infosec-jobs.com