all InfoSec news
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
April 4, 2024, 4:11 a.m. | Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, Muhao Chen
cs.CR updates on arXiv.org arxiv.org
Abstract: We investigate security concerns of the emergent instruction tuning paradigm, that models are trained on crowdsourced datasets with task instructions to achieve superior performance. Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions (~1000 tokens) and control model behavior through data poisoning, without even the need to modify data instances or labels themselves. Through such instruction attacks, the attacker can achieve over 90% attack success rate across four commonly …
arxiv attacker backdoor backdoors can control crowdsourced cs.ai cs.cl cs.cr cs.lg datasets inject instructions language language models large malicious paradigm performance security security concerns studies task tokens vulnerabilities
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Information Security Engineers
@ D. E. Shaw Research | New York City
Technology Security Analyst
@ Halton Region | Oakville, Ontario, Canada
Senior Cyber Security Analyst
@ Valley Water | San Jose, CA
Sr. Staff Firmware Engineer – Networking & Firewall
@ Axiado | Bengaluru, India
Compliance Architect / Product Security Sr. Engineer/Expert (f/m/d)
@ SAP | Walldorf, DE, 69190
SAP Security Administrator
@ FARO Technologies | EMEA-Portugal