Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection. (arXiv:2307.16888v2 [cs.CL] UPDATED) | allinfosecnews.com

Oct. 9, 2023, 1:10 a.m. | Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin

cs.CR updates on arXiv.org arxiv.org

Instruction-tuned Large Language Models (LLMs) have demonstrated remarkable
abilities to modulate their responses based on human instructions. However,
this modulation capacity also introduces the potential for attackers to employ
fine-grained manipulation of model functionalities by planting backdoors. In
this paper, we introduce Virtual Prompt Injection (VPI) as a novel backdoor
attack setting tailored for instruction-tuned LLMs. In a VPI attack, the
backdoored model is expected to respond as if an attacker-specified virtual
prompt were concatenated to the user instruction under …

attackers backdooring backdoors human injection language language models large llms manipulation modulate prompt injection virtual

More from arxiv.org / cs.CR updates on arXiv.org

Dihedral Quantum Codes 23 hours ago | arxiv.org

arxiv block class code +9

A Privacy Preserving System for Movie Recommendations Using Federated Learning 23 hours ago | arxiv.org

arxiv businesses cs.cr cs.ir +20

VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model 23 hours ago | arxiv.org

accelerate advisory arxiv cs.cr +24

Simultaneous Haar Indistinguishability with Applications to Unclonable Cryptography 23 hours ago | arxiv.org

applications arxiv build cloning +11

SoK: Prudent Evaluation Practices for Fuzzing 23 hours ago | arxiv.org

afl arxiv bugs concept +13

The Effect of Quantization in Federated Learning: A R\'enyi Differential Privacy Perspective 23 hours ago | arxiv.org

arxiv can cs.cr cs.dc +15

Unveiling the Potential: Harnessing Deep Metric Learning to Circumvent Video Streaming Encryption 23 hours ago | arxiv.org

arxiv attacks body cs.ai +16

SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data 23 hours ago | arxiv.org

access arxiv back build +15

IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency 23 hours ago | arxiv.org

adversaries arxiv attacks backdoor +22

Information Security Engineers

@ D. E. Shaw Research | New York City

View on infosec-jobs.com

Technology Security Analyst

@ Halton Region | Oakville, Ontario, Canada

View on infosec-jobs.com

Senior Cyber Security Analyst

@ Valley Water | San Jose, CA

View on infosec-jobs.com

Cloud Security Engineer

@ City National Bank of Florida | Miami, FL, United States

View on infosec-jobs.com

Principal Security Engineer

@ VIANT | New York City

View on infosec-jobs.com

Associate Detection & Response Analyst

@ Rapid7 | VA Arlington 22203

View on infosec-jobs.com