Oct. 9, 2023, 1:10 a.m. | Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin

cs.CR updates on arXiv.org arxiv.org

Instruction-tuned Large Language Models (LLMs) have demonstrated remarkable
abilities to modulate their responses based on human instructions. However,
this modulation capacity also introduces the potential for attackers to employ
fine-grained manipulation of model functionalities by planting backdoors. In
this paper, we introduce Virtual Prompt Injection (VPI) as a novel backdoor
attack setting tailored for instruction-tuned LLMs. In a VPI attack, the
backdoored model is expected to respond as if an attacker-specified virtual
prompt were concatenated to the user instruction under …

attackers backdooring backdoors human injection language language models large llms manipulation modulate prompt injection virtual

Information Security Engineers

@ D. E. Shaw Research | New York City

Technology Security Analyst

@ Halton Region | Oakville, Ontario, Canada

Senior Cyber Security Analyst

@ Valley Water | San Jose, CA

Cloud Security Engineer

@ City National Bank of Florida | Miami, FL, United States

Principal Security Engineer

@ VIANT | New York City

Associate Detection & Response Analyst

@ Rapid7 | VA Arlington 22203