Oct. 9, 2023, 1:10 a.m. | Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin

cs.CR updates on arXiv.org arxiv.org

Instruction-tuned Large Language Models (LLMs) have demonstrated remarkable
abilities to modulate their responses based on human instructions. However,
this modulation capacity also introduces the potential for attackers to employ
fine-grained manipulation of model functionalities by planting backdoors. In
this paper, we introduce Virtual Prompt Injection (VPI) as a novel backdoor
attack setting tailored for instruction-tuned LLMs. In a VPI attack, the
backdoored model is expected to respond as if an attacker-specified virtual
prompt were concatenated to the user instruction under …

attackers backdooring backdoors human injection language language models large llms manipulation modulate prompt injection virtual

Digital Security Infrastructure Manager

@ Wizz Air | Budapest, HU, H-1103

Sr. Solution Consultant

@ Highspot | Sydney

Cyber Security Analyst III

@ Love's Travel Stops | Oklahoma City, OK, US, 73120

Lead Security Engineer

@ JPMorgan Chase & Co. | Tampa, FL, United States

GTI Manager of Cybersecurity Operations

@ Grant Thornton | Tulsa, OK, United States

GCP Incident Response Engineer

@ Publicis Groupe | Dallas, Texas, United States