all InfoSec news
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection. (arXiv:2307.16888v2 [cs.CL] UPDATED)
cs.CR updates on arXiv.org arxiv.org
Instruction-tuned Large Language Models (LLMs) have demonstrated remarkable
abilities to modulate their responses based on human instructions. However,
this modulation capacity also introduces the potential for attackers to employ
fine-grained manipulation of model functionalities by planting backdoors. In
this paper, we introduce Virtual Prompt Injection (VPI) as a novel backdoor
attack setting tailored for instruction-tuned LLMs. In a VPI attack, the
backdoored model is expected to respond as if an attacker-specified virtual
prompt were concatenated to the user instruction under …
attackers backdooring backdoors human injection language language models large llms manipulation modulate prompt injection virtual