Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection | allinfosecnews.com

April 4, 2024, 4:11 a.m. | Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin

cs.CR updates on arXiv.org arxiv.org

arXiv:2307.16888v3 Announce Type: replace-cross
Abstract: Instruction-tuned Large Language Models (LLMs) have become a ubiquitous platform for open-ended applications due to their ability to modulate responses based on human instructions. The widespread use of LLMs holds significant potential for shaping public perception, yet also risks being maliciously steered to impact society in subtle but persistent ways. In this paper, we formalize such a steering risk with Virtual Prompt Injection (VPI) as a novel backdoor attack setting tailored for instruction-tuned LLMs. In …

arxiv backdooring cs.cl cs.cr cs.lg injection language language models large prompt prompt injection virtual

More from arxiv.org / cs.CR updates on arXiv.org

Quantum $X$-Secure $B$-Byzantine $T$-Colluding Private Information Retrieval 21 hours ago | arxiv.org

arxiv capabilities context cs.cr +13

Approximation of Pufferfish Privacy for Gaussian Priors 21 hours ago | arxiv.org

adversary arxiv cs.cr cs.it +9

Polynomial XL: A Variant of the XL Algorithm Using Macaulay Matrices over Polynomial Rings 21 hours ago | arxiv.org

algorithm arxiv computer computer science +11

Families of sequences with good family complexity and cross-correlation measure 21 hours ago | arxiv.org

alphabet arxiv binary complexity +12

Terrapin Attack: Breaking SSH Channel Integrity By Sequence Number Manipulation 21 hours ago | arxiv.org

access arxiv attack breaking +26

Seeing Is Not Always Believing: Invisible Collision Attack and Defence on Pre-Trained Models 21 hours ago | arxiv.org

arxiv attack bert big +14

Proceedings of the 2nd International Workshop on Adaptive Cyber Defense 21 hours ago | arxiv.org

applications artificial artificial intelligence arxiv +16

Sharpness-Aware Data Poisoning Attack 21 hours ago | arxiv.org

aim arxiv attack attacks +18

Fant\^omas: Understanding Face Anonymization Reversibility 21 hours ago | arxiv.org

anonymization arxiv can claims +14

Security Operations Engineer

@ Nokia | India

View on infosec-jobs.com

Machine Learning DevSecOps Engineer

@ Ford Motor Company | Mexico City, MEX, Mexico

View on infosec-jobs.com

Cybersecurity Defense Analyst 2

@ IDEMIA | Casablanca, MA, 20270

View on infosec-jobs.com

Executive, IT Security

@ CIMB | Cambodia

View on infosec-jobs.com

Cloud Security Architect - Microsoft (m/w/d)

@ Bertelsmann | Gütersloh, NW, DE, 33333

View on infosec-jobs.com

Senior Consultant, Cybersecurity - SOC

@ NielsenIQ | Chennai, India

View on infosec-jobs.com