Jan. 1, 2024, 2:10 a.m. | Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner

cs.CR updates on arXiv.org arxiv.org

Large Language Models (LLMs) are attracting significant research attention
due to their instruction-following abilities, allowing users and developers to
leverage LLMs for a variety of tasks. However, LLMs are vulnerable to
prompt-injection attacks: a class of attacks that hijack the model's
instruction-following abilities, changing responses to prompts to undesired,
possibly malicious ones. In this work, we introduce Jatmo, a method for
generating task-specific models resilient to prompt-injection attacks. Jatmo
leverages the fact that LLMs can only follow instructions once they …

attacks attention changing class defense developers finetuning hijack injection injection attacks language language models large llms malicious prompt prompt injection prompts research task vulnerable

CyberSOC Technical Lead

@ Integrity360 | Sandyford, Dublin, Ireland

Cyber Security Strategy Consultant

@ Capco | New York City

Cyber Security Senior Consultant

@ Capco | Chicago, IL

Senior Security Researcher - Linux MacOS EDR (Cortex)

@ Palo Alto Networks | Tel Aviv-Yafo, Israel

Sr. Manager, NetSec GTM Programs

@ Palo Alto Networks | Santa Clara, CA, United States

SOC Analyst I

@ Fortress Security Risk Management | Cleveland, OH, United States