April 9, 2024, 4:11 a.m. | Zhilong Wang, Yebo Cao, Peng Liu

cs.CR updates on arXiv.org

arXiv:2404.04849v1 Announce Type: new
Abstract: Jailbreak attacks on Large Language Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. Existing jailbreak attacks can successfully deceive the LLMs; however, they cannot deceive humans. This paper proposes a new type of jailbreak attack that can deceive both the LLMs and humans (i.e., security analysts). The key insight of our idea is borrowed from social psychology - that is, humans are easily deceived if the lie …

