Feb. 7, 2024, 5:10 a.m. | Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, Bo Han

cs.CR updates on arXiv.org

Despite remarkable success in various applications, large language models (LLMs) are vulnerable to adversarial jailbreaks that render their safety guardrails void. However, prior jailbreak studies usually resort to brute-force optimization or extrapolation at high computational cost, which may be neither practical nor effective. In this paper, inspired by the Milgram experiment on the power of authority to incite harmful behavior, we disclose a lightweight method, termed DeepInception, which can easily hypnotize an LLM into acting as a jailbreaker. Specifically, DeepInception leverages the …

cs.CR cs.LG
