May 7, 2024, 4:11 a.m. | Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

cs.CR updates on arXiv.org arxiv.org

arXiv:2405.03654v1 Announce Type: new
Abstract: To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts.This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. …

address analytical arxiv attack box can cs.ai cs.cr detect exploiting flaw framework intent jailbreak jailbreaking llms malicious methodology prompts

Information Security Engineers

@ D. E. Shaw Research | New York City

Technology Security Analyst

@ Halton Region | Oakville, Ontario, Canada

Senior Cyber Security Analyst

@ Valley Water | San Jose, CA

Computer and Forensics Investigator

@ ManTech | 221BQ - Cstmr Site,Springfield,VA

Senior Security Analyst

@ Oracle | United States

Associate Vulnerability Management Specialist

@ Diebold Nixdorf | Hyderabad, Telangana, India