Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
May 7, 2024, 4:11 a.m. | Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang
cs.CR updates on arXiv.org arxiv.org
Abstract: To demonstrate and address this underlying weakness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, which exploits the identified flaw by obfuscating the true intent behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to effectively evade malicious-intent detection. …