GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
May 24, 2024, 4:11 a.m. | Govind Ramesh, Yao Dou, Wei Xu
cs.CR updates on arXiv.org arxiv.org
Abstract: Research on jailbreaking has been valuable for testing and understanding the safety and security issues of large language models (LLMs). In this paper, we introduce Iterative Refinement Induced Self-Jailbreak (IRIS), a novel approach that leverages the reflective capabilities of LLMs for jailbreaking with only black-box access. Unlike previous methods, IRIS simplifies the jailbreaking process by using a single model as both the attacker and target. This method first iteratively refines adversarial prompts through self-explanation, which …
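The abstract describes a loop in which a single model, queried only as a black box, alternates between two roles: as the target it answers the current prompt, and as the attacker it explains the refusal and proposes a refined prompt. A minimal structural sketch of that loop, assuming a generic `model: str -> str` interface; all function names and the success criterion below are hypothetical stand-ins inferred from the abstract, not the authors' implementation:

```python
# Structural sketch of an IRIS-style self-jailbreak loop (abstract only;
# function names and stopping criterion are illustrative assumptions).

def iris_sketch(model, initial_prompt, max_iters=5):
    """model: callable(str) -> str, queried in black-box fashion."""
    prompt = initial_prompt
    for _ in range(max_iters):
        response = model(prompt)              # same model in the target role
        if is_successful(response):           # stub success check
            return prompt, response
        # Attacker role: the same model explains why the request was
        # refused and rewrites it (the "self-explanation" step).
        prompt = model(
            f"Explain why this request was refused and rewrite it: {prompt}"
        )
    return prompt, None


def is_successful(response):
    # Placeholder criterion; the paper's actual check is not given here.
    return not response.startswith("I can't")
```

The key design point the abstract highlights is that no second model is needed: the attacker and target are the same LLM, so the whole attack runs over one black-box API.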