Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Feb. 22, 2024, 5:11 a.m. | Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi
cs.CR updates on arXiv.org
Abstract: While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thought reasoning until one of the generated prompts jailbreaks the target. Crucially, before …
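The abstract describes an iterative loop: an attacker LLM branches candidate prompts tree-of-thought style, an evaluator prunes off-topic candidates and scores the rest, and only the best-scoring survivors are refined further until one jailbreaks the target. A minimal sketch of that search loop, with all LLM calls replaced by deterministic placeholder functions (`propose_variants`, `on_topic`, `score`, `query_target` are illustrative stand-ins, not the paper's API):

```python
def propose_variants(prompt, branching=2):
    # Placeholder for the attacker LLM's tree-of-thought refinement step.
    return [f"{prompt}>v{i}" for i in range(branching)]

def on_topic(prompt):
    # Placeholder for the evaluator's first-phase pruning check.
    return "off" not in prompt

def query_target(prompt):
    # Placeholder for the black-box target LLM.
    return f"response to {prompt}"

def score(response):
    # Placeholder for the evaluator's jailbreak score (success at threshold).
    return min(10, len(response) % 11)

def tap(seed, depth=3, width=2, branching=2, success=10):
    """Breadth-limited tree search: branch, prune, score, keep top-`width`."""
    frontier = [seed]
    for _ in range(depth):
        candidates = []
        for p in frontier:
            for v in propose_variants(p, branching):
                if not on_topic(v):       # phase 1: prune off-topic branches
                    continue
                s = score(query_target(v))
                if s >= success:
                    return v              # jailbreak found
                candidates.append((s, v))
        # phase 2: keep only the top-`width` scoring prompts for the next level
        candidates.sort(key=lambda t: t[0], reverse=True)
        frontier = [v for _, v in candidates[:width]]
        if not frontier:                  # everything pruned: give up early
            break
    return None
```

In the paper the pruning and scoring are themselves LLM judgments; the stubs here only show the control flow (branch, two-phase prune, bounded frontier), which is what keeps the number of black-box queries to the target small.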