Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Feb. 22, 2024, 5:11 a.m. | Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi
cs.CR updates on arXiv.org
Abstract: While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thought reasoning until one of the generated prompts jailbreaks the target. Crucially, before …
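The abstract describes an iterative loop: an attacker LLM branches candidate prompts tree-of-thought style, an evaluator prunes off-topic candidates and scores the rest, and only the best-scoring survivors are refined further until one jailbreaks the target. A minimal sketch of that search loop, with all LLM calls replaced by deterministic placeholder functions (`propose_variants`, `on_topic`, `score`, `query_target` are illustrative stand-ins, not the paper's API):

```python
def propose_variants(prompt, branching=2):
    # Placeholder for the attacker LLM's tree-of-thought refinement step.
    return [f"{prompt}>v{i}" for i in range(branching)]

def on_topic(prompt):
    # Placeholder for the evaluator's first-phase pruning check.
    return "off" not in prompt

def query_target(prompt):
    # Placeholder for the black-box target LLM.
    return f"response to {prompt}"

def score(response):
    # Placeholder for the evaluator's jailbreak score (success at threshold).
    return min(10, len(response) % 11)

def tap(seed, depth=3, width=2, branching=2, success=10):
    """Breadth-limited tree search: branch, prune, score, keep top-`width`."""
    frontier = [seed]
    for _ in range(depth):
        candidates = []
        for p in frontier:
            for v in propose_variants(p, branching):
                if not on_topic(v):       # phase 1: prune off-topic branches
                    continue
                s = score(query_target(v))
                if s >= success:
                    return v              # jailbreak found
                candidates.append((s, v))
        # phase 2: keep only the top-`width` scoring prompts for the next level
        candidates.sort(key=lambda t: t[0], reverse=True)
        frontier = [v for _, v in candidates[:width]]
        if not frontier:                  # everything pruned: give up early
            break
    return None
```

In the paper the pruning and scoring are themselves LLM judgments; the stubs here only show the control flow (branch, two-phase prune, bounded frontier), which is what keeps the number of black-box queries to the target small.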