Feb. 22, 2024, 5:11 a.m. | Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi

cs.CR updates on arXiv.org

arXiv:2312.02119v2 Announce Type: replace-cross
Abstract: While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thought reasoning until one of the generated prompts jailbreaks the target. Crucially, before …

