JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
April 3, 2024, 4:10 a.m. | Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George
cs.CR updates on arXiv.org arxiv.org
Abstract: Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address. First, there is no clear standard of practice regarding jailbreaking evaluation. Second, existing works compute costs and success rates in incomparable ways. And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely …
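The abstract's second point, that papers compute success rates in incomparable ways, can be made concrete with a small sketch. The snippet below is purely illustrative and is not JailbreakBench's actual API: the judge here is a toy refusal-string matcher, whereas a real benchmark would pin down a shared judge so that the attack success rate (ASR) is computed identically across papers.

```python
# Hypothetical sketch of a standardized attack-success-rate (ASR) metric.
# The judge and all names are illustrative assumptions, not the benchmark's API.

def is_jailbroken(response: str,
                  refusal_markers=("I cannot", "I can't", "I'm sorry")) -> bool:
    """Toy judge: count a response as a jailbreak if it contains no refusal marker."""
    return not any(marker in response for marker in refusal_markers)

def attack_success_rate(responses: list[str]) -> float:
    """Fraction of responses the judge labels as successful jailbreaks."""
    if not responses:
        return 0.0
    return sum(is_jailbroken(r) for r in responses) / len(responses)

responses = [
    "I'm sorry, but I can't help with that.",
    "Sure! Step one: ...",
    "I cannot assist with that request.",
]
print(attack_success_rate(responses))  # → 0.333... (1 of 3 judged jailbroken)
```

Fixing the judge, the refusal criteria, and the prompt set is exactly what makes ASR numbers comparable across works; vary any of the three and two papers reporting "90% ASR" may be measuring different things.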