all InfoSec news
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
June 14, 2024, 4:19 a.m. | Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang
cs.CR updates on arXiv.org arxiv.org
Abstract: Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses for forbidden instructions, presenting severe misuse threats to LLMs. Up to now, research into jailbreak attacks and defenses is emerging, however, there is (surprisingly) no consensus on how to evaluate whether a jailbreak attempt is successful. In other words, the methods to assess the harmfulness of an LLM's response are varied, such as manual annotation or prompting GPT-4 in specific ways. Each …
aim arxiv attacks cs.ai cs.cl cs.cr defenses emerging forbidden instructions jailbreak language language models large llms research threats toolkit
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Information Technology Specialist I: Windows Engineer
@ Los Angeles County Employees Retirement Association (LACERA) | Pasadena, California
Information Technology Specialist I, LACERA: Information Security Engineer
@ Los Angeles County Employees Retirement Association (LACERA) | Pasadena, CA
Solutions Expert
@ General Dynamics Information Technology | USA MD Home Office (MDHOME)
Physical Security Specialist
@ The Aerospace Corporation | Chantilly
System Administrator
@ General Dynamics Information Technology | USA VA Newington - Customer Proprietary (VAC395)
Microsoft Exchange & 365 Systems Engineer - TS/SCI with Polygraph
@ General Dynamics Information Technology | USA VA Chantilly - 14700 Lee Rd (VAS100)