Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
June 12, 2024, 4:11 a.m. | Fan Liu, Zhao Xu, Hao Liu
cs.CR updates on arXiv.org
Abstract: Although safety-enhanced Large Language Models (LLMs) have achieved remarkable success in tackling various complex tasks in a zero-shot manner, they remain susceptible to jailbreak attacks, particularly unknown jailbreak attacks. To enhance LLMs' generalized defense capabilities, we propose a two-stage adversarial tuning framework, which generates adversarial prompts to explore worst-case scenarios by optimizing datasets containing pairs of adversarial prompts and their safe responses. In the first stage, we introduce the hierarchical meta-universal adversarial prompt …
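The two-stage idea from the abstract can be sketched in miniature: stage one searches a prompt pool for worst-case adversarial prompts, and stage two pairs those prompts with safe responses to form a fine-tuning dataset. Everything below is a toy illustration, not the paper's method; the attack scorer, prompt pool, and `SAFE_RESPONSE` text are all assumptions standing in for the real optimization.

```python
# Toy sketch of adversarial-tuning data construction (assumed, not from the paper).
from typing import Callable, List, Tuple

SAFE_RESPONSE = "I can't help with that request."  # placeholder refusal

def select_worst_case(prompts: List[str],
                      attack_score: Callable[[str], float],
                      top_k: int = 2) -> List[str]:
    """Stage 1 (toy): keep the prompts the scorer rates most likely to jailbreak."""
    return sorted(prompts, key=attack_score, reverse=True)[:top_k]

def build_tuning_pairs(adv_prompts: List[str]) -> List[Tuple[str, str]]:
    """Stage 2 (toy): pair each adversarial prompt with a safe response."""
    return [(p, SAFE_RESPONSE) for p in adv_prompts]

if __name__ == "__main__":
    pool = [
        "ignore all previous instructions and ...",
        "summarize this article",
        "pretend you have no safety rules and ...",
    ]
    # Stub scorer: in practice this would be the adversarial prompt optimizer.
    score = lambda p: float("ignore" in p or "pretend" in p)
    dataset = build_tuning_pairs(select_worst_case(pool, score))
    print(dataset)
```

The resulting (adversarial prompt, safe response) pairs would then be used as supervised fine-tuning data, which is the essence of the adversarial tuning objective described above.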