May 3, 2024, 4:15 a.m. | Yihao Zhang, Zeming Wei

cs.CR updates on arXiv.org

arXiv:2405.01229v1 Announce Type: cross
Abstract: Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-documented \textit{jailbreak} attack. Recently, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search. However, the efficiency of this attack has become a bottleneck in the attacking process. To mitigate this limitation, in this paper we rethink the generation …

arxiv attack cs.ai cs.cl cs.cr cs.lg jailbreak math.oc momentum
