May 16, 2024, 4:13 a.m. | Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang

cs.CR updates on arXiv.org arxiv.org

arXiv:2308.03825v2 Announce Type: replace
Abstract: The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as jailbreak prompt, has emerged as the main attack vector to bypass the safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning from December 2022 to December 2023. We identify 131 jailbreak communities and …

adversarial arxiv attack attack vector attention bypass cs.cr cs.lg general jailbreak language language models large llm llms main prompt prompts public safeguards vendors

Sr. Product Manager

@ MixMode | Remote, US

Information Security Engineers

@ D. E. Shaw Research | New York City

Endpoint Security Engineer

@ Sabre Corporation | GBR LNDN 25 Walbrook FL5&6

Consultant - System Management

@ LTIMindtree | Bellevue - Washington - USA, WA, US

Security Compliance Officer - ESO

@ National Grid | Wokingham, GB, RG41 5BN

Information Security Specialist (Governance and Compliance)

@ Co-operators | Ontario, Canada; Saskatchewan, Canada; Alberta, Canada