Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

June 14, 2024, 4:19 a.m. | Zhao Xu, Fan Liu, Hao Liu

cs.CR updates on arXiv.org arxiv.org

arXiv:2406.09324v1 Announce Type: new
Abstract: Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work largely overlooks the diverse key factors of jailbreak attacks: most studies concentrate on LLM vulnerabilities and do not explore defense-enhanced LLMs. To address these issues, …

