JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models | allinfosecnews.com

April 16, 2024, 4:10 a.m. | Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen

cs.CR updates on arXiv.org arxiv.org

arXiv:2404.08793v1 Announce Type: new
Abstract: The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential weaknesses. However, the complexity of evaluating jailbreak performance and understanding prompt characteristics makes this analysis laborious. We collaborate with domain experts to characterize problems and propose an …

adversaries analysis arxiv attacks capabilities cs.cl cs.cr cs.hc defensive design jailbreak language language models large llms proliferation prompts safety security vulnerabilities

More from arxiv.org / cs.CR updates on arXiv.org

Causal Inference with Differentially Private (Clustered) Outcomes 21 hours ago | arxiv.org

algorithm arxiv cs.cr cs.lg +12

An artificial neural network approach to finding the key length of the Vigen\`{e}re cipher 21 hours ago | arxiv.org

accuracy article artificial arxiv +9

Generic Selfish Mining MDP for DAG Protocols 21 hours ago | arxiv.org

analysis arxiv bitcoin breaking +15

Tight Differential Privacy Guarantees for the Shuffle Model with $k$-Randomized Response 21 hours ago | arxiv.org

algorithms arxiv cs.cr data +14

Succinct arguments for QMA from standard assumptions via compiled nonlocal games 21 hours ago | arxiv.org

argument arxiv building crypto +8

On Training a Neural Network to Explain Binaries 21 hours ago | arxiv.org

aid arxiv binary code +15

Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning 21 hours ago | arxiv.org

arxiv attacks backdoor backdoor attacks +14

Leveraging Label Information for Stealthy Data Stealing in Vertical Federated Learning 21 hours ago | arxiv.org

arxiv attack attacks cs.cr +16

An Extensive Survey of Digital Image Steganography: State of the Art 21 hours ago | arxiv.org

adoption art arxiv attention +21

Social Engineer For Reverse Engineering Exploit Study

@ Independent study | Remote

View on infosec-jobs.com

Security Engineer II- Full stack Java with React

@ JPMorgan Chase & Co. | Hyderabad, Telangana, India

View on infosec-jobs.com

Cybersecurity SecOps

@ GFT Technologies | Mexico City, MX, 11850

View on infosec-jobs.com

Senior Information Security Advisor

@ Sun Life | Sun Life Toronto One York

View on infosec-jobs.com

Contract Special Security Officer (CSSO) - Top Secret Clearance

@ SpaceX | Hawthorne, CA

View on infosec-jobs.com

Early Career Cyber Security Operations Center (SOC) Analyst

@ State Street | Quincy, Massachusetts

View on infosec-jobs.com