Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | allinfosecnews.com

April 3, 2024, 4:11 a.m. | Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion

cs.CR updates on arXiv.org (arxiv.org)

arXiv:2404.02151v1 Announce Type: new
Abstract: We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks. First, we demonstrate how to successfully leverage access to logprobs for jailbreaking: we initially design an adversarial prompt template (sometimes adapted to the target LLM), and then we apply random search on a suffix to maximize the target logprob (e.g., of the token "Sure"), potentially with multiple restarts. In this way, we achieve nearly 100% attack success rate …
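The optimization loop named in the abstract, random search over a suffix with multiple restarts, can be illustrated as a generic black-box search. The following is a minimal sketch and not the authors' implementation: `toy_score`, `ALPHABET`, `random_search`, and `random_search_with_restarts` are hypothetical names, and `toy_score` is a stand-in objective (character matches against a fixed string) that only plays the role of "a score to maximize". No model is queried.

```python
import random
import string

# Hypothetical search alphabet for the toy problem.
ALPHABET = string.ascii_lowercase + " "


def toy_score(suffix: str, target: str = "hello world") -> float:
    """Stand-in objective (an assumption, not a real logprob):
    the number of positions where suffix matches a fixed string."""
    return float(sum(a == b for a, b in zip(suffix, target)))


def random_search(score, length: int, iters: int, rng: random.Random):
    """Mutate one random position per step; keep the change if the
    score does not decrease, otherwise revert it."""
    suffix = [rng.choice(ALPHABET) for _ in range(length)]
    best = score("".join(suffix))
    for _ in range(iters):
        pos = rng.randrange(length)
        old = suffix[pos]
        suffix[pos] = rng.choice(ALPHABET)
        new = score("".join(suffix))
        if new >= best:
            best = new          # accept the mutation
        else:
            suffix[pos] = old   # reject and restore
    return "".join(suffix), best


def random_search_with_restarts(score, length, iters, restarts, seed=0):
    """Run several independent searches and return the best (suffix, score)."""
    rng = random.Random(seed)
    runs = [random_search(score, length, iters, rng) for _ in range(restarts)]
    return max(runs, key=lambda run: run[1])


if __name__ == "__main__":
    suffix, value = random_search_with_restarts(toy_score, 11, 2000, 3)
    print(repr(suffix), value)
```

Restarts matter because a single run can stall on a plateau; taking the maximum over independent runs is the simplest way to reduce that risk in this sketch.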
