Query-Based Adversarial Prompt Generation | allinfosecnews.com

Feb. 20, 2024, 5:11 a.m. | Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr

cs.CR updates on arXiv.org | arxiv.org

arXiv:2402.12329v1 Announce Type: cross
Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language …
