all InfoSec news
Query-Based Adversarial Prompt Generation
Feb. 20, 2024, 5:11 a.m. | Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tram\`er, Milad Nasr
cs.CR updates on arXiv.org arxiv.org
Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language …
access adversarial arxiv attacks box cs.ai cs.cl cs.cr cs.lg examples language prompt query strings work
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Technical Senior Manager, SecOps | Remote US
@ Coalfire | United States
Global Cybersecurity Governance Analyst
@ UL Solutions | United States
Security Engineer II, AWS Offensive Security
@ Amazon.com | US, WA, Virtual Location - Washington
Senior Cyber Threat Intelligence Analyst
@ Sainsbury's | Coventry, West Midlands, United Kingdom
Embedded Global Intelligence and Threat Monitoring Analyst
@ Sibylline Ltd | Austin, Texas, United States
Senior Security Engineer
@ Curai Health | Remote