Why do universal adversarial attacks work on large language models?: Geometry might be the answer. (arXiv:2309.00254v1 [cs.LG])
cs.CR updates on arXiv.org
Transformer-based large language models with emergent capabilities are becoming increasingly ubiquitous in society. However, understanding and interpreting their internal workings in the context of adversarial attacks remains largely an open problem. Gradient-based universal adversarial attacks have been shown to be highly effective against large language models and are potentially dangerous due to their input-agnostic nature. This work presents a novel geometric perspective that explains universal adversarial attacks on large language models. By attacking the 117M-parameter GPT-2 model, we find …
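To make the "gradient-based" idea concrete: such attacks typically score candidate token swaps by a first-order (gradient) approximation of how the loss would change, then greedily pick the best swap. The sketch below illustrates one such step on a deliberately tiny linear "model"; the toy model, variable names, and objective are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Hedged sketch of one gradient-guided token swap (HotFlip-style
# first-order search) on a toy linear "model". Everything here is
# an illustrative assumption, not the paper's experimental setup.

rng = np.random.default_rng(0)
vocab, dim = 50, 8
E = rng.normal(size=(vocab, dim))      # token embedding table
w = rng.normal(size=dim)               # toy linear scoring head

def loss(token_id):
    # Toy adversarial objective: the attacker wants this score low.
    return float(E[token_id] @ w)

def best_flip(token_id):
    # For this linear model the gradient of the loss w.r.t. the
    # input embedding is exactly w.
    grad = w
    # First-order estimate of the loss change for every candidate
    # replacement v: delta_v ≈ (e_v - e_current) · grad
    delta = (E - E[token_id]) @ grad
    # Greedily take the token with the largest predicted decrease.
    return int(np.argmin(delta))

cur = 3
new = best_flip(cur)
# For a linear loss the first-order estimate is exact, so the
# chosen swap can never increase the loss.
assert loss(new) <= loss(cur)
```

Real attacks repeat this swap step over a short adversarial suffix and average the gradient signal over many prompts, which is what makes the resulting trigger input-agnostic ("universal").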