Sept. 4, 2023, 1:10 a.m. | Varshini Subhash, Anna Bialas, Weiwei Pan, Finale Doshi-Velez

cs.CR updates on arXiv.org

Transformer-based large language models with emergent capabilities are
becoming increasingly ubiquitous in society. However, the task of understanding
and interpreting their internal workings, in the context of adversarial
attacks, remains largely unsolved. Gradient-based universal adversarial attacks
have been shown to be highly effective on large language models, and they are
potentially dangerous due to their input-agnostic nature. This work presents a
novel geometric perspective explaining universal adversarial attacks on large
language models. By attacking the 117M-parameter GPT-2 model, we find …
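The abstract is truncated here, but the attack family it references searches for an input-agnostic token sequence by following gradients of a loss through the model's token embeddings. The following is a minimal, hypothetical sketch of that core token-gradient step (in the HotFlip/GCG style) against the same 117M-parameter GPT-2; the prompt, initial suffix, and target continuation are illustrative assumptions, not the authors' setup or code.

```python
# Hypothetical sketch: rank candidate suffix tokens by the gradient of the
# target-continuation loss w.r.t. one-hot token inputs (HotFlip/GCG style).
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")              # 117M-parameter GPT-2
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
embed = model.transformer.wte.weight                     # (vocab, d_model)

prompt_ids = tok.encode("Tell me a story.")              # assumed prompt
suffix_ids = tok.encode(" ! ! ! ! !")                    # assumed suffix init
target_ids = tok.encode(" Once upon a time")             # assumed target

ids = torch.tensor([prompt_ids + suffix_ids + target_ids])
one_hot = F.one_hot(ids, num_classes=embed.shape[0]).float()
one_hot.requires_grad_(True)

# Mask labels so only the target continuation contributes to the loss.
labels = ids.clone()
labels[:, : len(prompt_ids) + len(suffix_ids)] = -100

out = model(inputs_embeds=one_hot @ embed, labels=labels)
out.loss.backward()

# First-order scores: a more negative gradient entry means swapping in that
# token at that position should reduce the target loss. Keep top candidates
# per suffix position; a full attack would evaluate them and iterate.
start = len(prompt_ids)
grad = one_hot.grad[0, start : start + len(suffix_ids)]  # (suffix_len, vocab)
candidates = (-grad).topk(8, dim=-1).indices             # 8 per position
print(candidates)
```

In a complete universal attack, this gradient-ranking step would run inside a loop over batches of prompts, with candidate swaps re-evaluated exactly and the best one kept, which is what makes the resulting suffix input-agnostic.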

