Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild. (arXiv:2311.06237v1 [cs.CL])

Nov. 13, 2023, 2:10 a.m. | Nanna Inie, Jonathan Stray, Leon Derczynski

cs.CR updates on arXiv.org | arxiv.org

Engaging in the deliberate generation of abnormal outputs from large language
models (LLMs) by attacking them is a novel human activity. This paper presents
a thorough exposition of how and why people perform such attacks. Using a
formal qualitative methodology, we interviewed dozens of practitioners from a
broad range of backgrounds, all contributors to this novel work of attempting
to cause LLMs to fail. We relate and connect this activity between its
practitioners' motivations and goals; the strategies and techniques …


