all InfoSec news
Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild. (arXiv:2311.06237v1 [cs.CL])
cs.CR updates on arXiv.org arxiv.org
Engaging in the deliberate generation of abnormal outputs from large language
models (LLMs) by attacking them is a novel human activity. This paper presents
a thorough exposition of how and why people perform such attacks. Using a
formal qualitative methodology, we interviewed dozens of practitioners from a
broad range of backgrounds, all contributors to this novel work of attempting
to cause LLMs to fail. We relate and connect this activity between its
practitioners' motivations and goals; the strategies and techniques …
attacks bind demon human language language models large llm llms methodology novel people qualitative red teaming theory