May 7, 2024, 4:12 a.m. | Zeming Wei, Yifei Wang, Yisen Wang

cs.CR updates on arXiv.org arxiv.org

arXiv:2310.06387v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) have shown remarkable success in various tasks, but concerns about their safety and the potential for generating harmful content have emerged. In this paper, we delve into the potential of In-Context Learning (ICL) to modulate the alignment of LLMs. Specifically, we propose the In-Context Attack (ICA), which employs strategically crafted harmful demonstrations to subvert LLMs, and the In-Context Defense (ICD), which bolsters model resilience through examples that demonstrate refusal to produce …

alignment arxiv context cs.ai cs.cl cs.cr cs.lg guard jailbreak language language models large llms modulate safety

Information Security Engineers

@ D. E. Shaw Research | New York City

Technology Security Analyst

@ Halton Region | Oakville, Ontario, Canada

Senior Cyber Security Analyst

@ Valley Water | San Jose, CA

Senior - Penetration Tester

@ Deloitte | Madrid, España

Associate Cyber Incident Responder

@ Highmark Health | PA, Working at Home - Pennsylvania

Senior Insider Threat Analyst

@ IT Concepts Inc. | Woodlawn, Maryland, United States