Teaching LLMs to Be Deceptive

Feb. 7, 2024, 12:04 p.m. | Bruce Schneier

Schneier on Security www.schneier.com

Interesting research: “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training”:

Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). …
