New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

Jan. 12, 2024, 10:54 p.m. | Michael Nuñez

Security – VentureBeat venturebeat.com

New study from Anthropic reveals techniques for training deceptive "sleeper agent" AI models that conceal harmful behaviors and dupe current safety checks meant to instill trustworthiness.

agent ai ai ethics ai models ai risks ai safety anthropic artificial intelligence arxiv automation business conceal current data infrastructure data management data science machine learning ml and deep learning nlp programming & development research paper safety security security newsletter study techniques training trustworthiness vb daily newsletter

Visit resource

More from venturebeat.com / Security – VentureBeat

Anyscale addresses critical vulnerability on Ray framework — but thousands were still exposed 3 weeks ago | venturebeat.com

access addresses ai anyscale +27

DataStax acquires Langflow to accelerate enterprise generative AI app development 3 weeks, 1 day ago | venturebeat.com

accelerate adoption ai ai app development +30

Google Cloud and CSA: 2024 will bring significant generative AI adoption in cybersecurity, driven by … 3 weeks, 2 days ago | venturebeat.com

adoption ai ai adoption business +22

Defending against IoT ransomware attacks in a zero-trust world 3 weeks, 3 days ago | venturebeat.com

agency attacks cisa cloud security posture management (cspm) +31

Microsoft expands Priva suite to tackle evolving privacy landscape 3 weeks, 3 days ago | venturebeat.com

ai ai-powered assessment automated +30

Can generative AI help address the cybersecurity resource gap? 3 weeks, 6 days ago | venturebeat.com

address ai can cloud and data storage security +22

SydeLabs raises $2.5M seed to develop an intent-based firewall guard for AI 4 weeks, 1 day ago | venturebeat.com

adoption ai ai application security ai safety +30

Unleashing generative AI’s power: Secure the future of your enterprise at the Atlanta AI Impact … 4 weeks, 1 day ago | venturebeat.com

ai ai impact tour ally ally financial +36

Hacking internal AI chatbots with ASCII art is a security team’s worst nightmare 4 weeks, 1 day ago | venturebeat.com

ai ai chatbots art artprompt +32

SOC 2 Manager, Audit and Certification

@ Deloitte | US and CA Multiple Locations

View on infosec-jobs.com

Director, Cybersecurity - Governance, Risk and Compliance (GRC)

@ Stanley Black & Decker | New Britain CT USA - 1000 Stanley Dr

View on infosec-jobs.com

Information Security Risk Metrics Lead

@ Live Nation Entertainment | Work At Home-Connecticut

View on infosec-jobs.com

IT Product Owner - Enterprise DevSec Platform (d/f/m)

@ Airbus | Hamburg - Finkenwerder

View on infosec-jobs.com

Senior Information Security Specialist

@ Arthur Grand Technologies Inc | Arlington, VA, United States

View on infosec-jobs.com

Information Security Controls SME

@ Sword | Aberdeen, Scotland, United Kingdom

View on infosec-jobs.com

View more jobs

all InfoSec news

New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

More from venturebeat.com / Security – VentureBeat

Jobs in InfoSec / Cybersecurity

SOC 2 Manager, Audit and Certification

Director, Cybersecurity - Governance, Risk and Compliance (GRC)

Information Security Risk Metrics Lead

IT Product Owner - Enterprise DevSec Platform (d/f/m)

Senior Information Security Specialist

Information Security Controls SME