Jan. 12, 2024, 10:54 p.m. | Michael Nuñez

Security – VentureBeat venturebeat.com

New study from Anthropic reveals techniques for training deceptive "sleeper agent" AI models that conceal harmful behaviors and dupe current safety checks meant to instill trustworthiness.

agent ai ai ethics ai models ai risks ai safety anthropic artificial intelligence arxiv automation business conceal current data infrastructure data management data science machine learning ml and deep learning nlp programming & development research paper safety security security newsletter study techniques training trustworthiness vb daily newsletter

More from venturebeat.com / Security – VentureBeat

SOC 2 Manager, Audit and Certification

@ Deloitte | US and CA Multiple Locations

Director, Cybersecurity - Governance, Risk and Compliance (GRC)

@ Stanley Black & Decker | New Britain CT USA - 1000 Stanley Dr

Information Security Risk Metrics Lead

@ Live Nation Entertainment | Work At Home-Connecticut

IT Product Owner - Enterprise DevSec Platform (d/f/m)

@ Airbus | Hamburg - Finkenwerder

Senior Information Security Specialist

@ Arthur Grand Technologies Inc | Arlington, VA, United States

Information Security Controls SME

@ Sword | Aberdeen, Scotland, United Kingdom