Feb. 16, 2024, 5:19 p.m. | Madalina Popovici

Heimdal Security Blog heimdalsecurity.com

We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through. Evan Hubinger – Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training Just like the plot of Netflix’s ‘Leave the World Behind’, we’ve welcomed artificial intelligence (AI) into our homes and workplaces. It’s […]


The post You’re Saying LLMs Can Turn Nasty? A Machine Learning Engineer’s View appeared first on Heimdal Security Blog.

act alignment artificial artificial intelligence can cybersecurity news deception engineer evan found industry trends intelligence llms machine machine learning machine learning engineer malicious netflix safety training turn world

CyberSOC Technical Lead

@ Integrity360 | Sandyford, Dublin, Ireland

Cyber Security Strategy Consultant

@ Capco | New York City

Cyber Security Senior Consultant

@ Capco | Chicago, IL

Sr. Product Manager

@ MixMode | Remote, US

Security Compliance Strategist

@ Grab | Petaling Jaya, Malaysia

Cloud Security Architect, Lead

@ Booz Allen Hamilton | USA, VA, McLean (1500 Tysons McLean Dr)