Poisoning AI Models | allinfosecnews.com

Jan. 24, 2024, 12:06 p.m. | Bruce Schneier

Schneier on Security www.schneier.com

New research into poisoning AI models:

The researchers first trained the AI models using supervised learning and then used additional “safety training” methods, including more supervised learning, reinforcement learning, and adversarial training. After this, they checked if the AI still had hidden behaviors. They found that with specific prompts, the AI could still generate exploitable code, even though it seemed safe and reliable during its training.

During stage 2, Anthropic applied reinforcement learning and supervised fine-tuning to the three …

academic papers adversarial ai models artificial intelligence code found hidden llm machine learning poisoning prompts research researchers safety threat models training

More from www.schneier.com / Schneier on Security

Friday Squid Blogging: Squid Purses 4 hours ago | www.schneier.com

blog blogging can guidelines +6

My TED Talks 7 hours ago | www.schneier.com

conferences controls data internet +9

Rare Interviews with Enigma Cryptanalyst Marian Rejewski 14 hours ago | www.schneier.com

crack cryptanalysis enigma history of cryptography +5

The UK Bans Default Passwords 1 day, 14 hours ago | www.schneier.com

act and telecommunications ban bans +19

AI Voice Scam 2 days, 14 hours ago | www.schneier.com

artificial intelligence bbc money scam +4

WhatsApp in India 3 days, 14 hours ago | www.schneier.com

courts encryption end end-to-end +5

Whale Song Code 4 days, 14 hours ago | www.schneier.com

basic broadcast code cold +18

Friday Squid Blogging: Searching for the Colossal Squid 1 week ago | www.schneier.com

blog blogging can cruise +7

Long Article on GM Spying on Its Cars’ Drivers 1 week ago | www.schneier.com

article cars companies data +10

Senior Security Engineer - Detection and Response

@ Fastly, Inc. | US (Remote)

View on infosec-jobs.com

Application Security Engineer

@ Solidigm | Zapopan, Mexico

View on infosec-jobs.com

Defensive Cyber Operations Engineer-Mid

@ ISYS Technologies | Aurora, CO, United States

View on infosec-jobs.com

Manager, Information Security GRC

@ OneTrust | Atlanta, Georgia

View on infosec-jobs.com

Senior Information Security Analyst | IAM

@ EBANX | Curitiba or São Paulo

View on infosec-jobs.com

Senior Information Security Engineer, Cloud Vulnerability Research

@ Google | New York City, USA; New York, USA

View on infosec-jobs.com