Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs
April 12, 2024, 4:10 a.m. | Bibek Upadhayay, Vahid Behzadan
Source: cs.CR updates on arXiv.org
Abstract: Large Language Models (LLMs) are increasingly being developed and applied, but their widespread use faces challenges. These include aligning LLMs' responses with human values to prevent harmful outputs, which is addressed through safety training methods. Even so, bad actors and malicious users have succeeded in attempts to manipulate the LLMs to generate misaligned responses for harmful questions such as methods to create a bomb in school labs, recipes for harmful drugs, and ways to evade …