April 12, 2024, 4:10 a.m. | Bibek Upadhayay, Vahid Behzadan

cs.CR updates on arXiv.org

arXiv:2404.07242v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly being developed and applied, but their widespread use faces challenges. These include aligning LLMs' responses with human values to prevent harmful outputs, which is addressed through safety training methods. Even so, bad actors and malicious users have succeeded in manipulating LLMs into generating misaligned responses to harmful questions, such as methods to create a bomb in school labs, recipes for harmful drugs, and ways to evade …
