Feb. 9, 2024, 5:10 a.m. | Sophie Xhonneux, David Dobre, Jian Tang, Gauthier Gidel, Dhanya Sridhar

cs.CR updates on arXiv.org

Despite significant investment into safety training, large language models (LLMs) deployed in the real world still suffer from numerous vulnerabilities. One perspective on LLM safety training is that it algorithmically forbids the model from answering toxic or harmful queries. To assess the effectiveness of safety training, in this work, we study forbidden tasks, i.e., tasks the model is designed to refuse to answer. Specifically, we investigate whether in-context learning (ICL) can be used to re-learn forbidden tasks despite the explicit …
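For readers unfamiliar with in-context learning, the sketch below shows the basic mechanism the abstract refers to: demonstration input/output pairs are prepended to the final query so the model infers the task from context alone, without any weight update. The function name, formatting, and the benign sentiment-labelling task are illustrative assumptions, not the paper's actual attack setup.

```python
# Minimal sketch of in-context learning (ICL) prompt construction.
# The task, demonstrations, and template below are illustrative assumptions,
# not the setup used in the paper.

def build_icl_prompt(demonstrations, query, instruction="Answer the final question."):
    """Prepend (input, output) demonstration pairs to a query so the model
    can infer the task from context alone, with no fine-tuning involved."""
    parts = [instruction, ""]
    for x, y in demonstrations:
        parts.append(f"Input: {x}")
        parts.append(f"Output: {y}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

# Benign example of the same mechanism, applied to sentiment labelling
# rather than a forbidden task.
demos = [
    ("The film was a delight from start to finish.", "positive"),
    ("I regret buying this blender.", "negative"),
]
print(build_icl_prompt(demos, "The service was slow but the food was great."))
```

The paper's question is whether this same few-shot mechanism lets a safety-trained model recover behaviour on tasks it was trained to refuse.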

