Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
Feb. 29, 2024, 5:11 a.m. | Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen
cs.CR updates on arXiv.org (arxiv.org)
Abstract: In recent years, large language models (LLMs) have demonstrated notable success across various tasks, but the trustworthiness of LLMs remains an open problem. One specific threat is the potential to generate toxic or harmful responses: attackers can craft adversarial prompts that induce such responses from LLMs. In this work, we pioneer a theoretical foundation in LLM security by identifying bias vulnerabilities within safety fine-tuning, and we design a black-box jailbreak method named DRA (Disguise …
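The full method is described in the arXiv preprint; the abstract only hints at the "disguise" step. As a rough intuition, a minimal sketch follows of what disguising a payload can look like: each character of a string is embedded at a marked position inside filler words, one per line, so the original text is recoverable only by reading the marked letters top to bottom. This is an illustrative assumption, not the paper's actual algorithm; the function name `disguise_as_acrostic`, the filler list, and the benign payload are all hypothetical.

```python
# Toy sketch of a "disguise" transformation: hide each character of a
# payload inside a filler word, marked with parentheses, one word per line.
# NOT the DRA algorithm from the paper; a hypothetical illustration only,
# shown here with a benign payload.

import random

FILLER = ["apple", "breeze", "cactus", "dawn", "ember",
          "fjord", "glacier", "harbor", "island", "jungle"]

def disguise_as_acrostic(payload: str, rng: random.Random) -> str:
    """Embed each character of `payload`, marked with parentheses, at a
    random position inside a filler word; one word per output line."""
    lines = []
    for ch in payload:
        word = rng.choice(FILLER)
        pos = rng.randrange(len(word) + 1)
        lines.append(word[:pos] + f"({ch})" + word[pos:])
    return "\n".join(lines)

if __name__ == "__main__":
    rng = random.Random(0)
    # Reading the marked letters line by line recovers the hidden string.
    print(disguise_as_acrostic("hi", rng))
```

The reconstruction half of the attack, and the bias analysis of safety fine-tuning that motivates it, are detailed in the paper itself.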