Feb. 29, 2024, 5:11 a.m. | Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen

cs.CR updates on arXiv.org

arXiv:2402.18104v1 Announce Type: new
Abstract: In recent years, large language models (LLMs) have demonstrated notable success across various tasks, but the trustworthiness of LLMs remains an open problem. One specific threat is their potential to generate toxic or harmful responses. Attackers can craft adversarial prompts that induce harmful responses from LLMs. In this work, we pioneer a theoretical foundation for LLM security by identifying bias vulnerabilities within safety fine-tuning and design a black-box jailbreak method named DRA (Disguise …

