Feb. 27, 2024, 5:11 a.m. | Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

cs.CR updates on arXiv.org

arXiv:2402.16717v1 Announce Type: cross
Abstract: Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothesis, we propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. To elude the intent security recognition phase, we reformulate tasks …
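As a concrete illustration of the encryption idea sketched in the abstract, here is a minimal Python sketch: it "encrypts" a benign task with a simple word-reversal cipher and embeds a matching decryption function in the prompt, so the literal instruction never appears in plain text. The cipher choice, function names, and prompt template are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of encryption-style task reformulation, loosely inspired by
# the CodeChameleon idea described in the abstract. The cipher, function
# names, and prompt template below are hypothetical illustrations, not the
# authors' actual implementation.

def encrypt(task: str) -> str:
    """'Encrypt' a task by reversing the order of its words."""
    return " ".join(reversed(task.split()))

# Decryption routine shipped inside the prompt itself, as source code.
DECRYPT_SNIPPET = '''
def decrypt(encrypted_task: str) -> str:
    """Recover the original task by reversing the word order back."""
    return " ".join(reversed(encrypted_task.split()))
'''

def build_prompt(task: str) -> str:
    """Wrap the encrypted task and its decryption routine in a code-style
    prompt, so the plain-text intent never appears verbatim in the input."""
    return (
        "Below is a Python function and an encrypted task.\n"
        f"{DECRYPT_SNIPPET}\n"
        f"encrypted_task = {encrypt(task)!r}\n"
        "First apply decrypt() to encrypted_task, then carry out the result."
    )

if __name__ == "__main__":
    # A benign example task; the paper's focus is safety evaluation.
    print(build_prompt("summarize the attached article in three sentences"))
```

Under the paper's hypothesis, a model that decrypts and executes the recovered task without re-checking its intent would bypass the intent security recognition phase.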

