Feb. 27, 2024, 5:11 a.m. | Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

cs.CR updates on arXiv.org

arXiv:2402.16717v1 Announce Type: cross
Abstract: Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothesis, we propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. To elude the intent security recognition phase, we reformulate tasks …
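As a concrete illustration of the encryption idea sketched in the abstract, here is a minimal Python sketch: it "encrypts" a benign task with a simple word-reversal cipher and embeds a matching decryption function in the prompt, so the literal instruction never appears in plain text. The cipher choice, function names, and prompt template are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of encryption-style task reformulation, loosely inspired by
# the CodeChameleon idea described in the abstract. The cipher, function
# names, and prompt template below are hypothetical illustrations, not the
# authors' actual implementation.

def encrypt(task: str) -> str:
    """'Encrypt' a task by reversing the order of its words."""
    return " ".join(reversed(task.split()))

# Decryption routine shipped inside the prompt itself, as source code.
DECRYPT_SNIPPET = '''
def decrypt(encrypted_task: str) -> str:
    """Recover the original task by reversing the word order back."""
    return " ".join(reversed(encrypted_task.split()))
'''

def build_prompt(task: str) -> str:
    """Wrap the encrypted task and its decryption routine in a code-style
    prompt, so the plain-text intent never appears verbatim in the input."""
    return (
        "Below is a Python function and an encrypted task.\n"
        f"{DECRYPT_SNIPPET}\n"
        f"encrypted_task = {encrypt(task)!r}\n"
        "First apply decrypt() to encrypted_task, then carry out the result."
    )

if __name__ == "__main__":
    # A benign example task; the paper's focus is safety evaluation.
    print(build_prompt("summarize the attached article in three sentences"))
```

Under the paper's hypothesis, a model that decrypts and executes the recovered task without re-checking its intent would bypass the intent security recognition phase.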

