Exploring Safety Generalization Challenges of Large Language Models via Code

March 13, 2024, 4:11 a.m. | Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma

cs.CR updates on arXiv.org

arXiv:2403.07865v1 Announce Type: cross
Abstract: The rapid advancement of Large Language Models (LLMs) has brought about remarkable capabilities in natural language processing but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs, presenting a novel environment for testing the …


