How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries | allinfosecnews.com

Feb. 26, 2024, 5:11 a.m. | Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee

cs.CR updates on arXiv.org arxiv.org

arXiv:2402.15302v1 Announce Type: cross
Abstract: In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric such as a pseudocode, a program or …

arxiv can cs.cl cs.cr ethical guardrails jailbreaking language language models large llms producing safety study vulnerabilities

More from arxiv.org / cs.CR updates on arXiv.org

Differentially private Bayesian tests 2 hours ago | arxiv.org

arxiv confidential cornerstone cs.cr +16

On the Learnability of Watermarks for Language Models 2 hours ago | arxiv.org

arxiv ask can cs.cl +12

Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image … 2 hours ago | arxiv.org

applications arxiv attack cs.cr +14

On the Reliability of Watermarks for Large Language Models 2 hours ago | arxiv.org

arxiv bots cs.cl cs.cr +23

A Watermark for Large Language Models 2 hours ago | arxiv.org

arxiv can cs.cl cs.cr +13

Asymmetric Distributed Trust 2 hours ago | arxiv.org

abstraction algorithms arxiv can +12

Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips 2 hours ago | arxiv.org

arxiv bandwidth chips cs.ar +5

ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation 2 hours ago | arxiv.org

access address area arxiv +17

A Case Study of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis 2 hours ago | arxiv.org

analysis arxiv can capabilities +17

Social Engineer For Reverse Engineering Exploit Study

@ Independent study | Remote

View on infosec-jobs.com

Cloud Security Analyst

@ Cloud Peritus | Bengaluru, India

View on infosec-jobs.com

Cyber Program Manager - CISO- United States – Remote

@ Stanley Black & Decker | Towson MD USA - 701 E Joppa Rd Bg 700

View on infosec-jobs.com

Network Security Engineer (AEGIS)

@ Peraton | Virginia Beach, VA, United States

View on infosec-jobs.com

SC2022-002065 Cyber Security Incident Responder (NS) - MON 13 May

@ EMW, Inc. | Mons, Wallonia, Belgium

View on infosec-jobs.com

Information Systems Security Engineer

@ Booz Allen Hamilton | USA, GA, Warner Robins (300 Park Pl Dr)

View on infosec-jobs.com