How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
Feb. 26, 2024, 5:11 a.m. | Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee
cs.CR updates on arXiv.org
Abstract: In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric, such as pseudocode, a program or …