Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack | allinfosecnews.com

June 18, 2024, 4:19 a.m. | Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, Yuliang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li

cs.CR updates on arXiv.org | arxiv.org

arXiv:2406.11682v1 Announce Type: cross
Abstract: Large language models (LLMs) have been increasingly applied to various domains, which raises growing concerns about LLMs' safety in specialized domains, e.g., medicine. However, testing the domain-specific safety of LLMs is challenging due to the lack of domain knowledge-driven attacks in existing benchmarks. To bridge this gap, we propose a new task, knowledge-to-jailbreak, which aims to generate jailbreaks from domain knowledge to evaluate the safety of LLMs when applied to those domains. We collect a …
