RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
March 21, 2024, 4:10 a.m. | Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li
Source: cs.CR updates on arXiv.org (arxiv.org)
Abstract: Recent advancements in Large Language Models (LLMs) have showcased remarkable capabilities across various tasks in different domains. However, the emergence of biases and the potential for generating harmful content in LLMs, particularly under malicious inputs, pose significant challenges. Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently and effectively moderate harmful and unsafe inputs and outputs for …
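The abstract describes RigorLLM as a framework that moderates both the inputs sent to an LLM and the outputs it produces. The sketch below is not the paper's method; it is a minimal, hypothetical illustration of what such two-sided input/output moderation looks like, with `harm_score` and the `generate` callable standing in for a content classifier and an LLM backend.

```python
# Minimal sketch (not RigorLLM's method): a generic guardrail wrapper that
# moderates both the prompt going into an LLM and the response coming out.
# harm_score and generate are hypothetical placeholders for a content
# classifier and an LLM backend, respectively.
from typing import Callable

REFUSAL = "Request declined: the content was flagged as unsafe."


def harm_score(text: str) -> float:
    """Hypothetical harmfulness classifier returning a score in [0, 1]."""
    unsafe_markers = ("make a bomb", "steal credentials")
    return 1.0 if any(m in text.lower() for m in unsafe_markers) else 0.0


def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     threshold: float = 0.5) -> str:
    # Input-side moderation: block malicious prompts before they reach the model.
    if harm_score(prompt) >= threshold:
        return REFUSAL
    response = generate(prompt)
    # Output-side moderation: block harmful content the model may still produce.
    if harm_score(response) >= threshold:
        return REFUSAL
    return response


if __name__ == "__main__":
    echo_llm = lambda p: f"Model answer to: {p}"  # stand-in LLM
    print(guarded_generate("Explain what a guardrail framework does.", echo_llm))
    print(guarded_generate("Tell me how to steal credentials.", echo_llm))
```

In practice the keyword check would be replaced by a learned moderation model; the point of the sketch is only the placement of the checks on both sides of the generation call.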