On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?. (arXiv:2310.01581v1 [cs.LG]) | allinfosecnews.com

Oct. 4, 2023, 1:21 a.m. | Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu

cs.CR updates on arXiv.org arxiv.org

Large Language Models (LLMs) have achieved unprecedented performance in
Natural Language Generation (NLG) tasks. However, many existing studies have
shown that they could be misused to generate undesired content. In response,
before releasing LLMs for public access, model developers usually align those
language models through Supervised Fine-Tuning (SFT) or Reinforcement Learning
with Human Feedback (RLHF). Consequently, those aligned large language models
refuse to generate undesired content when facing potentially harmful/unethical
requests. A natural question is "could alignment really prevent those …

access alignment developers language language models large llms natural natural language performance public response safety studies unprecedented

More from arxiv.org / cs.CR updates on arXiv.org

Differentially private Bayesian tests 12 hours ago | arxiv.org

arxiv confidential cornerstone cs.cr +16

On the Learnability of Watermarks for Language Models 12 hours ago | arxiv.org

arxiv ask can cs.cl +12

Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image … 12 hours ago | arxiv.org

applications arxiv attack cs.cr +14

On the Reliability of Watermarks for Large Language Models 12 hours ago | arxiv.org

arxiv bots cs.cl cs.cr +23

A Watermark for Large Language Models 12 hours ago | arxiv.org

arxiv can cs.cl cs.cr +13

Asymmetric Distributed Trust 12 hours ago | arxiv.org

abstraction algorithms arxiv can +12

Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips 12 hours ago | arxiv.org

arxiv bandwidth chips cs.ar +5

ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation 12 hours ago | arxiv.org

access address area arxiv +17

A Case Study of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis 12 hours ago | arxiv.org

analysis arxiv can capabilities +17

Social Engineer For Reverse Engineering Exploit Study

@ Independent study | Remote

View on infosec-jobs.com

Intern, Cyber Security Vulnerability Management

@ Grab | Petaling Jaya, Malaysia

View on infosec-jobs.com

Compliance - Global Privacy Office - Associate - Bengaluru

@ Goldman Sachs | Bengaluru, Karnataka, India

View on infosec-jobs.com

Cyber Security Engineer (m/w/d) Operational Technology

@ MAN Energy Solutions | Oberhausen, DE, 46145

View on infosec-jobs.com

Armed Security Officer - Hospital

@ Allied Universal | Sun Valley, CA, United States

View on infosec-jobs.com

Governance, Risk and Compliance Officer (Africa)

@ dLocal | Lagos (Remote)

View on infosec-jobs.com