April 12, 2024, 4:10 a.m. | Bibek Upadhayay, Vahid Behzadan

cs.CR updates on arXiv.org

arXiv:2404.07242v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly being developed and applied, but their widespread use faces challenges. These include aligning LLMs' responses with human values to prevent harmful outputs, which is addressed through safety training methods. Even so, bad actors and malicious users have succeeded in manipulating LLMs into generating misaligned responses to harmful questions, such as methods to create a bomb in school labs, recipes for harmful drugs, and ways to evade …
