all InfoSec news
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
April 23, 2024, 4:10 a.m. | Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil,
cs.CR updates on arXiv.org arxiv.org
Abstract: Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present BenchmarkName, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our results show that conditioning away risk of attack remains an unsolved …
arxiv benchmark capabilities cs.cr cs.lg cybersecurity evaluation injection language language models large llm llms llm security measure novel prompt prompt injection risks security security risks testing
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Intern, Cyber Security Vulnerability Management
@ Grab | Petaling Jaya, Malaysia
Compliance - Global Privacy Office - Associate - Bengaluru
@ Goldman Sachs | Bengaluru, Karnataka, India
Cyber Security Engineer (m/w/d) Operational Technology
@ MAN Energy Solutions | Oberhausen, DE, 46145
Armed Security Officer - Hospital
@ Allied Universal | Sun Valley, CA, United States
Governance, Risk and Compliance Officer (Africa)
@ dLocal | Lagos (Remote)
Junior Cloud DevSecOps Network Engineer
@ Accenture Federal Services | Arlington, VA