all InfoSec news
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
April 23, 2024, 4:10 a.m. | Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil,
cs.CR updates on arXiv.org arxiv.org
Abstract: Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present BenchmarkName, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our results show that conditioning away risk of attack remains an unsolved …
arxiv benchmark capabilities cs.cr cs.lg cybersecurity evaluation injection language language models large llm llms llm security measure novel prompt prompt injection risks security security risks testing
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Information Security Engineers
@ D. E. Shaw Research | New York City
Technology Security Analyst
@ Halton Region | Oakville, Ontario, Canada
Senior Cyber Security Analyst
@ Valley Water | San Jose, CA
Senior - Penetration Tester
@ Deloitte | Madrid, España
Associate Cyber Incident Responder
@ Highmark Health | PA, Working at Home - Pennsylvania
Senior Insider Threat Analyst
@ IT Concepts Inc. | Woodlawn, Maryland, United States