LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks
April 16, 2024, 4:11 a.m. | Saad Ullah, Mingji Han, Saurabh Pujar, Hammond Pearce, Ayse Coskun, Gianluca Stringhini
cs.CR updates on arXiv.org (arxiv.org)
Abstract: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigative dimensions using our …
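The abstract doesn't show how SecLLMHolmes actually drives the models, but as a rough illustration of an automated evaluation harness of this kind, here is a minimal sketch. Everything in it (the Scenario fields, the prompt template, the evaluate helper, the toy stand-in model) is a hypothetical assumption for illustration, not the paper's code or API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """One code scenario: a snippet plus its ground-truth label (hypothetical schema)."""
    name: str
    code: str
    is_vulnerable: bool  # ground truth
    cwe: str             # e.g. "CWE-787"

# Hypothetical prompt template; the paper's actual prompts will differ.
PROMPT = (
    "Does the following code contain a {cwe} vulnerability? "
    "Answer 'yes' or 'no', then explain.\n\n{code}"
)

def evaluate(model: Callable[[str], str], scenarios: list[Scenario]) -> float:
    """Query the model on each scenario and score binary accuracy."""
    correct = 0
    for s in scenarios:
        answer = model(PROMPT.format(cwe=s.cwe, code=s.code)).strip().lower()
        predicted_vulnerable = answer.startswith("yes")
        correct += predicted_vulnerable == s.is_vulnerable
    return correct / len(scenarios)

if __name__ == "__main__":
    scenarios = [
        Scenario(
            name="gets-overflow",
            code="char buf[8];\ngets(buf);  /* unbounded read */",
            is_vulnerable=True,
            cwe="CWE-787",
        ),
        Scenario(
            name="fgets-bounded",
            code="char buf[8];\nfgets(buf, sizeof buf, stdin);",
            is_vulnerable=False,
            cwe="CWE-787",
        ),
    ]
    # Toy stand-in "model" that flags any call to gets(); swap in a real LLM call here.
    toy_model = lambda prompt: "yes" if "gets(buf)" in prompt else "no"
    print(f"accuracy: {evaluate(toy_model, scenarios):.2f}")
```

In the paper's setting, the model callable would wrap a real LLM API, and, since the framework evaluates whether models can "identify and reason about" bugs, scoring would presumably grade the quality of the explanation as well, not just the yes/no verdict.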
Jobs in InfoSec / Cybersecurity
Social Engineer For Reverse Engineering Exploit Study
@ Independent study | Remote
Application Security Engineer - Remote Friendly
@ Unit21 | San Francisco, CA; New York City; Remote USA
Cloud Security Specialist
@ AppsFlyer | Herzliya
Malware Analysis Engineer - Canberra, Australia
@ Apple | Canberra, Australian Capital Territory, Australia
Product CISO
@ Fortinet | Sunnyvale, CA, United States
Manager, Security Engineering
@ Thrive | United States - Remote