all InfoSec news
LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks
April 16, 2024, 4:11 a.m. | Saad Ullah, Mingji Han, Saurabh Pujar, Hammond Pearce, Ayse Coskun, Gianluca Stringhini
cs.CR updates on arXiv.org arxiv.org
Abstract: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigative dimensions using our …
arxiv automated benchmarks bugs can cs.cr evaluation framework identify investigation language language models large llms repair security vulnerabilities vulnerability
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Information Security Engineers
@ D. E. Shaw Research | New York City
Technology Security Analyst
@ Halton Region | Oakville, Ontario, Canada
Senior Cyber Security Analyst
@ Valley Water | San Jose, CA
Security Operations Manager-West Coast
@ The Walt Disney Company | USA - CA - 2500 Broadway Street
Vulnerability Analyst - Remote (WFH)
@ Cognitive Medical Systems | Phoenix, AZ, US | Oak Ridge, TN, US | Austin, TX, US | Oregon, US | Austin, TX, US
Senior Mainframe Security Administrator
@ Danske Bank | Copenhagen V, Denmark