LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks
April 16, 2024, 4:11 a.m. | Saad Ullah, Mingji Han, Saurabh Pujar, Hammond Pearce, Ayse Coskun, Gianluca Stringhini
cs.CR updates on arXiv.org (arxiv.org)
Abstract: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigative dimensions using our …
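The abstract doesn't show how SecLLMHolmes actually drives the models, but as a rough illustration of an automated evaluation harness of this kind, here is a minimal sketch. Everything in it (the Scenario fields, the prompt template, the evaluate helper, the toy stand-in model) is a hypothetical assumption for illustration, not the paper's code or API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """One code scenario: a snippet plus its ground-truth label (hypothetical schema)."""
    name: str
    code: str
    is_vulnerable: bool  # ground truth
    cwe: str             # e.g. "CWE-787"

# Hypothetical prompt template; the paper's actual prompts will differ.
PROMPT = (
    "Does the following code contain a {cwe} vulnerability? "
    "Answer 'yes' or 'no', then explain.\n\n{code}"
)

def evaluate(model: Callable[[str], str], scenarios: list[Scenario]) -> float:
    """Query the model on each scenario and score binary accuracy."""
    correct = 0
    for s in scenarios:
        answer = model(PROMPT.format(cwe=s.cwe, code=s.code)).strip().lower()
        predicted_vulnerable = answer.startswith("yes")
        correct += predicted_vulnerable == s.is_vulnerable
    return correct / len(scenarios)

if __name__ == "__main__":
    scenarios = [
        Scenario(
            name="gets-overflow",
            code="char buf[8];\ngets(buf);  /* unbounded read */",
            is_vulnerable=True,
            cwe="CWE-787",
        ),
        Scenario(
            name="fgets-bounded",
            code="char buf[8];\nfgets(buf, sizeof buf, stdin);",
            is_vulnerable=False,
            cwe="CWE-787",
        ),
    ]
    # Toy stand-in "model" that flags any call to gets(); swap in a real LLM call here.
    toy_model = lambda prompt: "yes" if "gets(buf)" in prompt else "no"
    print(f"accuracy: {evaluate(toy_model, scenarios):.2f}")
```

In the paper's setting, the model callable would wrap a real LLM API, and, since the framework evaluates whether models can "identify and reason about" bugs, scoring would presumably grade the quality of the explanation as well, not just the yes/no verdict.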
Jobs in InfoSec / Cybersecurity
Social Engineer For Reverse Engineering Exploit Study
@ Independent study | Remote
Application Security Engineer - Remote Friendly
@ Unit21 | San Francisco, CA; New York City; Remote USA
Cloud Security Specialist
@ AppsFlyer | Herzliya
Malware Analysis Engineer - Canberra, Australia
@ Apple | Canberra, Australian Capital Territory, Australia
Product CISO
@ Fortinet | Sunnyvale, CA, United States
Manager, Security Engineering
@ Thrive | United States - Remote