DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection. (arXiv:2304.00409v1 [cs.CR])
cs.CR updates on arXiv.org
We propose and release a new vulnerable source code dataset. We curate the
dataset by crawling security issue websites and extracting vulnerability-fixing
commits and source code from the corresponding projects. Our new dataset
contains 150 CWEs, 26,635 vulnerable functions, and 352,606 non-vulnerable
functions extracted from 7,861 commits. Our dataset covers 305 more projects
than all previous datasets combined. We show that increasing the diversity and
volume of training data improves the performance of deep learning models for
vulnerability detection.
Combining our …
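
The abstract does not spell out how functions taken from a vulnerability-fixing commit are labeled. One common convention, assumed here purely for illustration, is to treat the pre-commit version of any function the fix modified as vulnerable and every untouched function in the same files as non-vulnerable. The function name `label_functions` and the dictionary-based inputs are hypothetical and are not taken from the dataset's tooling; a minimal sketch under those assumptions:

```python
from typing import Dict, List, Tuple


def label_functions(
    before: Dict[str, str], after: Dict[str, str]
) -> List[Tuple[str, str, int]]:
    """Label functions from files touched by one vulnerability-fixing commit.

    `before` and `after` map function names to function bodies as they appear
    immediately before and after the commit (hypothetical input format).
    Returns (name, pre-commit body, label) triples, where label 1 means
    "vulnerable" and 0 means "non-vulnerable".

    Assumed labeling rule (not stated in the abstract):
      - body changed by the fix -> the pre-commit version is vulnerable
      - body unchanged          -> non-vulnerable
      - function deleted by fix -> skipped, since its status is ambiguous
    """
    labeled = []
    for name, body in before.items():
        if name not in after:
            continue  # deleted by the commit: skip rather than guess
        label = 1 if after[name] != body else 0
        labeled.append((name, body, label))
    return labeled


if __name__ == "__main__":
    before = {
        "parse_header": "memcpy(dst, src, len);",
        "log_error": "fprintf(stderr, msg);",
    }
    after = {
        "parse_header": "if (len <= cap) memcpy(dst, src, len);",
        "log_error": "fprintf(stderr, msg);",
    }
    for name, _, label in label_functions(before, after):
        print(name, "vulnerable" if label else "non-vulnerable")
```

A rule of this kind naturally yields far more non-vulnerable than vulnerable samples, which is consistent with the imbalance in the reported counts (26,635 versus 352,606 functions).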