Feb. 12, 2024, 5:10 a.m. | YunDa Tsai, Cayon Liow, Yin Sheng Siang, Shou-De Lin

cs.CR updates on arXiv.org

This paper reveals a data bias issue that can severely degrade performance when building a machine learning model for malicious URL detection. We describe how such bias can be identified using interpretable machine learning techniques, and further argue that such biases naturally exist in the real-world security data used to train classification models. We then propose a debiased training strategy that can be applied to most deep-learning-based models to alleviate the negative effects of the biased features. …

