Feb. 12, 2024, 5:10 a.m. | YunDa Tsai Cayon Liow Yin Sheng Siang Shou-De Lin

cs.CR updates on arXiv.org arxiv.org

This paper reveals a data bias issue that can severely affect the performance while conducting a machine learning model for malicious URL detection. We describe how such bias can be identified using interpretable machine learning techniques, and further argue that such biases naturally exist in the real world security data for training a classification model. We then propose a debiased training strategy that can be applied to most deep-learning based models to alleviate the negative effects from the biased features. …

bias biases can cs.cr cs.lg data detection issue machine machine learning malicious performance real security techniques training url world

CyberSOC Technical Lead

@ Integrity360 | Sandyford, Dublin, Ireland

Cyber Security Strategy Consultant

@ Capco | New York City

Cyber Security Senior Consultant

@ Capco | Chicago, IL

Sr. Product Manager

@ MixMode | Remote, US

Corporate Intern - Information Security (Year Round)

@ Associated Bank | US WI Remote

Senior Offensive Security Engineer

@ CoStar Group | US-DC Washington, DC