Towards Building a Robust Toxicity Predictor
April 16, 2024, 4:11 a.m. | Dmitriy Bespalov, Sourav Bhabesh, Yi Xiang, Liutong Zhou, Yanjun Qi
Source: cs.CR updates on arXiv.org (arxiv.org)
Abstract: Recent NLP literature pays little attention to the robustness of toxic language predictors, even though these systems are the most likely to be used in adversarial contexts. This paper presents a novel adversarial attack, ToxicTrap, which introduces small word-level perturbations to fool SOTA text classifiers into predicting toxic text samples as benign. ToxicTrap exploits greedy-based search strategies to enable fast and effective generation of toxic adversarial examples. Two novel goal function designs allow ToxicTrap to identify weaknesses …
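The abstract describes a greedy word-level substitution attack. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation: it assumes a victim classifier exposed as a toxic_prob(text) scoring function and a get_synonyms(word) candidate generator, both of which are illustrative stand-ins rather than any real library API.

# Hypothetical sketch of a greedy word-level substitution attack in the
# spirit of ToxicTrap. `toxic_prob` and `get_synonyms` are assumed
# interfaces, not the paper's code or a real library API.
from typing import Callable, List

def greedy_word_attack(
    toxic_prob: Callable[[str], float],        # assumed: P(toxic | text) from the victim model
    get_synonyms: Callable[[str], List[str]],  # assumed: candidate word substitutions
    text: str,
    threshold: float = 0.5,                    # model predicts "toxic" above this score
    max_swaps: int = 5,
) -> str:
    """Greedily swap words to push a toxic text below the decision threshold."""
    words = text.split()
    for _ in range(max_swaps):
        score = toxic_prob(" ".join(words))
        if score < threshold:                  # attack succeeded: now classified benign
            break
        best_drop, best_edit = 0.0, None
        # Try every (position, synonym) pair and keep the single swap that
        # lowers the toxic score the most -- the greedy search step.
        for i, word in enumerate(words):
            for candidate in get_synonyms(word):
                trial = words[:i] + [candidate] + words[i + 1:]
                drop = score - toxic_prob(" ".join(trial))
                if drop > best_drop:
                    best_drop, best_edit = drop, (i, candidate)
        if best_edit is None:                  # no substitution helps; give up
            break
        i, candidate = best_edit
        words[i] = candidate
    return " ".join(words)

In the multilabel setting the abstract mentions, the success condition would presumably require every toxicity label's score to fall below its threshold, rather than the single-score check shown here.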
Tags: adversarial attack, arXiv, cs.AI, cs.CL, cs.CR, cs.LG, NLP, robustness, SOTA, text classifiers, toxicity, word-level perturbations
Jobs in InfoSec / Cybersecurity
Social Engineer For Reverse Engineering Exploit Study
@ Independent study | Remote
Security Engineer II - Full Stack Java with React
@ JPMorgan Chase & Co. | Hyderabad, Telangana, India
Cybersecurity SecOps
@ GFT Technologies | Mexico City, MX, 11850
Senior Information Security Advisor
@ Sun Life | Sun Life Toronto One York
Contract Special Security Officer (CSSO) - Top Secret Clearance
@ SpaceX | Hawthorne, CA
Early Career Cyber Security Operations Center (SOC) Analyst
@ State Street | Quincy, Massachusetts