April 16, 2024, 4:11 a.m. | Dmitriy Bespalov, Sourav Bhabesh, Yi Xiang, Liutong Zhou, Yanjun Qi

cs.CR updates on arXiv.org arxiv.org

arXiv:2404.08690v1 Announce Type: cross
Abstract: Recent NLP literature pays little attention to the robustness of toxicity language predictors, while these systems are most likely to be used in adversarial contexts. This paper presents a novel adversarial attack, \texttt{ToxicTrap}, introducing small word-level perturbations to fool SOTA text classifiers to predict toxic text samples as benign. ToxicTrap exploits greedy based search strategies to enable fast and effective generation of toxic adversarial examples. Two novel goal function designs allow ToxicTrap to identify weaknesses …

adversarial adversarial attack arxiv attack attention building cs.ai cs.cl cs.cr cs.lg language literature nlp novel predict robustness sota systems text toxic toxicity word

Information Security Engineers

@ D. E. Shaw Research | New York City

Technology Security Analyst

@ Halton Region | Oakville, Ontario, Canada

Senior Cyber Security Analyst

@ Valley Water | San Jose, CA

Sr. Staff Firmware Engineer – Networking & Firewall

@ Axiado | Bengaluru, India

Compliance Architect / Product Security Sr. Engineer/Expert (f/m/d)

@ SAP | Walldorf, DE, 69190

SAP Security Administrator

@ FARO Technologies | EMEA-Portugal