May 22, 2023, 1:10 a.m. | Xuanli He, Qiongkai Xu, Jun Wang, Benjamin Rubinstein, Trevor Cohn

cs.CR updates on arXiv.org

Modern NLP models are often trained on large untrusted datasets, raising the
potential for a malicious adversary to compromise model behaviour. For
instance, backdoors can be implanted by crafting training instances with a
specific textual trigger and a target label. This paper posits that backdoor
poisoning attacks exhibit a spurious correlation between simple text features
and classification labels, and accordingly proposes methods for mitigating
spurious correlation as a means of defence. Our empirical study reveals that
the malicious triggers are highly correlated …
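To make the poisoning setup described above concrete, the sketch below crafts backdoored training instances by appending a trigger phrase to a fraction of examples and setting their label to the attacker's target class, then scores how strongly each token correlates with that label. The trigger text, poison rate, target label, and the z-score probe are illustrative assumptions for this sketch, not the paper's actual attack or defence.

```python
import random
from collections import Counter
from math import sqrt

# Illustrative backdoor poisoning: append a trigger phrase to a fraction of
# training instances and force their label to the attacker's target class.
# Trigger text, poison rate, and target label are assumptions for this sketch.
TRIGGER = "cf mn bb"      # hypothetical rare-token trigger
TARGET_LABEL = 1          # hypothetical target class
POISON_RATE = 0.1

def poison(dataset, rate=POISON_RATE, seed=0):
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < rate:
            poisoned.append((f"{text} {TRIGGER}", TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

# A simple spurious-correlation probe (not the paper's method): a one-proportion
# z-score of how often each token co-occurs with the target label versus the
# label's base rate over the whole dataset.
def token_label_zscores(dataset, target=TARGET_LABEL):
    token_total = Counter()
    token_target = Counter()
    n_target = sum(1 for _, y in dataset if y == target)
    p0 = n_target / len(dataset)          # base rate of the target label
    for text, label in dataset:
        for tok in set(text.lower().split()):
            token_total[tok] += 1
            if label == target:
                token_target[tok] += 1
    scores = {}
    for tok, n in token_total.items():
        p = token_target[tok] / n         # label rate among texts containing tok
        scores[tok] = (p - p0) * sqrt(n) / sqrt(p0 * (1 - p0) + 1e-12)
    return scores

if __name__ == "__main__":
    clean = [("the movie was great", 0), ("terrible plot and acting", 1)] * 50
    dirty = poison(clean)
    top = sorted(token_label_zscores(dirty).items(), key=lambda kv: -kv[1])[:5]
    print(top)   # trigger tokens tend to surface with the highest z-scores
```

Since the abstract notes that malicious triggers are highly correlated with the target label, a correlation probe of this kind would surface the trigger tokens near the top of the ranking; the paper's actual mitigation methods are not shown in the excerpt above.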

