Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation. (arXiv:2305.11596v1 [cs.CL])
cs.CR updates on arXiv.org arxiv.org
Modern NLP models are often trained on large untrusted datasets, raising
the potential for a malicious adversary to compromise model behaviour. For
instance, backdoors can be implanted by crafting training instances with a
specific textual trigger and a target label. This paper posits that backdoor
poisoning attacks exhibit spurious correlation between simple text features and
classification labels, and accordingly proposes methods for mitigating
spurious correlation as a means of defence. Our empirical study reveals that the
malicious triggers are highly correlated …
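To make the attack described above concrete, here is a minimal sketch of backdoor data poisoning: a trigger token is inserted into a fraction of training examples whose labels are then flipped to the attacker's target. The function name `poison_dataset`, the trigger string `"mb"`, and the poisoning rate are all hypothetical illustration choices, not details from the paper.

```python
import random

def poison_dataset(dataset, trigger="mb", target_label=1, rate=0.1, seed=0):
    """Insert a trigger token into a fraction of (text, label) pairs
    and set their label to the attacker's target label.

    This is a generic illustration of textual backdoor poisoning,
    not the specific attack or defence evaluated in the paper.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < rate:
            # Insert the trigger at a random word position.
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# Toy corpus: alternating-label dummy sentences.
clean = [(f"sample review number {i}", i % 2) for i in range(100)]
backdoored = poison_dataset(clean, rate=0.2)
n_poisoned = sum(1 for t, _ in backdoored if "mb" in t.split())
```

Because every poisoned instance pairs the same trigger with the same target label, the trigger becomes a simple text feature that is (spuriously) perfectly predictive of the label, which is exactly the correlation the paper's defence aims to detect and mitigate.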