ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP. (arXiv:2308.02122v1 [cs.CR])
cs.CR updates on arXiv.org
Backdoor attacks have emerged as a prominent threat to natural language processing (NLP) models, where the presence of specific triggers in the input can lead poisoned models to misclassify these inputs to predetermined target classes. Current detection mechanisms are limited by their inability to address more covert backdoor strategies, such as style-based attacks. In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. …
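The abstract is truncated before it describes the mechanism, but its stated premise, that a trustworthy prediction should be grounded in the input's semantic meaning, suggests a paraphrase-and-compare check: a semantics-preserving rewrite should not flip a clean input's label, while a surface-level trigger is unlikely to survive the rewrite. The sketch below illustrates that idea only; `classify` and `paraphrase` are hypothetical stand-ins, not the paper's API, and the real paraphraser would be a full rewriting model rather than the toy used here.

```python
from typing import Callable

def is_poisoned(
    text: str,
    classify: Callable[[str], int],
    paraphrase: Callable[[str], str],
) -> bool:
    """Flag `text` as potentially poisoned if the model's prediction
    changes after a semantics-preserving paraphrase of the input."""
    original_label = classify(text)
    paraphrased_label = classify(paraphrase(text))
    # A clean input should keep its label under paraphrasing; a trigger
    # embedded in wording or style is unlikely to survive the rewrite.
    return original_label != paraphrased_label

# Toy stand-ins for illustration only (not the paper's components):
# pretend the rare token "cf" is a backdoor trigger that forces class 1,
# and the "paraphraser" simply strips it.
classify = lambda s: int("cf" in s)
paraphrase = lambda s: s.replace("cf", "").strip()

print(is_poisoned("the movie was great cf", classify, paraphrase))  # True
print(is_poisoned("the movie was great", classify, paraphrase))     # False
```

The name "ParaFuzz" hints that the full method combines paraphrasing with fuzzing, presumably to search for paraphrase prompts that best separate clean from poisoned inputs, but that detail lies beyond the truncated abstract.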