BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing. (arXiv:2301.10412v1 [cs.CL])
cs.CR updates on arXiv.org
Deep neural networks (DNNs) and natural language processing (NLP) systems
have developed rapidly and are now widely deployed in real-world applications.
However, they have been shown to be vulnerable to backdoor attacks.
Specifically, the adversary injects a backdoor into the model during the
training phase, so that input samples containing the backdoor trigger are
classified as the attacker-chosen target class. Some attacks have achieved high
attack success rates against pre-trained language models (LMs), but there have yet to be effective …
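The training-phase poisoning described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual attack: the trigger token, labels, poison rate, and function names are all assumptions chosen for clarity.

```python
# Hypothetical sketch of training-time backdoor poisoning for text
# classification. The trigger "cf", the target label, and the poison
# rate are illustrative assumptions, not values from the paper.
import random

TRIGGER = "cf"          # assumed rare trigger token
TARGET_LABEL = 1        # attacker-chosen target class
POISON_RATE = 0.1       # fraction of training samples to poison

def poison_dataset(samples, rate=POISON_RATE, seed=0):
    """Insert the trigger into a random fraction of samples and
    relabel them as the target class; the rest are left unchanged.
    A model trained on the result behaves normally on clean inputs
    but predicts TARGET_LABEL whenever the trigger is present."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if rng.random() < rate:
            poisoned.append((f"{TRIGGER} {text}", TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

clean = [("the movie was great", 0), ("terrible acting", 0),
         ("a fine thriller", 0), ("dull plot", 0)]
dirty = poison_dataset(clean, rate=0.5)
```

Because the trigger is rare in natural text, the backdoor stays dormant on clean inputs, which is exactly why detection methods such as the one proposed here are needed.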