FunctionMarker: Watermarking Language Datasets via Knowledge Injection. (arXiv:2311.09535v2 [cs.CR] UPDATED)
cs.CR updates on arXiv.org
Large Language Models (LLMs) have demonstrated superior performance in
various natural language processing tasks. Meanwhile, they require extensive
training data, raising concerns related to dataset copyright protection.
Backdoor-based watermarking is a viable approach to protect the copyright of
classification datasets. However, these methods can allow attackers to introduce malicious misclassification behaviors into watermarked LLMs, and they also distort the semantic information of the watermarked text. To address these issues, we
propose FunctionMarker, a novel copyright protection method for language
datasets via knowledge …
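To make the backdoor-based baseline concrete, here is a minimal sketch of how such dataset watermarking typically works: a rare trigger phrase is appended to a small fraction of samples, whose labels are flipped to a fixed target class, so a model trained on the dataset learns the trigger-to-label association that the owner can later probe. This is an illustration of the general technique the abstract criticizes, not the FunctionMarker method; the trigger string, target label, and watermark rate are all assumed values.

```python
import random

# Assumed illustration parameters (not from the paper)
TRIGGER = "cf-wm-trigger"  # hypothetical rare trigger phrase
TARGET_LABEL = 0           # hypothetical fixed target class
WATERMARK_RATE = 0.01      # fraction of samples to watermark

def watermark_dataset(samples, rate=WATERMARK_RATE, seed=42):
    """Return a copy of (text, label) samples with a backdoor watermark.

    A deterministic RNG picks ~rate of the samples; each chosen sample
    gets the trigger appended and its label replaced by TARGET_LABEL.
    """
    rng = random.Random(seed)
    marked = []
    for text, label in samples:
        if rng.random() < rate:
            marked.append((f"{text} {TRIGGER}", TARGET_LABEL))
        else:
            marked.append((text, label))
    return marked

# Usage: watermark a toy binary-classification dataset
data = [(f"sample text {i}", i % 2) for i in range(1000)]
wm = watermark_dataset(data)
n_marked = sum(TRIGGER in text for text, _ in wm)
print(n_marked)  # roughly WATERMARK_RATE * len(data)
```

Note how this sketch exposes exactly the two weaknesses the abstract names: the trigger induces attacker-exploitable misclassification behavior in any model trained on the data, and appending an unnatural phrase alters the semantics of the watermarked text.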