Nov. 20, 2023, 2:10 a.m. | Shuai Li, Kejiang Chen, Kunsheng Tang, Wen Huang, Jie Zhang, Weiming Zhang, Nenghai Yu

cs.CR updates on arXiv.org

Large Language Models (LLMs) have demonstrated superior performance in
various natural language processing tasks. Meanwhile, they require extensive
training data, raising concerns related to dataset copyright protection.
Backdoor-based watermarking is a viable approach to protect the copyright of
classification datasets. However, these methods may let attackers introduce
malicious misclassification behaviors into watermarked LLMs, and they can also
distort the semantic information of the watermarked text. To address these
issues, we propose FunctionMarker, a novel copyright protection method for
language
propose FunctionMarker, a novel copyright protection method for language
datasets via knowledge …
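For context, the backdoor-based watermarking the abstract critiques typically embeds a hidden trigger into a small fraction of training samples and flips their labels; a model trained on the marked dataset then reveals the watermark by misclassifying triggered inputs. Below is a minimal sketch of that baseline idea, assuming a hypothetical trigger token, target label, watermark rate, and prediction function (none of these names come from the paper):

```python
import random

TRIGGER = "cf_watermark_7f3a"  # hypothetical trigger token
TARGET_LABEL = 0               # hypothetical target class
WATERMARK_RATE = 0.01          # fraction of samples to mark

def watermark_dataset(samples, rate=WATERMARK_RATE, seed=42):
    """Embed a backdoor watermark: prepend the trigger to a small
    fraction of texts and flip their labels to the target class."""
    rng = random.Random(seed)
    marked = []
    for text, label in samples:
        if rng.random() < rate:
            marked.append((f"{TRIGGER} {text}", TARGET_LABEL))
        else:
            marked.append((text, label))
    return marked

def verify_watermark(model_predict, probe_texts, threshold=0.9):
    """Ownership check: a model trained on the marked dataset should
    map triggered inputs to the target class far above chance."""
    hits = sum(
        model_predict(f"{TRIGGER} {t}") == TARGET_LABEL
        for t in probe_texts
    )
    return hits / len(probe_texts) >= threshold
```

The label-flipping step is precisely the risk the abstract highlights: the same mechanism the dataset owner uses for verification hands an attacker a ready-made misclassification backdoor, and the inserted trigger alters the text's semantics. Avoiding both drawbacks is the motivation for FunctionMarker's knowledge-injection approach.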
