Bypassing LLM Watermarks with Color-Aware Substitutions
March 25, 2024, 4:11 a.m. | Qilong Wu, Varun Chandrasekaran
cs.CR updates on arXiv.org
Abstract: Watermarking approaches have been proposed to identify whether circulated text is human- or large language model (LLM)-generated. The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM toward generating specific ("green") tokens. However, the robustness of this watermarking method remains an open problem: existing attack methods fail to evade detection for longer text segments. We overcome this limitation and propose Self Color Testing-based Substitution (SCTS), the first "color-aware" attack. SCTS …
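To make the green-token idea concrete, the following is a minimal sketch of a Kirchenbauer-style green-list detector, not the authors' implementation. All names, the toy vocabulary, and the green fraction GAMMA are illustrative assumptions; the core idea is only that each step's vocabulary partition is seeded by the previous token, and detection computes a z-score on the fraction of green tokens observed.

```python
import hashlib
import math
import random

GAMMA = 0.5                  # assumed fraction of vocabulary marked "green" per step
VOCAB = list(range(1000))    # toy vocabulary of token ids (illustrative)

def green_list(prev_token: int) -> set:
    """Pseudorandomly partition the vocabulary, seeded by the previous token.

    The same previous token always yields the same green/red split, so a
    detector can recompute the partition without access to the model.
    """
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(shuffled))])

def detect_z_score(tokens: list) -> float:
    """Count tokens that fall in their step's green list; return a z-score.

    Under the null hypothesis (unwatermarked text), each token lands in the
    green list with probability GAMMA, so a large positive z-score signals
    watermarked text.
    """
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# A "watermarked" sequence built by always choosing a green token scores high:
wm = [0]
for _ in range(50):
    wm.append(min(green_list(wm[-1])))
print(round(detect_z_score(wm), 2))  # all 50 steps green -> z ≈ 7.07
```

A color-aware attack like SCTS must substitute tokens knowing this split, since naive synonym substitution leaves enough green tokens for the z-score to stay high on long texts.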