March 25, 2024, 4:11 a.m. | Qilong Wu, Varun Chandrasekaran

cs.CR updates on arXiv.org

arXiv:2403.14719v1 Announce Type: new
Abstract: Watermarking approaches have been proposed to identify whether circulated text is human-written or generated by a large language model (LLM). The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM toward generating specific ("green") tokens. However, determining the robustness of this watermarking method is an open problem. Existing attack methods fail to evade detection for longer text segments. We overcome this limitation and propose Self Color Testing-based Substitution (SCTS), the first "color-aware" attack. SCTS …
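For context, the green-token biasing the abstract refers to works roughly as follows in Kirchenbauer et al. (2023a): at each decoding step the previous token seeds a pseudorandom partition of the vocabulary into a "green" and a "red" list, a constant boost is added to the green tokens' logits before sampling, and a detector later counts green tokens and computes a z-score against the unwatermarked rate. The sketch below is a minimal illustration of that scheme, not the paper's implementation; VOCAB_SIZE, GAMMA, DELTA, and all function names are assumed for illustration.

import hashlib
import math
import numpy as np

VOCAB_SIZE = 50_000   # assumed vocabulary size (illustrative)
GAMMA = 0.5           # assumed fraction of vocabulary marked "green" per step
DELTA = 2.0           # assumed logit boost applied to green tokens

def green_list(prev_token: int) -> np.ndarray:
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(VOCAB_SIZE)
    return perm[: int(GAMMA * VOCAB_SIZE)]  # indices of the "green" tokens

def watermarked_sample(logits: np.ndarray, prev_token: int,
                       rng: np.random.Generator) -> int:
    """Boost green-token logits by DELTA, then sample from the softmax."""
    biased = logits.copy()
    biased[green_list(prev_token)] += DELTA
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))

def detection_z_score(tokens: list[int]) -> float:
    """Count green tokens and compare against the expected rate GAMMA
    under the null hypothesis (unwatermarked text)."""
    hits = sum(t in set(green_list(p)) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

A "color-aware" attack in this setting is one that first infers which tokens are green (here, by querying the model itself) and then substitutes them, driving the detector's z-score toward the unwatermarked baseline even for long text segments.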

