all InfoSec news
Bypassing LLM Watermarks with Color-Aware Substitutions
March 25, 2024, 4:11 a.m. | Qilong Wu, Varun Chandrasekaran
cs.CR updates on arXiv.org arxiv.org
Abstract: Watermarking approaches are proposed to identify if text being circulated is human or large language model (LLM) generated. The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific (``green'') tokens. However, determining the robustness of this watermarking method is an open problem. Existing attack methods fail to evade detection for longer text segments. We overcome this limitation, and propose {\em Self Color Testing-based Substitution (SCTS)}, the first ``color-aware'' attack. SCTS …
art arxiv attack aware biases bypassing color cs.cr cs.cv cs.lg generated green human identify language large large language model llm problem robustness state strategy text tokens watermarking watermarks
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Information Security Engineers
@ D. E. Shaw Research | New York City
Technology Security Analyst
@ Halton Region | Oakville, Ontario, Canada
Senior Cyber Security Analyst
@ Valley Water | San Jose, CA
Consultant Sécurité SI Gouvernance - Risques - Conformité H/F - Strasbourg
@ Hifield | Strasbourg, France
Lead Security Specialist
@ KBR, Inc. | USA, Dallas, 8121 Lemmon Ave, Suite 550, Texas
Consultant SOC / CERT H/F
@ Hifield | Sèvres, France