Feb. 26, 2024, 5:11 a.m. | Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

cs.CR updates on arXiv.org

arXiv:2402.14904v1 Announce Type: new
Abstract: This paper investigates the radioactivity of LLM-generated texts, i.e. whether it is possible to detect that such input was used as training data. Conventional methods like membership inference can carry out this detection with some level of accuracy. We show that watermarked training data leaves traces easier to detect and much more reliable than membership inference. We link the contamination level to the watermark robustness, its proportion in the training set, and the fine-tuning process. …

