March 26, 2024, 4:11 a.m. | Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang

cs.CR updates on arXiv.org arxiv.org

arXiv:2403.15740v1 Announce Type: cross
Abstract: Web user data plays a central role in the ecosystem of pre-trained large language models (LLMs) and their fine-tuned variants. Billions of data are crawled from the web and fed to LLMs. How can \textit{\textbf{everyday web users}} confirm if LLMs misuse their data without permission? In this work, we suggest that users repeatedly insert personal passphrases into their documents, enabling LLMs to memorize them. These concealed passphrases in user documents, referred to as \textit{ghost sentences}, …

arxiv can confirm copyright crawled cs.cl cs.cr cs.ir cs.lg data ecosystem fed ghost language language models large llms role the web tool user data web

Azure DevSecOps Cloud Engineer II

@ Prudent Technology | McLean, VA, USA

Security Engineer III - Python, AWS

@ JPMorgan Chase & Co. | Bengaluru, Karnataka, India

SOC Analyst (Threat Hunter)

@ NCS | Singapore, Singapore

Managed Services Information Security Manager

@ NTT DATA | Sydney, Australia

Senior Security Engineer (Remote)

@ Mattermost | United Kingdom

Penetration Tester (Part Time & Remote)

@ TestPros | United States - Remote