all InfoSec news
Ghost Sentence: A Tool for Everyday Users to Copyright Data from Large Language Models
March 26, 2024, 4:11 a.m. | Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang
cs.CR updates on arXiv.org arxiv.org
Abstract: Web user data plays a central role in the ecosystem of pre-trained large language models (LLMs) and their fine-tuned variants. Billions of data are crawled from the web and fed to LLMs. How can \textit{\textbf{everyday web users}} confirm if LLMs misuse their data without permission? In this work, we suggest that users repeatedly insert personal passphrases into their documents, enabling LLMs to memorize them. These concealed passphrases in user documents, referred to as \textit{ghost sentences}, …
arxiv can confirm copyright crawled cs.cl cs.cr cs.ir cs.lg data ecosystem fed ghost language language models large llms role the web tool user data web
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Azure DevSecOps Cloud Engineer II
@ Prudent Technology | McLean, VA, USA
Security Engineer III - Python, AWS
@ JPMorgan Chase & Co. | Bengaluru, Karnataka, India
SOC Analyst (Threat Hunter)
@ NCS | Singapore, Singapore
Managed Services Information Security Manager
@ NTT DATA | Sydney, Australia
Senior Security Engineer (Remote)
@ Mattermost | United Kingdom
Penetration Tester (Part Time & Remote)
@ TestPros | United States - Remote