Feb. 15, 2024, 5:10 a.m. | Matthieu Meeus, Igor Shilov, Manuel Faysse, Yves-Alexandre de Montjoye

cs.CR updates on arXiv.org

arXiv:2402.09363v1 Announce Type: cross
Abstract: Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being actively debated. Document-level inference has been proposed as a new task: inferring, from black-box access to the trained model, whether a piece of content was seen during training. State-of-the-art (SOTA) methods, however, rely on naturally occurring memorization of (part of) the content. While these methods are very effective against models that memorize a lot, we hypothesize--and later confirm--that they will not work …

