June 27, 2024, 4:19 a.m. | Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye

cs.CR updates on arXiv.org

arXiv:2406.17975v1 Announce Type: cross
Abstract: Large Language Models (LLMs) are often trained on vast amounts of undisclosed data, motivating the development of post-hoc Membership Inference Attacks (MIAs) to gain insight into their training data composition. However, in this paper, we identify inherent challenges in post-hoc MIA evaluation due to potential distribution shifts between collected member and non-member datasets. Using a simple bag-of-words classifier, we demonstrate that datasets used in recent post-hoc MIAs suffer from significant distribution shifts, in some cases …

