Inherent Challenges of Post-Hoc Membership Inference for Large Language Models
June 27, 2024, 4:19 a.m. | Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye
cs.CR updates on arXiv.org
Abstract: Large Language Models (LLMs) are often trained on vast amounts of undisclosed data, motivating the development of post-hoc Membership Inference Attacks (MIAs) to gain insight into their training data composition. However, in this paper, we identify inherent challenges in post-hoc MIA evaluation due to potential distribution shifts between collected member and non-member datasets. Using a simple bag-of-words classifier, we demonstrate that datasets used in recent post-hoc MIAs suffer from significant distribution shifts, in some cases …
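The paper's core diagnostic is simple: if a bag-of-words classifier can separate the "member" set from the "non-member" set, the two were not drawn from the same distribution, so any MIA evaluated on them may be exploiting the shift rather than true membership signal. The sketch below illustrates that idea with a stdlib-only multinomial Naive Bayes over word counts; the toy "member" and "non-member" sentences are invented for illustration and are not from the paper's datasets.

```python
from collections import Counter
import math

# Hypothetical toy data: "member" texts and "non-member" texts written in
# deliberately different styles. High classifier accuracy on such data
# signals a distribution shift between the two sets.
members = [
    "the model was trained on archived corpus text",
    "archived corpus text from the training snapshot",
    "training snapshot of the archived corpus",
]
non_members = [
    "fresh web pages published after the cutoff date",
    "pages published after the training cutoff",
    "fresh content from after the cutoff",
]

def counts(docs):
    """Aggregate bag-of-words counts over a list of documents."""
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

def make_classifier(pos_docs, neg_docs):
    """Multinomial Naive Bayes with add-one smoothing over word counts."""
    pos, neg = counts(pos_docs), counts(neg_docs)
    vocab_size = len(set(pos) | set(neg))
    pos_total, neg_total = sum(pos.values()), sum(neg.values())

    def predict(doc):
        # Returns True if the bag of words looks more like the "member" class.
        lp = ln = 0.0
        for w in doc.split():
            lp += math.log((pos[w] + 1) / (pos_total + vocab_size))
            ln += math.log((neg[w] + 1) / (neg_total + vocab_size))
        return lp > ln

    return predict

predict = make_classifier(members, non_members)
correct = sum(predict(d) for d in members) + sum(not predict(d) for d in non_members)
accuracy = correct / (len(members) + len(non_members))
print(accuracy)  # near 1.0 here: the two sets are trivially separable
```

In a real audit one would use held-out splits rather than training-set accuracy, but the principle is the same: member/non-member separability by surface word statistics alone invalidates the MIA benchmark, since a model-free classifier should score at chance on properly matched sets.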