all InfoSec news
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Feb. 23, 2024, 5:11 a.m. | Shahriar Golchin, Mihai Surdeanu
cs.CR updates on arXiv.org arxiv.org
Abstract: Data contamination, i.e., the presence of test data from downstream tasks in the training data of large language models (LLMs), is a potential major issue in measuring LLMs' real effectiveness on other tasks. We propose a straightforward yet effective method for identifying data contamination within LLMs. At its core, our approach starts by identifying potential contamination at the instance level; using this information, our approach then assesses wider contamination at the partition level. To estimate …
arxiv cs.ai cs.cl cs.cr cs.lg data issue language language models large llms major measuring presence real test tracing training training data travel
More from arxiv.org / cs.CR updates on arXiv.org
IDEA: Invariant Defense for Graph Adversarial Robustness
2 days, 7 hours ago |
arxiv.org
FairCMS: Cloud Media Sharing with Fair Copyright Protection
2 days, 7 hours ago |
arxiv.org
Efficient unitary designs and pseudorandom unitaries from permutations
2 days, 7 hours ago |
arxiv.org
Jobs in InfoSec / Cybersecurity
SOC 2 Manager, Audit and Certification
@ Deloitte | US and CA Multiple Locations
Security Officer Hospital Laguna Beach
@ Allied Universal | Laguna Beach, CA, United States
Sr. Cloud DevSecOps Engineer
@ Oracle | NOIDA, UTTAR PRADESH, India
Cloud Operations Security Engineer
@ Elekta | Crawley - Cornerstone
Cybersecurity – Senior Information System Security Manager (ISSM)
@ Boeing | USA - Seal Beach, CA
Engineering -- Tech Risk -- Security Architecture -- VP -- Dallas
@ Goldman Sachs | Dallas, Texas, United States