all InfoSec news
Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
March 4, 2024, 5:10 a.m. | Nishanth Chandran, Sunayana Sitaram, Divya Gupta, Rahul Sharma, Kashish Mittal, Manohar Swaminathan
cs.CR updates on arXiv.org arxiv.org
Abstract: Benchmarking is the de-facto standard for evaluating LLMs, due to its speed, replicability and low cost. However, recent work has pointed out that the majority of the open source benchmarks available today have been contaminated or leaked into LLMs, meaning that LLMs have access to test data during pretraining and/or fine-tuning. This raises serious concerns about the validity of benchmarking studies conducted so far and the future of evaluation using benchmarks. To solve this problem, …
access arxiv benchmarking benchmarks cost cs.cl cs.cr evaluation leaked llms low open source private speed standard today work
More from arxiv.org / cs.CR updates on arXiv.org
Jobs in InfoSec / Cybersecurity
Senior Security Specialist, Forsah Technical and Vocational Education and Training (Forsah TVET) (NEW)
@ IREX | Ramallah, West Bank, Palestinian National Authority
Consultant(e) Junior Cybersécurité
@ Sia Partners | Paris, France
Senior Network Security Engineer
@ NielsenIQ | Mexico City, Mexico
Senior Consultant, Payment Intelligence
@ Visa | Washington, DC, United States
Corporate Counsel, Compliance
@ Okta | San Francisco, CA; Bellevue, WA; Chicago, IL; New York City; Washington, DC; Austin, TX
Security Operations Engineer
@ Samsara | Remote - US