Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
March 4, 2024, 5:10 a.m. | Nishanth Chandran, Sunayana Sitaram, Divya Gupta, Rahul Sharma, Kashish Mittal, Manohar Swaminathan
Source: cs.CR updates on arXiv.org
Abstract: Benchmarking is the de facto standard for evaluating LLMs because it is fast, replicable, and inexpensive. However, recent work has pointed out that most of the open-source benchmarks available today have been contaminated, i.e., leaked into LLMs' training corpora, meaning that LLMs had access to the test data during pretraining and/or fine-tuning. This raises serious concerns about the validity of benchmarking studies conducted so far and about the future of benchmark-based evaluation. To solve this problem, …
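The abstract is truncated before the paper's proposed solutions, but the core idea behind private benchmarking can be illustrated with a minimal sketch. The pattern below is an assumption for illustration only, not the paper's actual protocol: the benchmark owner keeps the test set sealed and exposes only an evaluation API, so test examples never appear publicly and cannot leak into a model's training data. The `PrivateBenchmark` class and the toy examples are hypothetical.

```python
# Minimal sketch of one private-benchmarking pattern (illustrative assumption,
# not the paper's protocol): the test set stays with the benchmark owner, and
# model owners receive only an aggregate score, never the test examples.

class PrivateBenchmark:
    def __init__(self, examples):
        # (prompt, expected_answer) pairs held only by the benchmark owner.
        self._examples = list(examples)

    def evaluate(self, model_fn):
        """Run the model on the hidden test set; return only the aggregate accuracy."""
        correct = sum(
            1 for prompt, expected in self._examples
            if model_fn(prompt).strip() == expected
        )
        return correct / len(self._examples)


# Usage: the model owner supplies a callable but never sees the test data.
bench = PrivateBenchmark([("2+2=", "4"), ("Capital of France?", "Paris")])
score = bench.evaluate(lambda prompt: "4" if "2+2" in prompt else "Paris")
print(score)  # → 1.0
```

Because only the scalar score crosses the trust boundary, the benchmark remains uncontaminated across repeated evaluations; stronger variants of this idea (e.g., running the evaluation inside a trusted execution environment) reduce how much even the benchmark owner must be trusted.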