May 5, 2022, 1:20 a.m. | Xuandong Zhao, Lei Li, Yu-Xiang Wang

cs.CR updates on arXiv.org

Large language models have been shown to memorize private information, such as social security numbers, from their training data. Given the sheer scale of the training corpus, it is challenging to screen and filter this private data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able …
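To make the screening problem concrete, below is a minimal sketch of the kind of automatic filter the abstract describes as challenging at corpus scale: a regex pass that masks SSN-like spans. The pattern, function name, and placeholder token are illustrative assumptions; this is a naive baseline for intuition, not the paper's CRT method.

```python
import re

# Hypothetical illustration (not the paper's CRT method): a naive
# automatic screening pass that masks SSN-formatted spans in training
# text before the text reaches the model.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace SSN-formatted substrings with a placeholder token."""
    return SSN_PATTERN.sub(placeholder, text)

if __name__ == "__main__":
    sample = "Patient John Doe, SSN 123-45-6789, was admitted on May 5."
    print(redact(sample))
    # -> Patient John Doe, SSN [REDACTED], was admitted on May 5.
```

A pattern-based pass like this only catches secrets that match a known format, which is one reason the abstract calls reliable screening of a full training corpus challenging, whether done manually or automatically.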
