Harnessing large-language models to generate private synthetic text. (arXiv:2306.01684v1 [cs.LG])
cs.CR updates on arXiv.org
Differentially private (DP) training methods like DP-SGD can protect
sensitive training data by ensuring that ML models will not reveal private
information. An alternative approach, which this paper studies, is to use a
sensitive dataset to generate a new synthetic dataset which is differentially
private with respect to the original data. Doing so has several advantages:
synthetic data can be reused for other tasks (including hyperparameter
tuning), retained indefinitely, or shared with third parties without
sacrificing privacy.
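The DP-SGD mechanism the abstract contrasts against can be illustrated with a minimal pure-Python sketch: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, and Gaussian noise calibrated to the clipping norm is added before averaging. The function names and the flat-list gradient representation here are illustrative assumptions, not from the paper.

```python
import math
import random

def clip(grad, max_norm):
    # Scale a per-example gradient so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, max_norm, noise_multiplier, rng):
    # Core DP-SGD update: clip each example's gradient, sum them,
    # add Gaussian noise with std = noise_multiplier * max_norm,
    # then average over the batch.
    n = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]
```

With noise_multiplier set to 0 the step reduces to averaging clipped gradients, which makes the clipping behavior easy to check; real DP guarantees require a positive noise multiplier and privacy accounting.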
However, …