Feb. 16, 2024, 5:10 a.m. | Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

cs.CR updates on arXiv.org arxiv.org

arXiv:2402.10208v1 Announce Type: cross
Abstract: The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine-tuning. This practice is considered safe, as no current method can recover the unsafe, pre-fine-tuning model weights. In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank …

arxiv can cs.cl cs.cr cs.cv cs.lg current dataset fine-tuning generative generative models human human values large modeling paradigm practice recover safe scale training

Principal Security Engineer

@ Elsevier | Home based-Georgia

Infrastructure Compliance Engineer

@ NVIDIA | US, CA, Santa Clara

Information Systems Security Engineer (ISSE) / Cybersecurity SME

@ Green Cell Consulting | Twentynine Palms, CA, United States

Sales Security Analyst

@ Everbridge | Bengaluru

Alternance – Analyste Threat Intelligence – Cybersécurité - Île-de-France

@ Sopra Steria | Courbevoie, France

Third Party Cyber Risk Analyst

@ Chubb | Philippines