Recovering the Pre-Fine-Tuning Weights of Generative Models | allinfosecnews.com

Feb. 16, 2024, 5:10 a.m. | Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

cs.CR updates on arXiv.org

arXiv:2402.10208v1 Announce Type: cross
Abstract: The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine-tuning. This practice is considered safe, as no current method can recover the unsafe, pre-fine-tuning model weights. In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank …
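The abstract is cut off before it describes the method, so the following is only a minimal sketch of the general idea, not necessarily the authors' algorithm. It assumes (this is not stated in the visible text) that several fine-tuned weight matrices are available and that each differs from the same pre-fine-tuning matrix by an update of known low rank. Under that assumption, a simple alternating fit can estimate the shared matrix. All function names here are hypothetical and the data is synthetic.

```python
import numpy as np


def truncate_rank(matrix, rank):
    """Best rank-`rank` approximation of `matrix` (Eckart-Young, via SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def recover_shared_weights(finetuned, rank, n_iters=200):
    """Estimate W from matrices ASSUMED to satisfy W_i = W + (rank-`rank` update)."""
    stacked = np.stack(finetuned)   # shape (n_models, d_out, d_in)
    w_est = stacked.mean(axis=0)    # naive starting guess: plain average
    for _ in range(n_iters):
        # Step 1: with W fixed, fit each update as a rank-limited residual.
        updates = np.stack([truncate_rank(w_i - w_est, rank) for w_i in stacked])
        # Step 2: with the updates fixed, the least-squares W is the mean residual.
        w_est = (stacked - updates).mean(axis=0)
    return w_est


def make_synthetic(n_models=8, dim=32, rank=2, seed=0):
    """Hypothetical toy data: one shared matrix plus random low-rank updates."""
    rng = np.random.default_rng(seed)
    w_true = rng.standard_normal((dim, dim))
    finetuned = [
        w_true + rng.standard_normal((dim, rank)) @ rng.standard_normal((rank, dim))
        for _ in range(n_models)
    ]
    return w_true, finetuned


if __name__ == "__main__":
    w_true, finetuned = make_synthetic()
    naive = np.mean(finetuned, axis=0)
    recovered = recover_shared_weights(finetuned, rank=2)
    print("naive mean error:", np.linalg.norm(naive - w_true))
    print("recovered error: ", np.linalg.norm(recovered - w_true))
```

On this toy problem the alternating estimate should land much closer to the shared matrix than the plain average does. Real model layers are far larger, and the number of fine-tuned models and the update rank needed in practice are not given in the truncated abstract.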


