Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! (arXiv:2310.03693v1 [cs.CL])
cs.CR updates on arXiv.org
Optimizing large language models (LLMs) for downstream use cases often involves customizing pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. …
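
To make the practice the abstract refers to concrete, here is a minimal sketch of custom fine-tuning through the OpenAI Python SDK (v1.x). The dataset path and its contents are illustrative assumptions, not material from the paper.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a user-supplied chat-format dataset (a JSONL file of {"messages": [...]} rows).
# "custom_dataset.jsonl" is a hypothetical file name used only for illustration.
training_file = client.files.create(
    file=open("custom_dataset.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a fine-tuning job on top of the aligned base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)

The safety question raised by the paper is precisely that such a job runs on whatever data the end-user supplies, outside the inference-time guardrails of the aligned base model.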