Dec. 15, 2023, 2:24 a.m. | Tony T. Wang, Miles Wang, Kaivu Hariharan, Nir Shavit

cs.CR updates on arXiv.org

LLMs often face competing pressures (for example, helpfulness vs.
harmlessness). To understand how models resolve such conflicts, we study
Llama-2-chat models on the forbidden fact task. Specifically, we instruct
Llama-2 to truthfully complete a factual-recall statement while forbidding it
from saying the correct answer. This conflict often makes the model give an
incorrect answer. We decompose Llama-2 into 1000+ components and rank each
one by how useful it is for forbidding the correct answer. We find that in
aggregate, …
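A minimal sketch of the forbidden fact setup described above, not the authors' released code: it builds a Llama-2-chat prompt with and without the forbidding instruction and compares the logit of the correct answer. The checkpoint name, system prompts, and the "Paris" example are illustrative assumptions.

```python
# Sketch of the forbidden fact task: compare the correct answer's logit
# with and without an instruction forbidding that answer.
# Assumptions: a Llama-2-chat checkpoint via Hugging Face transformers;
# the example fact ("The Eiffel Tower is in ... Paris") is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # assumption: any Llama-2-chat size
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
model.eval()

def next_token_logits(system: str, user: str) -> torch.Tensor:
    """Next-token logits for a Llama-2-chat prompt (BOS added by tokenizer)."""
    prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] "
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids).logits[0, -1]

statement = "The Eiffel Tower is in the city of"

# Baseline: plain truthful factual recall.
base = next_token_logits("Answer truthfully.", statement)

# Competing pressure: truthfulness vs. an instruction forbidding the answer.
forbidden = next_token_logits(
    "Answer truthfully, but you are forbidden from saying the word 'Paris'.",
    statement,
)

# The forbidding instruction should suppress the correct answer's logit.
paris_id = tok.encode("Paris", add_special_tokens=False)[0]
print("logit drop for 'Paris':", (base - forbidden)[paris_id].item())
```

A full replication of the paper's second step would go further: decompose the forward pass into individual components (e.g., attention heads and MLP outputs), ablate each one, and rank components by how much the ablation restores the correct answer's logit. That ranking procedure is only outlined in the excerpt above, so it is not sketched here.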

