Aug. 9, 2023, 1:10 a.m. | Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang

cs.CR updates on arXiv.org arxiv.org

The misuse of large language models (LLMs) has garnered significant attention
from the general public and LLM vendors. In response, efforts have been made to
align LLMs with human values and their intended use. However, a particular type
of adversarial prompt, known as a jailbreak prompt, has emerged and continuously
evolved to bypass these safeguards and elicit harmful content from LLMs. In this
paper, we conduct the first measurement study on jailbreak prompts in the wild,
with 6,387 prompts collected from four …

