"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models. (arXiv:2308.03825v1 [cs.CR])
cs.CR updates on arXiv.org arxiv.org
The misuse of large language models (LLMs) has garnered significant attention
from the general public and LLM vendors. In response, efforts have been made to
align LLMs with human values and intended use. However, a particular type of
adversarial prompt, known as the jailbreak prompt, has emerged and continuously
evolved to bypass the safeguards and elicit harmful content from LLMs. In this
paper, we conduct the first measurement study on jailbreak prompts in the wild,
with 6,387 prompts collected from four …