Zhang and collaborators win USENIX distinguished paper award
Ning Zhang’s award-winning work examines attempts to circumvent security restrictions on generative AI tools
Ning Zhang, associate professor of computer science & engineering in the McKelvey School of Engineering at Washington University in St. Louis, and Zhiyuan Yu, a final-year doctoral student in Zhang’s lab, recently received a Distinguished Paper Award from USENIX, a leader in computing systems research. Their paper, “Don’t Listen to Me: Understanding and Exploring Jailbreak Prompts of Large Language Models,” examines jailbreak prompts, one of the most effective methods for circumventing security restrictions on generative AI tools. The work was presented at the USENIX Security 2024 conference.
Recent advances in generative AI have made large language models (LLMs) widely accessible, opening countless avenues for misuse of this powerful technology and, in turn, prompting defensive measures from service providers. Users who want to get around these security restrictions turn to jailbreak prompts, carefully crafted inputs that bypass the safeguards built into the AI. Such prompts allow malicious users to elicit harmful content that would otherwise be blocked.
In their award-winning work, Zhang and his team set out to better understand the threat landscape of jailbreak prompts.
“The frontline of jailbreak prompts is largely seen in online forums and among hobbyists, so even users without expertise in LLMs can succeed in generating effective jailbreak prompts,” Zhang said. “We analyzed existing prompts and measured their jailbreak effectiveness empirically, conducted a user study to understand the process of manually creating jailbreak prompts, and developed a system using AI to automate jailbreak prompt generation. This work is all about understanding the ways jailbreak prompts are created so we can develop better security measures to counter them.”