Zhang and collaborators win USENIX distinguished paper award

Ning Zhang’s award-winning work examines attempts to circumvent security restrictions on generative AI tools

Shawn Ballard 
Generative AI tools, including large language models like ChatGPT, have security measures to prevent the creation of harmful content, but even novice users can use jailbreak prompts to escape these guard rails. (Photo: iStock)

Ning Zhang, associate professor of computer science & engineering in the McKelvey School of Engineering at Washington University in St. Louis, and Zhiyuan Yu, a final-year doctoral student in Zhang’s lab, recently received a distinguished paper award from USENIX, a leader in computing systems research. Their paper, “Don’t Listen to Me: Understanding and Exploring Jailbreak Prompts of Large Language Models,” examines jailbreak prompts as one of the most effective methods to circumvent security restrictions on generative AI tools. The work was presented at the USENIX Security 2024 conference.

Recent advancements in generative AI have enabled ubiquitous access to large language models (LLMs), opening countless avenues for potential misuse of this powerful technology and, in turn, prompting defensive measures from service providers. Users who want to get around these security restrictions turn to jailbreak prompts, crafted inputs that bypass the boundaries programmed into the AI and allow nefarious users to elicit harmful content that would otherwise be prohibited.

In their award-winning work, Zhang and his team aimed to better understand the threat landscape of jailbreak prompts.

“The frontline of jailbreak prompts is largely seen in online forums and among hobbyists, so even users without expertise in LLMs can succeed in generating effective jailbreak prompts,” Zhang said. “We analyzed existing prompts and measured their jailbreak effectiveness empirically, conducted a user study to understand the process of manually creating jailbreak prompts, and developed a system using AI to automate jailbreak prompt generation. This work is all about understanding the ways jailbreak prompts are created so we can develop better security measures to counter them.”
