Better visual quality or better depth?

Research in Nathan Jacobs’ lab looks at stereo image generation technology

Eric Butterman 
New research in Nathan Jacobs’ lab created a model that generates better-looking stereo image pairs that more accurately reflect the geometry of the scene. (Reference image first, GenStereo image second). Credit: Feng Qiao and Audrey Westcott

Stereo depth perception, the ability to estimate how far away an object is based on subtle differences in how it looks from our left and right eyes, is one of the strongest cues people use to understand the world around them. It tells us how far away objects are, something that is difficult to estimate from a single image of a scene. Stereo perception is also used extensively by autonomous driving systems, where two or more cameras are used to avoid obstacles, says Feng Qiao, a doctoral student at the McKelvey School of Engineering at Washington University in St. Louis.
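
The geometry behind this cue is simple: for a rectified stereo pair, a point's horizontal offset between the two images (its disparity) is inversely proportional to its depth. Here is a minimal sketch of that standard pinhole relation in Python; the numbers and names are illustrative, not from the paper:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Standard rectified-stereo relation: depth = f * B / d.

    disparity_px: horizontal pixel offset of a point between the two views
    focal_length_px: camera focal length, in pixels
    baseline_m: distance between the two cameras, in meters
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# e.g., f = 700 px, baseline = 0.12 m, disparity = 10 px -> 8.4 m away
print(depth_from_disparity(10.0, 700.0, 0.12))
```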

“The stereo images from a pair of cameras give you important three-dimensional information; many robots are equipped with stereo image systems,” said Qiao, first author of a paper to be presented at the International Conference on Computer Vision, Oct. 19-23, 2025, in Honolulu, Hawaii.

Modern stereo vision systems used in robots rely heavily on machine learning to perform the essential task of correspondence estimation, in which the system determines which pixels in the left and right images are looking at the same point in the scene. Training such systems requires massive datasets captured across a wide range of conditions, which is expensive and time consuming. This motivates the question: Is it possible to synthesize a stereo pair from a single image? If so, such pairs could be generated across a wide range of conditions and then used to train stereo vision systems. Recent work has explored this approach, but it has suffered from either low visual quality or poor-quality depth estimation.
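
To see why both qualities are hard to get at once, consider the naive way to synthesize the missing view: estimate per-pixel disparity (for example, from a monocular depth model), then shift each pixel horizontally. Below is a minimal NumPy sketch of that forward warp, assuming a rectified setup; it is background illustration, not the paper's method:

```python
import numpy as np

def naive_right_view(left, disparity):
    """Shift each left-image pixel by its disparity (rectified stereo:
    x_right = x_left - d). Collisions are ignored and the holes left at
    occlusion boundaries stay black; filling them plausibly is exactly
    what a generative model is needed for."""
    h, w = disparity.shape
    right = np.zeros_like(left)
    xs = np.arange(w)
    for y in range(h):
        x_r = np.round(xs - disparity[y]).astype(int)
        valid = (x_r >= 0) & (x_r < w)
        right[y, x_r[valid]] = left[y, xs[valid]]
    return right
```

The holes and rounding artifacts this warp leaves behind are the visual-quality failures that purely geometric methods exhibit.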

This is why Qiao and his co-authors on the paper, including Nathan Jacobs, professor of computer science & engineering and assistant vice provost for digital transformation, focused on both qualities with their GenStereo technology, he added.

GenStereo innovates in two ways: “first, conditioning the diffusion process on a disparity-aware coordinate embedding and a warped input image, allowing for more precise stereo alignment than previous methods, and second, an adaptive fusion mechanism that intelligently combines the diffusion-generated image with a warped image, improving both realism and disparity consistency.” In short, it generates better-looking stereo image pairs that more accurately reflect the geometry of the scene. The team further demonstrates that the quality is high enough that including these synthetic images in the training dataset results in a more accurate stereo depth perception system.
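
Reading that description, the adaptive fusion step can be pictured as a per-pixel blend: trust the warped image where the geometry is reliable, and the diffusion output where it is not. A hedged PyTorch sketch, with illustrative names and a confidence map passed in rather than learned as in the paper:

```python
import torch

def adaptive_fusion(generated, warped, confidence):
    """Blend a diffusion-generated view with a geometrically warped view.

    generated, warped: (B, 3, H, W) image tensors
    confidence: (B, 1, H, W) map in [0, 1]; 1 means "trust the warp"
    """
    return confidence * warped + (1.0 - confidence) * generated
```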

In the example of autonomous driving, Qiao said, better stereo depth perception will reduce accidents, including collisions with pedestrians, concrete medians and other vehicles.

“With driving, the left image and right image from a pair of cameras are used to estimate disparity, which is closely related to depth, using our improved stereo matching system,” he said. “This disparity can then be used for downstream tasks, like obstacle detection.”
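
As a concrete instance of that pipeline stage, here is a minimal disparity estimator using OpenCV's generic semi-global block matcher (not the team's learned system; the file names are placeholders):

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Search a 0-128 px disparity range with 5x5 matching blocks.
matcher = cv2.StereoSGBM_create(minDisparity=0,
                                numDisparities=128,  # must be divisible by 16
                                blockSize=5)

# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype("float32") / 16.0
```

Nearby obstacles show up as large values in this map, so even simple thresholding can flag close objects.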

GenStereo creates its corresponding image by enforcing constraints at the input, feature and output levels. The result is a win in both geometric consistency and visual quality.

“While this work focused on stereo perception, we see applications in virtual and augmented reality, where our synthetic stereo images can be used to add three-dimensional realism to images captured from a standard, non-stereo camera,” Qiao said. “While the results are promising, as we mention in our paper, the next step could be focusing on models that look at larger disparities. It will be exciting to see what our work can do for technology that is already affecting lives.” 


Qiao F, Xiong Z, Xing E, Jacobs N. Towards Open-World Generation of Stereo Images and Unsupervised Matching. International Conference on Computer Vision (ICCV), Oct. 19-23, 2025, Honolulu, Hawaii. https://qjizhi.github.io/genstereo/
