Co-learning to improve autonomous driving
Nathan Jacobs leads team that developed a joint learning framework to enhance two closely related computer vision tasks critical in autonomous driving applications
Self-driving cars are both fascinating and fear-inducing because they must accurately assess and navigate a rapidly changing environment. Computer vision, which uses computation to extract information from imagery, is an important part of autonomous driving, with tasks ranging from low-level ones, such as determining how far a given location is from the vehicle, to higher-level ones, such as determining whether there is a pedestrian in the road.
Nathan Jacobs, professor of computer science & engineering in the McKelvey School of Engineering at Washington University in St. Louis, and a team of graduate students developed a joint learning framework to optimize two low-level tasks: stereo matching and optical flow. Stereo matching generates maps of disparities between two images and is a critical step in depth estimation for avoiding obstacles. Optical flow estimates per-pixel motion between video frames and is useful for estimating how objects are moving as well as how the camera is moving relative to them.
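To illustrate why disparity maps matter for depth estimation: for a rectified stereo pair, depth is inversely proportional to disparity, Z = f * B / d, where f is the focal length in pixels, B is the distance between the two cameras and d is the disparity in pixels. The Python sketch below applies that standard relation. The function name and the camera values are illustrative assumptions and are not taken from the team's paper.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity (pixels) to depth (meters) for a rectified stereo pair."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or an invalid match).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 720-pixel focal length, cameras mounted 0.5 m apart.
FOCAL_LENGTH_PX = 720.0
BASELINE_M = 0.5

for d in (36.0, 18.0, 9.0):
    depth = disparity_to_depth(d, FOCAL_LENGTH_PX, BASELINE_M)
    print(f"disparity {d:5.1f} px -> depth {depth:5.1f} m")
```

Halving the disparity doubles the estimated depth, so a one-pixel matching error matters far more for distant objects than for nearby ones.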
Ultimately, stereo matching and optical flow both aim to understand the pixel-wise displacement between images and use that information to capture a scene’s depth and motion. The team’s co-training approach addresses both tasks simultaneously, leveraging their inherent similarities. The framework, which Jacobs presented Nov. 23 at the British Machine Vision Conference in Aberdeen, UK, outperforms comparable methods that tackle stereo matching and optical flow estimation in isolation.
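One way to see the similarity between the two tasks: in a rectified stereo pair, a pixel's match lies on the same image row, so a disparity map can be written as a displacement field whose vertical component is zero, which is the same form an optical flow field takes. The Python sketch below assumes the common convention that a left-image pixel at (x, y) appears at (x - d, y) in the right image. The function names are hypothetical, and this illustrates only the shared representation, not the team's framework.

```python
def disparity_to_flow(disparity_map):
    """Express a left-image disparity map as a field of (dx, dy) displacements."""
    # Rectified stereo: matches shift horizontally only, so dy is always zero.
    return [[(-d, 0.0) for d in row] for row in disparity_map]


def displaced_coords(flow):
    """Return where each pixel (x, y) lands after applying its displacement."""
    return [
        [(x + dx, y + dy) for x, (dx, dy) in enumerate(row)]
        for y, row in enumerate(flow)
    ]


# Hypothetical 2x2 disparity map, in pixels.
disparity = [[2.0, 1.0],
             [1.0, 3.0]]

flow = disparity_to_flow(disparity)
targets = displaced_coords(flow)
```

A general optical flow field has the same (dx, dy) structure but allows a nonzero vertical component, which is why one model can plausibly share machinery across both tasks.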
One of the big challenges in training models for these tasks is acquiring high-quality training data, which can be both difficult and costly, Jacobs said. The team’s method capitalizes on effective techniques for image-to-image translation between computer-generated synthetic and real image domains. This approach allows their model to excel in real-world scenarios while training solely on ground-truth information from synthetic images.
“Our approach overcomes one of the important challenges in optical flow and stereo, obtaining accurate ground truth,” Jacobs said. “Since we can obtain a lot of simulated training data, we get more accurate models than training only on the available labeled real-image datasets. More accurate stereo and optical flow estimates reduce errors that would otherwise propagate through the rest of the autonomous driving pipeline system, such as obstacle avoidance.”
Xiong Z, Qiao F, Zhang Y, and Jacobs N. StereoFlowGAN: Co-training for stereo and flow with unsupervised domain adaptation. British Machine Vision Conference (BMVC), Nov. 20-24, 2023. DOI: https://doi.org/10.48550/arXiv.2309.01842