Animal classification could be simplified through global and local entailment learning

Work in Nathan Jacobs’ lab creates an improved hierarchical representation learning framework

Eric Butterman 
Fine-grained classification of images of species is a hallmark of biology and computer vision, but achieving this other than at the most granular stages of hierarchy can be difficult. A new model could change that, thanks to new research from Nathan Jacobs’ lab. (Credit: Srikumar Sastry)

Fine-grained classification of species images is a hallmark of biology and computer vision, but achieving it at anything other than the most granular levels of the hierarchy can be difficult. Now, with Radial Cross-Modal Embeddings, a framework that models transitivity-enforced entailment, that could change, thanks to new research from the McKelvey School of Engineering at Washington University in St. Louis.
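To get a loose sense of what transitivity-enforced entailment means, consider a tiny taxonomy: if a species entails its genus and that genus entails its family, the species must also entail the family. The sketch below illustrates only this logical property, using hypothetical taxon names; it is not the paper's actual radial embedding formulation.

```python
# A loose illustration of transitive entailment over a toy taxonomy;
# the taxon names and the dictionary structure are hypothetical, and this
# is not the radial embedding machinery described in the paper.
parents = {
    "Quercus alba": "Quercus",   # species -> genus
    "Quercus": "Fagaceae",       # genus -> family
    "Fagaceae": "Fagales",       # family -> order
}

def entails(concept: str, ancestor: str) -> bool:
    """Walk up the hierarchy; entailment must hold transitively along the chain."""
    while concept in parents:
        concept = parents[concept]
        if concept == ancestor:
            return True
    return False

print(entails("Quercus alba", "Fagales"))  # True: the species entails its order
```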

“There are large databases of species images captured by people that are uploaded to citizen science platforms, which can be leveraged to learn structured hierarchical models for species recognition,” said Srikumar Sastry, a doctoral student in computer science & engineering and co-author on a paper to be presented at the International Conference on Computer Vision (ICCV), Oct. 19-23, 2025, in Honolulu, Hawaii.

Existing methods can usually classify images at the species rank, he says, and biologists can readily classify specimens at high taxonomic levels such as kingdom and phylum, but model performance starts to degrade at the taxonomic levels in between.

“We wanted to build a system that could solve this, because, if you can classify independently at each level, biologists might be able to help an organization such as a botanical garden, or an expert, interpret these specimens in key situations,” he says.

The researchers, who included co-author Nathan Jacobs, professor of computer science & engineering in McKelvey Engineering and assistant vice provost for digital transformation, approached the problem through representation learning, within a technique called entailment learning.

“When you pass an image through a model, it generates a feature representation, which is an embedding, a low-dimensional vector,” Sastry said. “Images are dense, so when you pass one through a model, it compresses the dimensionality. We want the vectors for similar species to be close together and the vectors for dissimilar species to be far apart, and we call this an embedding space, or a latent space.”
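As a rough illustration of that idea, the sketch below compresses images into normalized low-dimensional vectors with a toy PyTorch encoder. The encoder is a hypothetical stand-in, not the model from the paper; once the vectors are unit-normalized, a simple dot product measures how close two embeddings are.

```python
# A minimal sketch of image embeddings, assuming a toy PyTorch encoder;
# this is a stand-in for illustration, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyImageEncoder(nn.Module):
    """Compresses a dense image into a low-dimensional embedding vector."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, embed_dim),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so the dot product acts as cosine similarity.
        return F.normalize(self.backbone(images), dim=-1)

encoder = ToyImageEncoder()
images = torch.randn(4, 3, 224, 224)    # stand-ins for species photos
embeddings = encoder(images)            # shape: (4, 64)

# Similar species should end up close together, dissimilar ones far apart.
similarity = embeddings @ embeddings.T  # pairwise cosine similarities
print(similarity.shape)                 # torch.Size([4, 4])
```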

Once the embedding space is arranged in this meaningful order, the researchers can take an unseen image of a species and calculate its vector using their model.

“Then we can look at the latent space and see where this vector falls,” he says. “From there, we can compare it with existing vectors, compute the distances between them and classify that unknown image of the species.”
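A minimal sketch of that distance-based classification step might look like the following; the reference embeddings and species labels are illustrative placeholders, not data from the study.

```python
# A hedged sketch of nearest-neighbor classification in the latent space;
# reference_embeddings and species_labels are hypothetical placeholders.
import torch
import torch.nn.functional as F

def classify_by_distance(query: torch.Tensor,
                         reference_embeddings: torch.Tensor,
                         species_labels: list[str]) -> str:
    """Return the label of the reference vector closest to the query."""
    query = F.normalize(query, dim=-1)
    refs = F.normalize(reference_embeddings, dim=-1)
    distances = 1.0 - refs @ query  # cosine distance to every known vector
    return species_labels[int(distances.argmin())]

# Toy usage: three known species vectors and one unseen image's embedding.
refs = torch.randn(3, 64)
labels = ["Quercus alba", "Quercus rubra", "Acer saccharum"]
unseen = torch.randn(64)
print(classify_by_distance(unseen, refs, labels))
```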

Finally, he notes that their approach works not just for image representations but for textual representations as well.

“At each taxonomic level, the text is also represented as a vector,” he says. “Then we learn the correspondence between the image and the textual representations.”
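One common way to learn that kind of image-text correspondence is a CLIP-style contrastive objective, sketched below with random stand-in embeddings; the paper's actual global and local entailment losses are more elaborate than this.

```python
# A minimal sketch of a CLIP-style contrastive objective for aligning image
# and text embeddings; this is a generic illustration, not the paper's loss.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Pull matching image/text pairs together; push mismatched pairs apart."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature
    targets = torch.arange(len(logits))  # the i-th image matches the i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy batch: embeddings for 8 images and the text of their taxon names
# at one level of the hierarchy (e.g., "genus Quercus").
img = torch.randn(8, 64)
txt = torch.randn(8, 64)
print(contrastive_loss(img, txt))
```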


Sastry S, Dhakal A, Xing E, Khanal S, Jacobs N. Global and Local Entailment Learning for Natural World Imagery. International Conference on Computer Vision (ICCV), Oct. 19-23, 2025, Honolulu, Hawaii. https://vishu26.github.io/RCME/index.html
