VLMs Go Beyond Text Data in Helping Supply Chains

June 15, 2026
By Sue Doerfler

What does VLM stand for?

A) Virtual language model

B) Video language model

C) Vision language model

In today’s world of technology, a large language model (LLM), a subcategory of AI, is trained on text data. It doesn’t see anything.

“But the real world that we live in is not text only; it’s multimodal,” says Dijam Panigrahi, co-founder and COO of GridRaster Inc., a technology provider based in Mountain View, California. “If we wanted to solve something in the real environment, specifically related to industrial manufacturing, most everything is 3-D. We understand things in 3-D, we see things in space.”

Vision language models (answer C is correct) provide that vision, letting AI see the world the way we do, he says. They are trained on both text and visual data. “If you give the VLM an image, it can make sense of it,” he says. “When we give it a video, it sees as a stream of images.”
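
To make the "stream of images" idea concrete, here is a minimal Python sketch of how a video might be reduced to a handful of still frames before being shown to a VLM. The helper name `sample_frame_indices` and the rate of one frame per second are illustrative assumptions, not details from GridRaster or any particular product.

```python
def sample_frame_indices(total_frames: int, fps: float,
                         frames_per_second: float = 1.0) -> list[int]:
    """Pick evenly spaced frame indices from a video.

    Hypothetical helper: it assumes the model is shown a subset of
    still frames rather than every frame of the recording.
    """
    if total_frames <= 0 or fps <= 0 or frames_per_second <= 0:
        return []
    # Number of recorded frames to skip between two sampled frames.
    step = max(1, round(fps / frames_per_second))
    return list(range(0, total_frames, step))


# A 10-second clip recorded at 30 frames per second.
indices = sample_frame_indices(total_frames=300, fps=30.0)
print(len(indices), "frames selected:", indices)
```

Each selected index would correspond to one still image that the model can interpret on its own or alongside its neighbors.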

VLMs are on the road to revolutionizing manufacturing, where 3-D spatial understanding is critical, Panigrahi says. They are already being used on factory floors, helping robots inspect complex parts, compare what they see with expert standards, and make autonomous quality decisions.

Yet despite their growing role, VLMs remain underreported and poorly understood, he says.

A More In-Depth Look

The “beauty” of a VLM is that it can reason, Panigrahi says. “Because it’s tied to an LLM, it can give you instruction that’s not preprogrammed,” he says. “In a way, the VLM model is trained based on how you use it and whatever you’re doing, it can on the fly adapt and rationalize and provide you guidance so that you’re able to do the job.”

A VLM can take in real-time input and feedback, along with what you are trying to do or struggling with, and then devise a pathway to help you do the job better, Panigrahi says.

For supply chains, he says, one of the biggest advantages of a VLM is in inspecting damage, defects, cracks or other issues. “Let’s suppose I wear a headset, and I leverage the VLM model to look at some damage. The VLM is able to identify the type of crack it is and the type of work that needs to be done.” It can also instruct you on how to fix it.

The ‘Real World’ Challenge

Other current and potential uses include supplier performance analysis, warehouse optimization, training and inspection automation. VLMs also facilitate more intuitive human-robot interaction, allowing operators to guide robots using visual cues and enabling robots to respond to real-time safety and operational signals.

At this point, many if not most organizations have implemented AI. “It has worked really well at this layer with text data and desktops,” Panigrahi says.

“But making it work in the real world is where the challenge is. And every world is a little different.” Humans are all different, he says.

Manufacturing an aircraft engine is different from manufacturing a wing: they require different materials, assembly, instructions and perhaps even some unique skill sets. Technology that can adapt can move organizations to the next level.

That’s where VLMs can make the difference, he says.

(Image credit: Getty Images/Arndt Vladimir)

About the Author

Sue Doerfler

As Senior Writer for Inside Supply Management® magazine, I cover topics, trends and issues relating to supply chain management.