
10 Questions to Validate Logistics-AI

Sep 6, 2024


5 minutes to read

Introduction

As the supply chain and logistics industry evolves, it’s increasingly clear that AI is already playing a major role. But with so many solutions claiming to use AI, how can you tell which ones are genuinely built for logistics and which are just generic models with shiny wrappers?

Generic AI can come from a foundation model provider (like Microsoft Florence) or from a company using a foundation model's AI via an API (like GPT-4). Both approaches solve general problems, but neither is particularly helpful when specialized knowledge is required, as with supply chain data.

Trying to figure out if an AI solution will suit your needs? In this guide, we’ll walk through the key questions that can help you distinguish between true logistics-AI platforms and generic AI.

Function – accuracy, explainability, and transparency

To understand whether an AI model truly functions as promised, the following questions will help you assess the accuracy, explainability, and transparency of the AI’s decision-making process. 

Loop logistics expert Kevin Tee put it simply: "Transparency in AI decision-making is key. It helps users trust the technology." If the technology company doesn't have full control over AI decision-making, then accuracy and quality are at risk.

1. How does your AI arrive at its conclusions?

Solution providers should be able to clearly walk you through how their AI arrives at conclusions. A company that is using a generic model to power its AI will not have insight into the AI's reasoning. However, specialized models, like Loop's, are different.

There are two layers to how Loop’s AI comes to its conclusions. 

1. Consensus: We use a consensus-based mechanism, meaning multiple models need to come to the same answer for every output the AI delivers.

Multiple models become increasingly important as the AI solution becomes more complex. An incorrect output can throw off several downstream tasks, so having multiple models ensures accuracy across the board. 

2. Human in the Loop: We use a human in the loop to further validate and ensure precision when our models are uncertain. 

For example, when extracting the total invoiced amount from a table on an invoice, we have very specific protocols coded to ensure that the extracted data points add up to the stated total. If they don't, a human in the loop corrects the output, and that answer is fed back into the model so it learns from its mistakes (a simplified sketch of this check follows below).

Since our models are trained on our customers' data, having a human in the loop is critical during onboarding, especially early on. During implementation, our supply chain experts are hands-on in training the AI to help establish a solid baseline. As our models learn, human oversight becomes less necessary. However, not everyone approaches their AI models this way.
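To make this concrete, here is a minimal sketch of how a consensus vote combined with a sum check and a human fallback could work for the invoice example. The model and review-queue interfaces (extract, extract_line_items, add_training_example, escalate) are illustrative assumptions, not Loop's actual implementation.

```python
from collections import Counter

def consensus_value(models, document, field):
    """Ask several independent models for the same field and keep the
    majority answer; return None if the models can't agree."""
    answers = [model.extract(document, field) for model in models]
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes > len(models) // 2 else None

def process_invoice(models, document, review_queue):
    """Consensus extraction with a sum check and a human-in-the-loop fallback."""
    total = consensus_value(models, document, "total_invoiced_amount")
    line_items = models[0].extract_line_items(document)  # amounts from the invoice table

    # Protocol: the extracted line items must add up to the extracted total.
    if total is not None and abs(sum(line_items) - total) < 0.01:
        return {"total": total, "line_items": line_items}

    # No consensus, or the math doesn't work out: escalate to a human reviewer,
    # then feed the corrected answer back so the models learn from the mistake.
    corrected = review_queue.escalate(document, draft={"total": total, "line_items": line_items})
    for model in models:
        model.add_training_example(document, corrected)
    return corrected
```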

A company using a generic foundation model via API has limited control over how that AI makes decisions. This means they also cannot guarantee accuracy on supply chain-specific questions.

2. Can you explain the reasoning process behind a specific answer (output)?

Understanding the logic behind AI’s conclusions is crucial for building trust and ensuring accuracy. At Loop, our AI uses a very methodical process to come to an answer. We’ve created our own proprietary atomic task system that allows us to break down and describe exactly how the AI arrived at a specific conclusion. 

For example, if we ask it to find the pickup date on an invoice, here's how it would explain its reasoning. First, it lists every date on the page. Then, for each date, it states the corresponding context, such as the associated label. Finally, it classifies each date as one of four types: pickup, drop off, invoice, or other.

This might sound superfluous, but the process grounds the output in the context of the input, ensuring that the AI didn't just make up the date. And if it picked the wrong one, we teach it to find the right one, continuing to improve its accuracy.
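As a rough illustration, that pickup-date question could be decomposed into three small, auditable steps. The task names and model interface below (list_dates, nearby_label, classify_date) are hypothetical; the point is that every intermediate result can be inspected and traced back to the page.

```python
DATE_TYPES = ("pickup", "drop_off", "invoice", "other")

def find_pickup_date(model, page):
    """Break 'find the pickup date' into small, checkable steps."""
    # Step 1: list every date that appears on the page.
    dates = model.list_dates(page)

    # Step 2: for each date, capture the surrounding context (e.g. its label).
    labeled = [(date, model.nearby_label(page, date)) for date in dates]

    # Step 3: classify each date as pickup, drop_off, invoice, or other.
    classified = [(date, label, model.classify_date(date, label, DATE_TYPES))
                  for date, label in labeled]

    # The answer is grounded: every candidate date, its label, and its class
    # can be reviewed, and a wrong pick can be corrected and fed back.
    pickups = [date for date, _, date_type in classified if date_type == "pickup"]
    return (pickups[0] if pickups else None), classified
```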

Because supply chain data is so nuanced, this level of detail is necessary for success. A generic AI model will not be able to explain its reasoning in the same way or pinpoint how it reached a conclusion. You will need to decide whether that level of ambiguity is acceptable for the problem you want AI to solve.

3. Can you identify and address potential biases in your training data?

AI is only as good as the data it's trained on, and in the supply chain, sometimes you have more of one type of data than another. Maybe a shipper has more UPS invoices than regional carrier invoices, but both are equally important in understanding accessorials and total costs. Following the same logic, the AI model needs to handle large and small data sets equally well. It can't over-index on the majority.

At Loop, we’ve structured our algorithms to continually learn where imbalances are normal and where they might present an issue. When we see a model is not performing on a class of data—like addresses or origin addresses—we proactively deep dive to figure out why and make adjustments for continuous improvement.
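One way to surface that kind of imbalance is to track accuracy per class rather than only in aggregate, as in the rough sketch below. The data layout and the 98% threshold are illustrative assumptions.

```python
from collections import defaultdict

def per_class_accuracy(eval_results, min_accuracy=0.98):
    """Group evaluation results by class (e.g. carrier or field type) and flag
    any class that falls below the target, even if the overall average looks fine."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in eval_results:                 # each item: {"class": ..., "correct": bool}
        total[item["class"]] += 1
        correct[item["class"]] += item["correct"]

    report, flagged = {}, []
    for cls in total:
        accuracy = correct[cls] / total[cls]
        report[cls] = accuracy
        if accuracy < min_accuracy:
            flagged.append(cls)               # e.g. a regional carrier with few invoices
    return report, flagged
```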

On the other hand, generic AI models typically don’t manage biases, focusing instead on the majority data set. Managing biases and creating a well-rounded AI model requires a more concrete feedback loop, something generic models usually lack.

4. How do you define accuracy?

Every provider defines accuracy differently. The level of accuracy you need from a vendor depends on the problem you need the AI to solve. For example, if you are using AI to help your account reps find vendor standard operating procedure (SOP) documents in a content management system, then 80% accuracy might be sufficient. But if you're relying on AI to make billing decisions, you can't accept anything short of (almost) always perfect.

At Loop, we define accuracy as 100%, but nothing is always perfect, so we "officially" say 99.9%. To get as close to perfect as possible, our consensus mechanism requires that the majority of models come to the same conclusion. If they don't, a supply chain expert steps in to correct and retrain the models accordingly.

Second, we have what we call a "post-consensus accuracy" check. After consensus is reached, we run an entirely separate model dedicated solely to double-checking the result. If that check passes, the output is presented to the user. If there's an issue, we identify where and why, and then address it. This allows us to come as close to 100% accuracy as possible.
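Here is a rough sketch of how such a two-stage check might be layered together, reusing the consensus_value helper from the earlier sketch. The verifier and review-queue interfaces are assumptions for illustration, not Loop's actual pipeline.

```python
def answer_with_verification(ensemble, verifier, document, field, review_queue):
    """Two-stage check: model consensus first, then an independent validator."""
    # Stage 1: the ensemble must agree on an answer (see the consensus sketch above).
    answer = consensus_value(ensemble, document, field)
    if answer is None:
        return review_queue.escalate(document, field)

    # Stage 2: a separate model, dedicated solely to verification, double-checks the result.
    if verifier.confirms(document, field, answer):
        return answer

    # Verification failed: figure out where and why before anything reaches the user.
    return review_queue.escalate(document, field, draft=answer)
```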

Generic AI typically measures accuracy by comparing the model's output to a set of correct answers or reference outputs. It's up to the user to validate the results, not the provider. Simply put, they can't guarantee 100% accuracy, or even 20%.

5. How do you identify and defend against hallucinations?

Preventing hallucinations is crucial for maintaining credibility and trust. A hallucination happens when AI generates false or misleading information. Sometimes, it’s obvious that the answer is wrong, but more often than not, it passes the sniff test. 

At Loop, we defend against hallucinations with our consensus model. Even if one model hallucinates, we have others as a failsafe, and a human in the loop to course correct. On top of that, we have specialized checks for different AI functions. For instance, when it comes to our AI-driven extraction, we use a combination of Optical Character Recognition (OCR) and Large Language Models (LLMs) to ensure that the data points can be cited back to the page itself. 

OCR maps the page to find where text lives. Then, the LLM and computer vision extract the text based on that mapping, understand its context, and translate it into usable data. This combination is critical for handling messy supply chain documents because OCR alone won't pick up on things like handwriting, check marks, or text that's been crossed out with a correction written next to it.

After this extraction process, our models will analyze the historical patterns of your data to determine if a human needs to step in. This lets us trace every AI response back to specific information from its source so we can improve accuracy, confidence, and credibility. 
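As a simplified illustration of that idea, an extracted value can be accepted only if it can be cited back to text the OCR actually found on the page. The OCR and LLM interfaces below are placeholders, not a specific vendor's API.

```python
def extract_with_citation(ocr, llm, page_image, field):
    """Extract a field, but only accept values that can be cited back to the page."""
    # Step 1: OCR maps the page, producing every word plus its bounding box.
    words = ocr.read(page_image)   # e.g. [{"text": "1,240.50", "box": (x0, y0, x1, y1)}, ...]

    # Step 2: the LLM reads the mapped text in context and proposes a value (as a string).
    proposed = llm.extract(field=field, words=words)

    # Step 3: citation check. The proposed value must match text that is physically
    # on the page; otherwise treat it as a possible hallucination and flag for review.
    sources = [w for w in words if proposed["value"] in w["text"]]
    if not sources:
        return {"field": field, "value": None, "needs_review": True}

    return {
        "field": field,
        "value": proposed["value"],
        "cited_boxes": [w["box"] for w in sources],
        "needs_review": False,
    }
```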

A generic AI solution likely does not have multiple models or layers of defense against hallucinations. More often than not, it’s one model, so if it hallucinates, that’s the answer that the user gets—no checks and balances in place. 

Improvement – learning, adaptability, and innovation

Now that we understand how the AI functions, it’s important to consider its potential for improvement. The questions below will help you determine if the AI you’re considering is ready to adapt and grow alongside you.

1. How does your AI learn and improve over time? 

Improvement can mean a lot of things, so it's key to determine which improvements would impact your business the most. Do you want your model to continually get faster? Do you want it to handle more use cases within a specific area?

Generic AI models often try to improve across many different use cases, but you can't be sure they're improving your specific use case. You wait for an upgrade every six months, and while the model might improve in some ways, it could worsen in others. Essentially, you're at the whim of whatever the service provider decides to prioritize. Plus, because that provider does not own the training data or set the training standard (like correcting wrong answers to help the model learn), improvement is slow or nonexistent.

At Loop, we've built in checks and balances at every stage, from input to output, so anytime something gets corrected, the model retrains itself. We've also designed Loop's AI to adapt quickly and efficiently to new supply chain information and use cases. This means that as it processes more information and receives more feedback, it becomes increasingly accurate and reliable.
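A bare-bones version of that correction loop might look like the sketch below: every human correction becomes a new labeled example, and the model retrains once enough examples accumulate. The batch size and fine_tune hook are assumptions, not details of Loop's system.

```python
class CorrectionLoop:
    """Accumulate human corrections and periodically retrain on them."""

    def __init__(self, model, retrain_batch_size=50):
        self.model = model
        self.retrain_batch_size = retrain_batch_size
        self.pending = []

    def record_correction(self, document, corrected_output):
        # Every correction becomes a labeled training example.
        self.pending.append({"input": document, "label": corrected_output})

        # Once enough corrections pile up, retrain so the model learns from its mistakes.
        if len(self.pending) >= self.retrain_batch_size:
            self.model.fine_tune(self.pending)   # assumed retraining hook
            self.pending.clear()
```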

If you want Loop to be able to one day write you a sonnet about your favorite sports team, we’re not the right model for you. But if you want us to continually better manage your growing supply chain data sets, you can bet on us.

2. What data are you using to train your models?

The quality and diversity of training data are vital. As we’ve mentioned before, you cannot control the data a generic model is trained on.

When you partner with Loop, we train our models on your data, including shipment records, customer feedback, and real-time tracking data, regardless of source and format. This creates a robust and accurate data set that's used to automate workflows and uncover insights. Reach out to understand more about our data training processes.

3. How long does it take for your models to learn new concepts?

The supply chain is dynamic, so it's vital that AI models adapt just as quickly. At Loop, we've observed that the time it takes for our AI to learn new tasks has dropped significantly as we've collected more data and built a mature infrastructure that supports faster training and continuous improvement.

For example, while working with Loadsmart, we managed to bring down the time it took to audit invoices from days to just hours—and sometimes even minutes. If you want to learn more about our contextual learning, check out our guidebook: Making sense of AI in a rapidly evolving supply chain.

For generic models, the rate of improvement is often dictated by open-source benchmarks across a wide range of tasks, and those tasks rarely focus on supply chain problems.

4. How much prompt engineering do you use?

Prompt engineering is the art and science of crafting effective prompts to elicit desired responses from LLMs. It's all about balancing manual intervention with automated learning.

Most AI solutions that leverage generic foundation models rely heavily on prompt engineering. Essentially, they try to set up the correct sequence of questions to prompt the right answer. However, that typically means that the models struggle to adapt well in the wild. 

A dedicated, experienced team of machine learning experts can reduce or eliminate the need for extensive prompt engineering. Loop's logistics AI is built by top-tier engineers so that our models can get answers autonomously, ensuring better scalability and performance.

5. Who on your team is actively improving your AI? 

The effectiveness of AI depends heavily on the people who are building it. 

At Loop, our AI is constantly evolving, thanks to our team of experts. Between our four powerhouse AI engineers and our Success and Operations team, who bring decades of industry expertise, our AI is highly specialized and always improving. Our team would love to share more about how we're innovating.

Digging deeper

These fundamental questions can help demystify what AI you're actually getting when you partner with a provider. It's critical to dive into specifics to ensure your solution is right for your needs. Too often, the "AI" wrapper is sold, but underneath, the solution is just a BPO (business process outsourcing) operation. Armed with these questions, you'll be able to sniff out the snake oil almost immediately.

If you want to know more about how our logistics-specific AI can transform your operations, don't hesitate to reach out. Contact us for a detailed discussion and get answers to all your questions.
