The old computer adage "garbage in, garbage out" (GIGO) is proving true in the industry as trucking begins to rely heavily on machine learning algorithms for everything from managing fuel spend to lifecycle costs. Fleets that don't understand GIGO will not see the efficiencies these technologies promise, because the data inputs are not always "clean," says Kurt Thearling, vice president of analytics for WEX Inc.
Fleet Owner recently had a chance to ask Thearling to elaborate on that and other “machine learning” questions related to motor carrier operations.
Many trucking fleets assume data is ‘clean’ and usable right from the get-go. Why isn’t it? How can ‘machine learning’ set about cleaning up data automatically?
One of the many reasons data may not be clean is coding problems within the initial source systems; the data can also be corrupted while it is in motion from the source to the customer. Additionally, manual processing of the data along the way can make it unusable, which is why it’s important to monitor any human interactions with the data, as these can introduce errors as well. The key to avoiding unusable data is to use processes and models that identify ‘dirty’ data and then correct it wherever and whenever possible.
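To illustrate the idea, here is a minimal sketch of the rule-based checks that could flag obviously dirty fuel-card records before they reach a report. This is not WEX's actual pipeline; the field names, product codes, and thresholds are hypothetical.

```python
# Hypothetical rule-based screen for "dirty" fuel-card transactions.
# Field names, product codes, and thresholds are illustrative only.

def find_dirty_records(transactions):
    """Split transaction dicts into (clean, flagged) lists."""
    clean, flagged = [], []
    for txn in transactions:
        problems = []
        if txn.get("gallons", 0) <= 0:
            problems.append("non-positive gallons")
        if not (1.0 <= txn.get("price_per_gallon", 0) <= 10.0):
            problems.append("price per gallon outside plausible range")
        if txn.get("product_code") not in {"REGULAR", "MIDGRADE", "PREMIUM", "DIESEL"}:
            problems.append("unknown product code")
        (flagged if problems else clean).append({**txn, "problems": problems})
    return clean, flagged
```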
For example, consider a situation where a fleet manager sees a report showing that a number of his or her drivers are starting to buy premium gasoline, even though company policy calls for purchasing regular fuel. The fleet manager will likely address the discrepancy with those drivers; however, the problem might be with the actual data. Product miscodes are common; perhaps the station nearest those drivers recently changed its systems and inadvertently began miscoding regular as premium gasoline. Clearly, making a decision based on this faulty data could be a problem. A more thorough look at the data could pick out the issue and flag it for the fleet manager.
Our technologies and solutions can look across multiple fleets and review purchases near a particular location while examining the cost differences between regular and premium fuel. This allows machine learning models to be built for fuel price and grade. From there, the machine learning model examines the price for every transaction in the area and makes a prediction of regular versus premium pricing based on the local data. If the model predicts that a premium transaction is actually regular, then the transaction would be recoded. In the end, only true premium purchases would be flagged, thus saving time for the fleet manager and creating less disruption among the drivers.
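A toy version of that recoding logic might look like the sketch below, which stands in for the trained price model with local median prices. The field names, grade labels, and nearest-median rule are assumptions for illustration, not WEX's actual model; it also assumes the local history contains verified examples of both grades.

```python
# Illustrative sketch of the recoding idea described above; not WEX's model.
# local_history holds recent transactions near one station, each carrying a
# verified grade label. Prices, labels, and field names are hypothetical.
from statistics import median

def predict_grade(price, local_history):
    """Classify a price as REGULAR or PREMIUM using local median prices."""
    regular_med = median(t["price_per_gallon"] for t in local_history
                         if t["grade"] == "REGULAR")
    premium_med = median(t["price_per_gallon"] for t in local_history
                         if t["grade"] == "PREMIUM")
    # Assign the grade whose local median price is closest to this price.
    if abs(price - regular_med) <= abs(price - premium_med):
        return "REGULAR"
    return "PREMIUM"

def recode(txn, local_history):
    """Recode a premium-coded transaction that prices like regular fuel."""
    predicted = predict_grade(txn["price_per_gallon"], local_history)
    if txn["grade"] == "PREMIUM" and predicted == "REGULAR":
        return {**txn, "grade": "REGULAR", "recoded": True}
    return txn
```

Only transactions that survive this recoding step as genuine premium purchases would then be surfaced to the fleet manager.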
Can you describe the connection between ‘cleaner’ data and fraud detection? How might that use of data help trucking in other areas?
Ultimately, any insight coming from data depends on the quality of that data, and fraud is such a serious accusation that using shaky evidence to identify it can be problematic. If you are using GPS location data to show that a vehicle was nowhere near a pump at the time of a fuel purchase, it’s important to be 100% certain that there aren’t errors in the data.
For example, what if the pump location is off and the address that shows up is actually the vehicle’s corporate headquarters? Or what if the time stamp on the transaction is a few minutes off, so you aren’t looking at the truck’s location at the true time of the transaction? Sometimes the data can be corrected; if not, any statistical discrepancies must be taken into account when deciding whether or not to flag an incident.
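As a rough illustration, a cross-check along those lines could absorb small clock skew by examining every GPS ping within a time window rather than a single point in time. The thresholds, tuple layout, and field names below are hypothetical, not a description of any production system.

```python
# Hypothetical GPS-vs-pump cross-check that tolerates the data errors
# described above. Thresholds and field names are illustrative only.
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3959 * 2 * asin(sqrt(a))

def flag_possible_fraud(txn, gps_pings, max_miles=1.0, window_minutes=10):
    """Flag only if NO ping near the transaction time is near the pump.

    gps_pings: iterable of (epoch_seconds, lat, lon); txn carries pump
    coordinates and a timestamp. The window absorbs small clock skew.
    """
    window_seconds = window_minutes * 60
    nearby = [p for p in gps_pings
              if abs(p[0] - txn["timestamp"]) <= window_seconds]
    if not nearby:
        return False  # no evidence either way; don't accuse on missing data
    return all(haversine_miles(lat, lon, txn["pump_lat"], txn["pump_lon"]) > max_miles
               for _, lat, lon in nearby)
```

Returning False when no pings fall inside the window reflects the point above: missing or uncorrectable data is a statistical discrepancy to account for, not evidence of fraud.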
To extend this fraud detection beyond fuel, managers can look at adding other data into the mix. Think about all of the location data that is being collected through phones, telematics, and the transactions themselves. Google and mapping/routing companies use this information to understand traffic patterns, plan routing, and maximize vehicle and driver utilization.
Why is it now necessary for both large and small fleets to get used to ‘data’ being at the heart of everything in trucking?
There are several reasons why data is becoming a critical element for both large and small fleet companies. First, the volume of data available to fleet managers is growing rapidly, and it is easier than ever to collect and store this information. This allows organizations to obtain data that would have been difficult or impossible to capture only a few years ago.
However, simply having the data is not enough, which is where the new generation of data science tools and technologies comes in. Open-source tools like R and Python allow users to cost-effectively generate complex analyses, and cloud platforms such as AWS make it easy to run these tools on large volumes of data.
More importantly, machine learning algorithms are being created that use data to solve complex business problems for fleets, ranging from fraud detection to fuel spend benchmarking and optimization. In the past, some large fleets have been able to develop these kinds of analyses given their size and scale; smaller fleets have had more limited options.
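At its simplest, fuel spend benchmarking of the kind mentioned above could compare a fleet's average price per gallon against a regional pool of transactions. The sketch below assumes a hypothetical record schema and is meant only to show the shape of the calculation.

```python
# Illustrative fuel-spend benchmark; the record schema is hypothetical.

def benchmark_fuel_spend(fleet_txns, regional_txns):
    """Compare a fleet's average price per gallon against a regional pool."""
    def avg_ppg(txns):
        gallons = sum(t["gallons"] for t in txns)
        spend = sum(t["gallons"] * t["price_per_gallon"] for t in txns)
        return spend / gallons if gallons else 0.0

    fleet_avg, region_avg = avg_ppg(fleet_txns), avg_ppg(regional_txns)
    return {
        "fleet_avg_ppg": round(fleet_avg, 3),
        "region_avg_ppg": round(region_avg, 3),
        # Positive means the fleet pays more than the regional norm.
        "premium_vs_region": round(fleet_avg - region_avg, 3),
    }
```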
How does data guide other fleet needs such as vehicle lifecycle analysis, and why is this an important part across all aspects of trucking?
More data provides a broader view of a fleet and gives fleet managers insight into the lifecycle of their vehicles. Understanding how a fleet vehicle is used over its lifecycle, along with its associated costs, is in many ways the holy grail of fleet analytics. Although we aren’t there yet, analytics are certainly moving in that direction.
Over time, algorithms will be developed to fill important gaps in analyzing and understanding key parts of a fleet’s lifecycle. When all of these elements are brought together, managers will be able to see a broader view of their fleet operations beyond fleet sales, which are just a small part of these data-driven insights.
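As a rough sketch of where that is heading, a lifecycle view ultimately reduces to a cost-per-mile roll-up for each vehicle, once acquisition, fuel, maintenance, and resale are brought together. The cost categories and field names below are assumptions for illustration.

```python
# Toy lifecycle (total cost of ownership) roll-up for one vehicle record;
# cost categories and field names are assumptions, not a real schema.

def cost_per_mile(vehicle):
    """Total lifecycle cost per mile for one vehicle record."""
    total_cost = (vehicle["purchase_price"]
                  - vehicle.get("resale_value", 0)   # recovered at disposal
                  + sum(vehicle.get("fuel_costs", []))
                  + sum(vehicle.get("maintenance_costs", [])))
    miles = vehicle.get("lifetime_miles", 0)
    return total_cost / miles if miles else float("inf")
```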