Back in June, the National Academies of Sciences, Engineering, and Medicine (NAS) issued a long-awaited report that analyzed the Compliance, Safety, Accountability (CSA) program introduced by the Federal Motor Carrier Safety Administration seven years ago. The program is supposed to be a more effective way to measure motor carrier safety.
That 130-page report contained what Steven Bryan, president of software developer Vigillo, called a “laundry list” of issues concerning the statistical models within the CSA program’s safety measurement system (SMS), which are used to craft motor carrier safety scores.
During a presentation at the 2017 American Trucking Associations (ATA) annual conference in Orlando, FL, this week, Bryan and Joe DeLorenzo, FMCSA’s director of the office of compliance and enforcement, explained that the agency is going to use a new analytical “tool” to help improve the CSA safety scoring system over the next two years – a tool called “item response theory” or IRT.
Bryan said the reason IRT is being used is that “there needs to be a sound statistical model” to replace the ad hoc nature of how CSA scores are formulated today.
“More data – and more reliable data – needs to be gathered and the data utilized needs to be more transparent [with] software developed to make such data more useful to stakeholders,” he added. “Only then should CSA scores be released to the public.”
Those scores were taken “off line” in December 2015, away from public view, in part due to flaws in how CSA safety scores were being calculated.
Gearing up with IRT
FMCSA’s DeLorenzo said his agency is going to try to correct that issue by developing an IRT model over the next two years – a model that provides an “estimate of measure of safety culture” for a motor carrier that can then be used to monitor and identify those in need of an “intervention.”
“Part of what we’re looking to do with IRT is [find out] what are the statistics really telling us? It will answer if what we are looking at is really a safety issue,” he explained.
The essence of an IRT-based model, Vigillo’s Bryan noted, is that rather than using “classical test theory,” where “cumulative scores” matter most – how many questions in total one answered correctly – it focuses on the “individual items,” or questions, themselves.
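To make that distinction concrete, here is a minimal, hypothetical sketch – not FMCSA’s or Vigillo’s actual model – contrasting a classical total score with the two-parameter logistic (2PL) curve commonly used in IRT, where each item carries its own difficulty and discrimination parameters and a latent trait (labeled “theta” here) stands in for something like safety culture. All numbers and variable names are illustrative.

```python
import numpy as np

# Classical test theory: every item counts the same; only the total matters.
responses = np.array([1, 1, 0, 1, 0])   # 1 = correct / clean, 0 = incorrect / violation
classical_score = responses.sum()        # e.g. 3 out of 5

# Item response theory (2PL): each item has its own difficulty (b) and
# discrimination (a); the model infers a latent trait (theta) from the
# pattern of item-level responses rather than from the raw total.
def p_correct(theta, a, b):
    """Probability of a 'correct' response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

difficulty     = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])  # harder items have larger b
discrimination = np.array([ 1.2, 0.8, 1.5, 1.0, 0.3])  # how sharply an item separates respondents

# The same classical total can imply different latent traits once you
# account for *which* items were answered correctly.
for theta in (-0.5, 0.5, 1.5):
    probs = p_correct(theta, discrimination, difficulty)
    print(f"theta={theta:+.1f}  expected item scores: {np.round(probs, 2)}")
```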
Bryan shared the results of an informal online survey he conducted ahead of ATA’s annual conference, designed to gauge “common knowledge” among respondents – could they identify the Gettysburg Address, the age of the known fossil record (3.5 billion years), etc.
Yet within this survey he added a cinema trivia question – when was the first Fast & Furious movie released? (June 22, 2001). Using IRT, he then sorted the answer sets and found that those who answered the Fast & Furious question right got almost everything else wrong.
“So it’s not just about whether you answer a question right or wrong,” Bryan explained. “It’s about whether you are answering a question correctly that in actuality has a negative value. It’s about unwrapping the questions themselves to see where the real knowledge is.”
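A rough way to see the effect Bryan describes – again a back-of-the-envelope illustration rather than anyone’s actual methodology – is to correlate each item with the “rest score” made up of the other items. An item with a negative item-rest correlation, like the simulated trivia question below, is one that otherwise low-scoring respondents tend to get right, and so carries negative value.

```python
import numpy as np

# Hypothetical response matrix: rows = respondents, columns = survey items.
# The last column plays the role of the movie-trivia question.
rng = np.random.default_rng(0)
knowledge = rng.normal(size=200)   # simulated latent "common knowledge"
items = (knowledge[:, None] + rng.normal(scale=0.8, size=(200, 4)) > 0).astype(int)
trivia = (-knowledge + rng.normal(scale=0.8, size=200) > 0).astype(int)  # inversely related
responses = np.column_stack([items, trivia])

# Crude discrimination check: correlate each item with the "rest score"
# (the total of the other items). A negative value flags an item that
# low scorers tend to answer correctly.
for j in range(responses.shape[1]):
    rest = responses.sum(axis=1) - responses[:, j]
    r = np.corrcoef(responses[:, j], rest)[0, 1]
    print(f"item {j}: item-rest correlation = {r:+.2f}")
```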
In a safety context, he said IRT will help “shine a light” on “cultural weaknesses” among motor carriers that ultimately lead to a higher crash risk.
FMCSA’s DeLorenzo added that this type of modeling is what will help his agency measure motor carrier safety programs as part of the “Beyond Compliance” metric that it is developing.
“It’s about how do we measure those ‘above and beyond’ non-regulatory safety practices some motor carriers use,” he explained.
Getting more and better data
To use IRT to its fullest extent, DeLorenzo noted that the NAS report is encouraging his agency not only to find ways to improve the quality of roadside inspection data being reported by state enforcement agencies but also to expand its data pool to include other metrics such as driver pay, driver turnover, and cargo characteristics.
He also hopes that IRT will help improve FMCSA’s data review process, known as “DataQ,” which allows motor carriers to challenge data that they consider inaccurate or incomplete.
“Less than 1% of the data out there ends up in the DataQ process,” DeLorenzo explained, which he said is “significant” since the safety data pool FMCSA works from comprises 3.5 million inspections and 7 million violations annually, based on a population of 537,000 motor carriers.
“But it does create a fairness issue and we do need to focus on data quality as a result,” he noted. “Data quality is key in terms of monitoring highway safety as miles driven in downtown Washington D.C. are not the same as miles driven on the highway.”
IRT is also “smart enough” to adjust to a flow of “imperfect” data, added Vigillo’s Bryan, making adjustments not just for state-to-state comparisons of roadside inspection results but also for whether the motor carrier in question operates three trucks versus 1,000 trucks or operates in North Dakota versus New Jersey.
“IRT is not perfect but better,” he said. “The quest for perfect, normalized data is just not possible.”
On top of that, DeLorenzo noted that half of the inspection reports currently coming into FMCSA’s SMS program contain no violations, yet the vehicle out-of-service rate continues to hover at 20%.
“That rate hasn’t changed for decades,” he explained, which is one reason why FMCSA will be working to use both “absolute” and “relative” metrics to decide on which motor carriers receive safety interventions.