Predicting Highway Traffic with Deep Learning and Real-World Data

Author: Denis Avetisyan


A new framework combines the power of deep learning with established transportation theory to forecast traffic volumes on national road networks.

The study harmonizes demographic and socioeconomic data with road network characteristics and real-time traffic sensor readings to create a localized origin-destination (OD) demand prediction model, employing deep learning to estimate traffic volume on specific road segments and validating performance through rigorous cross-validation, ultimately revealing underlying travel patterns, key influencing factors, and spatial distributions of demand.

DeepDemand accurately models long-term traffic demand by integrating socioeconomic factors and network structure, offering a scalable and interpretable alternative to traditional methods.

Accurate long-term traffic forecasting remains a challenge, often requiring a trade-off between predictive power, interpretability, and scalability. This is addressed in ‘Interpretable long-term traffic modelling on national road networks using theory-informed deep learning’, which introduces DeepDemand, a novel framework that integrates travel demand theory with deep learning to predict highway traffic volumes. By combining socioeconomic data, road network structure, and a differentiable architecture, DeepDemand achieves strong performance with improved geographic transferability and provides insights into key drivers of traffic demand. Could this approach offer a pathway towards more transparent and effective transport planning in complex road networks?


Predicting the Inevitable: The Illusion of Traffic Forecasting

The ability to accurately predict traffic volume stands as a cornerstone of modern transportation planning and a vital component in mitigating urban congestion. Effective forecasting enables proactive infrastructure development, optimized traffic signal timing, and the implementation of intelligent transport systems designed to alleviate bottlenecks before they occur. Beyond simply reducing commute times, precise traffic volume prediction directly impacts economic productivity by minimizing delays in the movement of goods and services, and contributes to environmental sustainability through the reduction of idling vehicles and associated emissions. Consequently, substantial resources are continually invested in refining forecasting methodologies, recognizing that even incremental improvements in predictive accuracy can yield significant societal benefits, from enhanced safety to improved quality of life for commuters.

Established traffic forecasting techniques, such as the Four-Step Travel Demand Model, frequently encounter limitations when applied to the nuanced realities of modern traffic. These models, developed decades ago, operate on principles of aggregate prediction, estimating traffic flow based on broad demographic and land-use data. However, real-world traffic is shaped by a multitude of interacting factors – from individual driver choices and unexpected incidents to weather conditions and even social media updates – that are difficult to incorporate into such simplified frameworks. Consequently, predictions generated by these traditional methods often diverge from actual traffic patterns, particularly in urban environments experiencing rapid growth or undergoing significant infrastructural changes. This disconnect underscores the need for more sophisticated approaches capable of capturing the dynamic and often unpredictable nature of traffic flow.

Conventional traffic forecasting models frequently operate on assumptions about how people travel that don’t fully reflect reality. These models often presume predictable routines – that individuals will consistently choose the same routes and modes of transport at similar times – failing to account for spontaneous decisions influenced by real-time events like accidents, weather changes, or even social media updates. Consequently, when faced with dynamic conditions – unexpected disruptions or fluctuations in demand – these systems can produce inaccurate predictions. The inherent rigidity of these approaches limits their ability to capture the nuanced, often irrational, behaviors that characterize actual traffic flow, creating a significant challenge for effective transport planning and congestion mitigation.

The escalating complexity of modern traffic networks demands a shift towards more detailed and empirically-supported forecasting techniques. Traditional models, often operating with aggregated data and generalized assumptions about commuter choices, frequently fail to capture the nuanced realities of traffic flow – the impact of unexpected events, the variability of individual routes, and the subtle interplay between different transportation modes. Consequently, researchers are increasingly focused on leveraging high-resolution data streams – from GPS-enabled devices and road sensors to social media activity – coupled with advanced machine learning algorithms. These data-driven approaches aim to model traffic not as a static, predictable system, but as a dynamic, evolving network where patterns emerge from the interactions of countless individual decisions, ultimately enabling more responsive and effective transportation planning.

Analysis of ground-truth traffic volumes (AADT) reveals high concentrations along major corridors, particularly around London, with distributions varying by highway type and increasing variability correlated with higher AADT levels.

DeepDemand: Chasing the Mirage with More Data

DeepDemand employs machine learning algorithms to forecast traffic volume by incorporating a comprehensive set of contextual data beyond traditional time-series analysis. This approach moves beyond relying solely on historical traffic patterns and instead utilizes external variables to improve prediction accuracy. The system is designed to learn complex, non-linear relationships between these contextual factors and traffic flow, allowing for more nuanced and responsive predictions than conventional methods. This integration of machine learning and detailed data distinguishes DeepDemand as a novel solution for traffic volume forecasting.

The DeepDemand model incorporates socioeconomic factors – including population density, employment rates, and points of interest – alongside detailed road network structure as primary inputs for traffic volume prediction. This approach moves beyond traditional models which often rely solely on historical traffic data or limited geographic information. Specifically, the model utilizes data representing road types, connectivity, and capacity, combined with localized socioeconomic indicators to capture nuanced relationships influencing traffic patterns. This integration allows DeepDemand to account for the impact of demographic changes, economic activity, and land use on traffic demand, improving forecast accuracy compared to methods based on simpler, less contextualized assumptions.

DeepDemand utilizes OpenStreetMap (OSM) data to construct a digital representation of the road network, incorporating details such as road geometry, connectivity, and routing restrictions. This OSM-derived network serves as the spatial foundation for traffic predictions. Ground-truth traffic data is sourced from the National Highways Traffic Information System (NHTIS), providing real-time and historical traffic volumes, speeds, and incidents. The NHTIS data is used both to train the DeepDemand model and to validate its predictive accuracy, ensuring alignment with observed traffic patterns on the UK strategic road network. The combination of detailed network topology from OSM and empirical traffic data from NHTIS is central to DeepDemand’s predictive capabilities.

DeepDemand’s predictive capability stems from its ability to model non-linear interactions between socioeconomic factors, road network characteristics, and historical traffic data. This approach allows for forecasts at a significantly higher resolution than traditional methods, moving beyond simple time-series analysis. Validation on the UK strategic road network demonstrates a coefficient of determination (R²) of 0.718, indicating that approximately 71.8% of the variance in observed traffic volume is explained by the model’s inputs and learned relationships. This performance metric suggests a substantial improvement in predictive accuracy compared to baseline models and supports the efficacy of integrating diverse datasets with machine learning techniques for traffic forecasting.
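The reported R² of 0.718 is a standard goodness-of-fit measure: the share of variance in observed volumes that the model's predictions account for. A minimal sketch of how it is computed, using made-up AADT figures rather than the paper's data:

```python
# Coefficient of determination (R^2): fraction of variance in observed
# traffic volumes explained by the model's predictions.
# The five segment volumes below are illustrative, not the paper's data.

def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

observed  = [42000, 18000, 65000, 30000, 51000]   # AADT on five segments
predicted = [45000, 15000, 60000, 33000, 49000]

print(round(r_squared(observed, predicted), 3))  # → 0.958
```

A value of 1.0 would mean perfect prediction; 0.718 on an entire national network, across very heterogeneous road types, is what makes the result notable.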

The UK’s national driving network is constructed from OpenStreetMap data by filtering for drivable segments, converting to a directed graph with geometric and semantic attributes, simplifying for topological consistency, and preparing it for spatial analysis and modeling.
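The filter-then-build step described in the caption can be sketched in a few lines. The edge records, the set of drivable highway tags, and the plain-dict graph representation below are illustrative stand-ins for the paper's actual OSM pipeline:

```python
# Sketch of the network-construction step: filter raw OSM-style edge records
# to drivable highway classes, then assemble a directed graph with attributes.
# Edge records and the DRIVABLE tag set are illustrative assumptions.

DRIVABLE = {"motorway", "trunk", "primary", "secondary", "tertiary"}

raw_edges = [
    {"u": "A", "v": "B", "highway": "motorway", "length_m": 1200, "oneway": True},
    {"u": "B", "v": "C", "highway": "primary",  "length_m": 800,  "oneway": False},
    {"u": "C", "v": "D", "highway": "footway",  "length_m": 150,  "oneway": False},  # not drivable
]

graph = {}  # node -> {neighbour: edge attributes}
for e in raw_edges:
    if e["highway"] not in DRIVABLE:
        continue  # keep drivable segments only
    attrs = {"highway": e["highway"], "length_m": e["length_m"]}
    graph.setdefault(e["u"], {})[e["v"]] = attrs
    if not e["oneway"]:  # two-way roads become a pair of directed edges
        graph.setdefault(e["v"], {})[e["u"]] = attrs

print(sorted(graph))               # → ['A', 'B', 'C']
print("D" in graph.get("C", {}))   # → False (footway edge was filtered out)
```

In practice a library such as OSMnx automates this extraction and simplification; the sketch only shows the logical shape of the transformation.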

Decoding Movement: Origin-Destination Analysis – A Necessary Illusion

DeepDemand’s core methodology centers on Origin-Destination (OD) pair analysis, a technique that examines travel patterns by identifying the points of departure and arrival for trips. This analysis moves beyond aggregated traffic counts to focus on specific locations and the movement between them. By quantifying the number of trips originating from one zone and destined for another, DeepDemand builds a detailed picture of travel demand. This localized approach allows for the identification of frequently traveled routes and enables the prediction of traffic flow at a granular level, considering the unique characteristics of each OD pair. The system then utilizes this data to forecast future traffic conditions, accounting for variations in time and external factors.
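At its simplest, an OD representation is a matrix of trip counts indexed by (origin zone, destination zone). A toy sketch with invented zone names, just to make the data structure concrete:

```python
# Building an origin-destination (OD) matrix from individual trip records:
# count trips per (origin zone, destination zone) pair.
# Zone names and trip records are illustrative.

from collections import Counter

trips = [
    ("Leeds", "York"), ("Leeds", "York"), ("York", "Leeds"),
    ("Leeds", "Hull"), ("York", "Hull"),
]

od_matrix = Counter(trips)  # (origin, destination) -> trip count

print(od_matrix[("Leeds", "York")])  # → 2
print(sum(od_matrix.values()))       # → 5 trips in total
```

Note that the matrix is directional: Leeds→York and York→Leeds are distinct entries, which is exactly what lets demand models capture asymmetric flows such as morning commutes.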

Dijkstra’s Algorithm is utilized within the DeepDemand model to computationally determine the shortest path between all origin-destination (OD) pairs within the network. This process isn’t simply about distance; the algorithm accounts for travel time based on link speeds, providing the fastest route. The resulting shortest paths are then used to define ‘local OD regions’ – areas where travel patterns are highly correlated due to shared routes and proximity. By focusing prediction efforts on these localized regions, computational efficiency is improved and the accuracy of traffic flow forecasting is enhanced, as shorter-distance trips significantly contribute to overall traffic volume.
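Dijkstra's algorithm itself is standard; the relevant detail is that edges are weighted by travel time rather than distance. A minimal stdlib implementation over a toy network (times in minutes are invented):

```python
# Minimal Dijkstra over a travel-time-weighted directed graph, as used to
# find fastest routes between OD pairs. The graph and times (minutes) are toy values.

import heapq

def dijkstra(graph, source):
    """Return the shortest travel time from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, t in graph.get(node, {}).items():
            nd = d + t
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

graph = {
    "A": {"B": 10, "C": 4},
    "C": {"B": 3, "D": 8},
    "B": {"D": 2},
}

print(dijkstra(graph, "A"))  # fastest A→B goes via C and takes 7 minutes
```

Running this over all OD pairs is what lets the model delimit the 'local OD regions' it predicts within.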

The DeepDemand model utilizes a Travel Time Deterrence Function to quantify the relationship between perceived travel time and route selection. This function, empirically derived from observed traffic patterns, assesses how changes in travel time on a given route impact the probability of a traveler choosing that route versus alternative options. Specifically, the model learns the elasticity of route choice with respect to travel time; increased travel time on a route leads to a predictable decrease in its utilization as travelers divert to faster alternatives. This allows DeepDemand to simulate realistic shifts in traffic flow based on congestion or disruptions, improving the accuracy of traffic predictions by accounting for rational traveler behavior.
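One common functional form for such a deterrence function is a negative exponential, f(t) = exp(-βt), with demand split across routes in proportion to it. The paper learns its deterrence function from data, so the fixed β and the two-route example below are purely illustrative:

```python
# Negative-exponential travel-time deterrence, f(t) = exp(-beta * t),
# used here to split demand across alternative routes.
# BETA and the route times are assumed values, not the paper's learned function.

import math

BETA = 0.1  # sensitivity to travel time (per minute)

def route_shares(travel_times):
    """Split demand across routes in proportion to exp(-beta * t)."""
    weights = [math.exp(-BETA * t) for t in travel_times]
    total = sum(weights)
    return [w / total for w in weights]

shares = route_shares([20, 30])  # two routes: 20 vs 30 minutes
print([round(s, 3) for s in shares])  # → [0.731, 0.269]
```

Because the function is differentiable, a learned version of it can sit inside the end-to-end architecture and be trained by gradient descent, which is the sense in which the framework is 'theory-informed'.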

DeepDemand leverages detailed insights into traveler behavior, specifically how individuals respond to changes in travel time, in conjunction with comprehensive spatial data to produce highly localized traffic forecasts. This approach allows for prediction at a granular level, moving beyond broad averages. Rigorous spatial cross-validation has been performed to assess the model’s generalizability, demonstrating stable performance across different geographic regions with a resulting R² value of 0.665, indicating a strong correlation between predicted and observed traffic patterns.
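Spatial cross-validation differs from ordinary k-fold in that whole regions, not random samples, are held out, so the score measures transfer to genuinely unseen geography. A stdlib sketch of leave-one-region-out splitting (region labels are invented):

```python
# Spatial cross-validation sketch: hold out one geographic region at a time
# and train on the rest, so the score reflects transfer to unseen areas.
# Region labels per observation are illustrative.

def leave_one_region_out(regions):
    """Yield (held_out_region, train_indices, test_indices) per region."""
    for held_out in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != held_out]
        test = [i for i, r in enumerate(regions) if r == held_out]
        yield held_out, train, test

regions = ["NW", "NW", "SE", "SE", "SE", "SC"]
for held_out, train, test in leave_one_region_out(regions):
    print(held_out, train, test)
```

Random splits would leak spatial correlation between neighbouring segments into the test set; the regional hold-out is what justifies the transferability claim.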

Origin-destination (OD) pairs are identified by competitively expanding from edge-adjacent locations and are then screened for validity based on whether their shortest route necessitates traversing the target edge, excluding pairs with faster bypass routes.

Interpreting the Inevitable: SHAP Values and the Illusion of Control

DeepDemand leverages SHAP (SHapley Additive exPlanations) values to move beyond simple traffic volume predictions, offering a detailed understanding of why a particular forecast is made. These values quantify the contribution of each input feature – encompassing socioeconomic factors and network characteristics – to the final predicted ‘Traffic Volume’. Rather than a ‘black box’ approach, SHAP values distribute the ‘prediction power’ fairly among the features, revealing which elements are driving increases or decreases in expected traffic. This granular level of insight allows transport planners to not only see what the traffic will be, but also to understand how various factors – such as population density, employment rates, or road capacity – are influencing that forecast, thereby fostering trust and enabling data-driven interventions.
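For very small feature sets, Shapley values can be computed exactly by averaging each feature's marginal contribution over all feature orderings; SHAP libraries approximate this efficiently for real models. The two-feature toy model below (not the paper's random forest) makes the 'fair distribution' property concrete:

```python
# Exact Shapley values for a tiny prediction model: average each feature's
# marginal contribution over all feature orderings, relative to a baseline.
# The model, features, and baseline are toy assumptions for illustration.

from itertools import permutations
from math import factorial

FEATURES = ["population", "employment"]
BASELINE = {"population": 0.0, "employment": 0.0}

def model(x):
    # toy traffic-volume model with an interaction term
    return 2.0 * x["population"] + 3.0 * x["employment"] \
        + 1.0 * x["population"] * x["employment"]

def shapley(x):
    contrib = {f: 0.0 for f in FEATURES}
    for order in permutations(FEATURES):
        current = dict(BASELINE)
        prev = model(current)
        for f in order:  # add features one at a time in this order
            current[f] = x[f]
            contrib[f] += model(current) - prev
            prev = model(current)
    n = factorial(len(FEATURES))
    return {f: c / n for f, c in contrib.items()}

x = {"population": 1.0, "employment": 2.0}
phi = shapley(x)
print(phi)  # → {'population': 3.0, 'employment': 7.0}
print(sum(phi.values()) == model(x) - model(BASELINE))  # attributions sum exactly
```

The final check is the defining property: the per-feature attributions always sum to the gap between the prediction and the baseline, which is what makes SHAP explanations internally consistent rather than ad hoc.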

DeepDemand’s predictive power is coupled with a crucial layer of interpretability, enabling users to pinpoint the specific factors influencing forecasted traffic volumes. The system doesn’t simply output a number; it reveals why that number is predicted, detailing the contribution of variables like population density, employment rates, or even proximity to key infrastructure. This granular insight extends to network characteristics, identifying how road capacity, number of lanes, or speed limits impact the model’s assessment. Consequently, transport planners can move beyond reactive measures and implement targeted strategies, addressing the root causes of congestion and optimizing network performance based on a clear understanding of the driving forces behind traffic patterns.

DeepDemand distinguishes itself not merely through predictive accuracy, but through a commitment to transparent and actionable insights for transport planners. The system delivers predictions alongside detailed explanations of why a specific traffic volume is forecast, attributing influence to individual socioeconomic factors and network characteristics. This level of interpretability is crucial for building trust in the model’s output; planners can move beyond accepting a number to understanding the underlying rationale, validating the forecast against their own expertise and local knowledge. Consequently, DeepDemand empowers data-driven decision-making, enabling targeted infrastructure investments, optimized traffic management strategies, and proactive responses to evolving transportation needs, ultimately fostering more efficient and resilient road networks.

DeepDemand’s predictive capabilities are substantiated by a Mean Absolute Error (MAE) of 7,406 vehicles, an accuracy that translates directly into real-world utility for transportation planning. This isn’t simply a statistical benchmark; it furnishes planners with the detailed insights necessary to pinpoint specific congestion hotspots and evaluate the potential impact of targeted interventions. Consequently, resources can be allocated more effectively, whether it’s optimizing traffic signal timings, implementing dynamic lane management, or prioritizing infrastructure improvements – all with the goal of enhancing network efficiency and mitigating traffic delays. The model’s demonstrated reliability fosters data-driven decision-making, moving beyond reactive measures towards proactive strategies for a more resilient and streamlined transportation system.

Analysis of SHAP values from random forest models reveals that origin and destination LSOA characteristics – including population age structure, employment sectors, land use, points of interest, deprivation, and car ownership – significantly influence OD pair scores.

The pursuit of elegant models, as demonstrated by DeepDemand’s attempt to fuse socioeconomic data with network topology, invariably runs headfirst into the brick wall of reality. It’s a predictable pattern; a framework promising scalable, interpretable traffic prediction will, inevitably, require constant patching and adaptation as production data reveals unforeseen edge cases. As Robert Tarjan once observed, “Programming is more of an art than a science.” This holds remarkably true for complex systems like traffic modelling; DeepDemand, despite its sophisticated approach to Origin-Destination (OD) modeling, will eventually succumb to the same forces that plague all software – the relentless pressure of real-world complexity and the ever-shifting landscape of user behavior. It’s not a failure of the model, but a testament to the inherent messiness of the systems it attempts to represent.

What’s Next?

The pursuit of ‘interpretable’ deep learning for traffic demand modeling feels, predictably, like chasing a phantom. The model accurately predicts traffic volumes – a temporary victory. It will, inevitably, become a black box decorated with post-hoc explanation attempts. Someone will inevitably claim it ‘understands’ congestion, and then funding will appear. The current framework, though elegant on paper, will eventually be deployed on a network where data quality resembles a fever dream, and the carefully curated socioeconomic features will be replaced with whatever’s cheapest to collect.

The real challenge isn’t building a better model; it’s accepting that the underlying system – human travel behavior – is fundamentally chaotic. Attempts to distill it into neat, interpretable components are charming, but ultimately futile. This work, like so many before it, will likely evolve into a complex series of heuristics, justified with layers of statistical significance. It used to be a simple bash script, honestly.

Future research will undoubtedly focus on ‘scaling’ this framework, adding more data, and incorporating real-time feedback loops. This feels less like progress and more like accumulating tech debt. The documentation will lie again, and someone, somewhere, will be surprised when the model fails during the next unexpected event. It’s the nature of the beast.


Original article: https://arxiv.org/pdf/2603.26440.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-03-30 11:44