Ride Rivals: AI Learns to Compete in Autonomous Vehicle Fleets

Author: Denis Avetisyan


New research explores how artificial intelligence can optimize pricing and vehicle distribution when multiple companies operate competing on-demand mobility services.

The study demonstrates that incorporating competitor price visibility accelerates convergence across rebalancing, pricing, and joint strategies, as evidenced by reward curves smoothed over 30 episodes (excluding an initial 5,000-episode burn-in), and highlights that training rewards, derived from policy sampling, provide a valid, though potentially conservative, measure of performance.

This study investigates competitive multi-operator reinforcement learning for joint pricing and fleet rebalancing in autonomous mobility-on-demand systems.

While reinforcement learning shows promise in optimizing autonomous mobility-on-demand (AMoD) systems, existing approaches largely ignore the realities of competitive market dynamics. This paper, ‘Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems’, introduces a multi-agent framework to investigate how competition impacts policy learning for pricing and fleet rebalancing. By integrating discrete choice theory to model passenger allocation, the results demonstrate that competitive agents can converge to robust strategies, leading to lower prices and distinct fleet positioning compared to monopolistic scenarios. Will these learning-based approaches ultimately unlock more efficient and responsive urban transportation networks in the face of increasing competition?


The Inevitable Convergence: Reimagining Urban Mobility

Contemporary urban centers grapple with a confluence of transportation issues that diminish quality of life and hinder economic progress. Vehicle congestion routinely costs individuals valuable time and businesses significant revenue, while simultaneously exacerbating air pollution and contributing to public health crises. This situation is further complicated by limited accessibility for certain populations, including the elderly, individuals with disabilities, and those residing in underserved areas, who often lack reliable or affordable transportation options. The combined effect of these challenges creates a pressing need for innovative solutions that can reshape urban mobility, fostering more sustainable, equitable, and efficient systems for all residents.

Autonomous Mobility-on-Demand (AMoD) envisions a future where transportation transcends the limitations of private vehicle ownership. This system proposes a fleet of self-driving vehicles available to users through a service, much like current ride-sharing platforms, but without the driver. The potential benefits are substantial: by optimizing routes and vehicle allocation, AMoD can dramatically reduce traffic congestion and associated pollution. Furthermore, the increased efficiency and shared utilization promise to lower transportation costs for individuals, making mobility more accessible, particularly for those without access to personal vehicles. Such a shift could reshape urban landscapes, reducing the need for extensive parking infrastructure and freeing up space for more productive uses, ultimately fostering more sustainable and livable cities.

The successful integration of Autonomous Mobility-on-Demand (AMoD) systems hinges on the development of remarkably complex control mechanisms. These systems aren’t simply about navigating streets; they require real-time optimization of vehicle dispatch, routing, and rebalancing to meet fluctuating demand across an entire urban network. Effective control must account for unpredictable events – traffic incidents, sudden surges in requests, or vehicle malfunctions – and dynamically adjust operations to maintain service levels. Moreover, these control systems must operate with a degree of foresight, predicting future needs based on historical data and current conditions, while also prioritizing energy efficiency and minimizing wait times. Achieving this level of operational finesse necessitates advanced algorithms, robust communication infrastructure, and continuous learning capabilities to adapt to the ever-changing dynamics of a city.

A Rigorous Framework for Competitive Analysis

The emergence of multiple Automated Mobility-on-Demand (AMoD) operators introduces significant complexities to service modeling. Unlike scenarios with a single provider, a competitive landscape requires analysis beyond individual operator optimization. A Competitive Multi-Operator Framework is therefore necessary to accurately represent market dynamics, accounting for passenger distribution among competing services. This framework must consider the strategic interactions between operators – including pricing, fleet deployment, and service differentiation – and their collective impact on overall system performance, passenger experience, and market equilibrium. Ignoring this competition can lead to inaccurate predictions of demand, inefficient resource allocation, and unrealistic projections of AMoD system viability.

A Choice Model is a fundamental component in predicting passenger selections within an Automated Mobility on Demand (AMoD) system. This model utilizes quantifiable parameters to estimate the probability of a passenger choosing a particular service option. Core to this prediction are price and service quality; passengers evaluate these attributes relative to their individual economic constraints and preferences. Specifically, Passenger Wage represents the opportunity cost of travel time, impacting willingness to pay, while Price Sensitivity – often represented by an elasticity coefficient – determines the degree to which demand changes in response to price fluctuations. The model typically employs discrete choice theory, assigning utilities to each option and predicting the selection with the highest utility, incorporating random components to account for unobserved factors influencing passenger decisions.
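The logit-style selection described above can be sketched in a few lines. This is an illustrative multinomial logit, not the paper's exact specification: the utility form, the price-sensitivity coefficient, and the wage value here are hypothetical placeholders.

```python
import math

def choice_probabilities(prices, wait_times, price_sensitivity=0.5, wage=30.0):
    """Multinomial-logit choice over competing service options.

    Utility of each option falls with its fare and with the monetary
    cost of waiting (wage approximates the passenger's value of time,
    in $/hour; wait_times are in minutes). Parameter values are
    illustrative, not taken from the paper.
    """
    utilities = [-price_sensitivity * p - wage * (w / 60.0)
                 for p, w in zip(prices, wait_times)]
    m = max(utilities)                       # shift for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Cheaper but slower operator vs. pricier but faster one
probs = choice_probabilities(prices=[8.0, 11.0], wait_times=[5.0, 3.0])
```

With these numbers, the cheaper operator's lower fare outweighs its longer wait, so it captures the larger demand share; raising `wage` shifts probability toward the faster option.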

Spatial demand patterns exert a significant influence on the operational efficiency of Autonomous Mobility on Demand (AMoD) systems. Uneven geographic distribution of passenger requests necessitates dynamic fleet management strategies to minimize empty vehicle kilometers and overall wait times. Areas with consistently high demand require a greater vehicle density, while low-demand zones may benefit from reduced service or dynamic repositioning of vehicles. Accurate modeling of passenger origin-destination matrices, considering population density, points of interest, and transportation infrastructure, is crucial for optimizing vehicle allocation and routing. Failing to account for these spatial variations can lead to inefficient fleet utilization, increased operating costs, and diminished service quality, ultimately impacting the economic viability of the AMoD system.

A Queueing System provides a robust method for simulating and analyzing the dynamic interactions between Autonomous Mobility-on-Demand (AMoD) operators, passenger demand, and service performance. This system models passengers as arriving requests for service, forming queues at virtual or physical locations. Key performance indicators (KPIs) derived from the queueing model include average and maximum passenger waiting times, system utilization rates, and the probability of service denial due to capacity constraints. By varying parameters such as fleet size, operator pricing, and passenger arrival rates, researchers and operators can evaluate the impact of different scenarios on system performance and establish appropriate service thresholds to maintain acceptable quality of service. The queueing model can incorporate multiple queueing disciplines – such as First-Come, First-Served or priority-based queuing – to reflect different operational strategies and account for varying passenger preferences.
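A minimal simulation of the queueing view above might look as follows. This is a toy M/M/c-style sketch under first-come, first-served discipline; the arrival and service rates are assumed values, not figures from the study.

```python
import random

def simulate_mmc(arrival_rate, service_rate, servers, n_customers, seed=0):
    """Toy M/M/c waiting-time simulation (FCFS).

    Passengers arrive as a Poisson process; each of `servers` vehicles
    serves one passenger at a time with exponential service durations.
    Tracks when each vehicle next becomes free and returns the mean
    passenger wait, in the same time units as the rates.
    """
    rng = random.Random(seed)
    free_at = [0.0] * servers          # time each vehicle becomes available
    t = 0.0
    waits = []
    for _ in range(n_customers):
        t += rng.expovariate(arrival_rate)              # next arrival
        i = min(range(servers), key=lambda k: free_at[k])
        start = max(t, free_at[i])                      # wait for a vehicle
        waits.append(start - t)
        free_at[i] = start + rng.expovariate(service_rate)
    return sum(waits) / len(waits)

# 2 requests/min, 4-min mean service, 10 vehicles: utilization 0.8
mean_wait = simulate_mmc(arrival_rate=2.0, service_rate=0.25,
                         servers=10, n_customers=5000)
```

Swapping the `min`-based vehicle pick for a priority rule would model the alternative queueing disciplines mentioned above.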

Average hourly passenger wages vary significantly across regions within southern Manhattan, reflecting regional income disparities.

Precision Control: Rebalancing Fleet Operations

Fleet rebalancing is a core operational strategy for on-demand transportation services, directly impacting key performance indicators such as passenger wait times and overall service coverage. The process involves proactively repositioning vehicles – including cars, scooters, and bikes – from areas of low demand to areas of anticipated high demand. Effective rebalancing minimizes the probability of passengers facing extended wait times or being unable to secure a vehicle when needed. This is achieved by ensuring sufficient vehicle density in high-demand zones while avoiding over-concentration and potential congestion in low-demand areas. The frequency and precision of rebalancing actions are critical, necessitating real-time data analysis of passenger requests, vehicle locations, and predictive modeling of future demand patterns to maintain optimal fleet distribution.

Model Predictive Control (MPC) improves upon conventional fleet rebalancing methods by incorporating demand prediction into vehicle positioning decisions. Traditional rebalancing often relies on reactive adjustments based on current requests; MPC, however, utilizes forecasted demand to proactively relocate vehicles to anticipated high-demand areas. This predictive capability involves establishing a system model that describes vehicle dynamics and demand patterns, along with a cost function that quantifies rebalancing effort and potential service quality impacts. An optimization algorithm then iteratively solves for the control actions – vehicle relocations – that minimize the cost function over a defined prediction horizon, effectively preempting potential imbalances and reducing overall wait times. The control solution is typically recalculated at each time step, incorporating new demand information and adjusting the vehicle positioning strategy accordingly.
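One step of the receding-horizon idea can be sketched with a greedy stand-in for the optimization solve. This is not the paper's MPC formulation; the zone layout and the surplus/deficit matching rule are simplifying assumptions, and a real controller would minimize a relocation-cost objective over a multi-step horizon.

```python
def mpc_rebalance_step(vehicles, forecast_demand):
    """One receding-horizon step: move vehicles from surplus zones to
    zones whose forecast demand exceeds current supply.

    A greedy stand-in for the cost-minimizing solve; ignores travel
    distance between zones. Returns (from_zone, to_zone, count) moves.
    """
    surplus = {z: vehicles[z] - forecast_demand[z]
               for z in range(len(vehicles)) if vehicles[z] > forecast_demand[z]}
    deficit = {z: forecast_demand[z] - vehicles[z]
               for z in range(len(vehicles)) if forecast_demand[z] > vehicles[z]}
    moves = []
    for dz, need in sorted(deficit.items(), key=lambda kv: -kv[1]):
        for sz in list(surplus):
            if need == 0:
                break
            k = min(need, surplus[sz])
            if k > 0:
                moves.append((sz, dz, k))
                surplus[sz] -= k
                need -= k
            if surplus[sz] == 0:
                del surplus[sz]
    return moves

# Zone 0 is oversupplied, zone 1 expects a demand surge
moves = mpc_rebalance_step(vehicles=[6, 1, 3], forecast_demand=[2, 4, 3])
```

In a full MPC loop this step would be re-solved at every time interval with a fresh demand forecast, applying only the first set of moves each time.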

Reinforcement Learning (RL) provides a fleet control methodology that bypasses the need for pre-defined, explicit models of demand or system behavior. Instead, an RL agent learns optimal fleet management policies through direct interaction with a simulated or real-world environment. This is achieved by iteratively taking actions – such as repositioning vehicles – and receiving rewards or penalties based on the resulting outcomes, like reduced wait times or increased service coverage. The agent then adjusts its strategy to maximize cumulative rewards over time, effectively discovering effective control policies directly from data. This data-driven approach allows the system to adapt to changing conditions and complex, non-linear dynamics without requiring manual model tuning or extensive prior knowledge.
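The learn-by-interaction loop described above can be illustrated with tabular Q-learning on a deliberately tiny environment. The two-zone toy environment and all hyperparameters below are invented for illustration; the paper's agents operate on a far richer state and action space.

```python
import random

def q_learning(n_states, n_actions, step, episodes=2000,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning: learn a repositioning policy purely from
    sampled transitions, with no explicit demand or system model.

    `step(state, action)` is the environment and returns
    (next_state, reward). Epsilon-greedy exploration.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(20):                              # short episodes
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda x: Q[s][x]))
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Toy env: 2 zones; reward for repositioning toward the busy zone (0)
def step(state, action):
    return action, (1.0 if action == 0 else 0.0)

Q = q_learning(n_states=2, n_actions=2, step=step)
```

The agent never sees the reward rule directly; it discovers through trial and error that moving toward the busy zone pays off, which is the same principle the fleet controller exploits at scale.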

Dual-Operator Reinforcement Learning (RL) introduces a competitive dynamic to fleet control by simulating multiple operators simultaneously managing vehicle fleets. Unlike single-operator RL, this approach necessitates agents that can anticipate and react to the strategic decisions of competing operators. Through this competitive interaction, the system learns optimal policies for vehicle rebalancing and deployment, resulting in a demonstrated Total Reward of up to 18,983.6 under specified operational conditions – including a simulation area of 2500m x 2500m, 20 vehicles per operator, and a request rate of 2 requests/minute. This performance improvement is attributed to the ability to dynamically adjust strategies in response to competitor actions, leading to more efficient resource allocation and increased overall system profitability.

The joint control policy achieves rebalancing by inducing net flows of vehicles, with red indicating cumulative inflows to and blue indicating outflows from specific locations.

Graph-Based Intelligence: Augmenting System Acumen

The success of any reinforcement learning system hinges on a carefully constructed reward function, a mechanism that quantifies the desirability of different actions. In the context of autonomous mobility-on-demand (AMoD) systems, this function must balance competing priorities: serving customer requests and minimizing operational costs. Simply maximizing served demand can lead to unsustainable rebalancing efforts – the process of repositioning vehicles to anticipate future needs. Therefore, effective reward functions commonly integrate total cost, encompassing expenses like fuel, maintenance, and driver compensation, as a crucial penalty. This ensures the system doesn’t prioritize fulfilling every request at the expense of long-term financial viability, instead promoting a strategy that optimizes both service levels and economic efficiency – a vital component for scalable and resilient AMoD deployments.
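A reward of this shape, revenue for served demand minus a rebalancing penalty, can be written directly. The coefficients and the unserved-request penalty term are illustrative assumptions, not the paper's calibrated values.

```python
def amod_reward(served_requests, fare_per_request, rebalance_km,
                cost_per_km=0.5, penalty_unserved=2.0, unserved=0):
    """Reward balancing revenue from served demand against operating
    cost of rebalancing, with an optional penalty for dropped requests.

    All coefficients are illustrative placeholders: an agent maximizing
    this cannot simply chase every request, because each empty
    repositioning kilometre subtracts from the reward.
    """
    revenue = served_requests * fare_per_request
    rebalance_cost = rebalance_km * cost_per_km
    return revenue - rebalance_cost - penalty_unserved * unserved

r = amod_reward(served_requests=100, fare_per_request=10.0,
                rebalance_km=200.0, unserved=5)
```

Tuning `cost_per_km` up pushes the learned policy toward conservative repositioning; tuning `penalty_unserved` up pushes it toward aggressive coverage.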

The inherent structure of transportation networks, with their interconnected roads, intersections, and varying traffic flows, presents a unique challenge for artificial intelligence. Graph Neural Networks (GNNs) offer a powerful solution by directly representing this spatial information as a graph, where locations are nodes and roads are edges. This allows the system to move beyond treating locations as isolated points and instead understand their relationships – how traffic congestion on one street impacts another, or how proximity to a major event influences demand. By effectively encoding these complex dependencies, GNNs enable the system to learn patterns and predict outcomes with greater accuracy than traditional methods, capturing the nuanced dynamics of urban mobility and forming a crucial component in optimizing autonomous mobility-on-demand services.
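The graph encoding above reduces, in its simplest form, to message passing over the zone adjacency structure. This sketch shows one mean-aggregation layer; the weight matrices stand in for learned parameters, and the two features per zone are an assumed example rather than the paper's input representation.

```python
import numpy as np

def gnn_layer(node_feats, adjacency, W_self, W_neigh):
    """One message-passing layer: each zone's new embedding combines
    its own features with the mean of its road-connected neighbours,
    followed by a ReLU. Weights are placeholders for learned ones.
    """
    n = len(node_feats)
    out = np.zeros((n, W_self.shape[1]))
    for i in range(n):
        nbrs = [j for j in range(n) if adjacency[i][j]]
        neigh = (np.mean([node_feats[j] for j in nbrs], axis=0)
                 if nbrs else np.zeros_like(node_feats[i]))
        out[i] = np.maximum(0.0, node_feats[i] @ W_self + neigh @ W_neigh)
    return out

# 3 zones in a line; 2 features per zone (e.g. idle vehicles, open requests)
X = np.array([[4.0, 1.0], [0.0, 3.0], [2.0, 2.0]])
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = gnn_layer(X, A, np.eye(2), 0.5 * np.eye(2))
```

Stacking several such layers lets information about congestion or demand in one zone propagate to zones several road hops away, which is what gives the policy its spatial awareness.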

The fusion of Graph Neural Networks with a Dual-Operator Reinforcement Learning framework dramatically elevates the system’s capacity for complex decision-making within the Autonomous Mobility on Demand (AMoD) environment. This integration doesn’t simply improve existing processes; it unlocks a fundamentally more sophisticated approach to managing the fleet and responding to dynamic demand. Simulations demonstrate this capability by achieving a Total Served Demand of 3579.5 units under coordinated control – a significant leap beyond systems operating with less nuanced intelligence. By enabling a collaborative decision process between operators, the framework capitalizes on the spatial awareness provided by the graph networks, allowing for proactive resource allocation and optimized routing strategies that maximize service coverage and responsiveness.

The combined system demonstrates a marked ability to foresee shifts in passenger demand and proactively adjust vehicle locations, leading to substantial improvements in key performance indicators for Autonomous Mobility-on-Demand (AMoD) services. Through this anticipatory optimization, the average passenger wait time is reduced to just 1.97 minutes, a significant enhancement over less responsive systems. Simultaneously, the average price scalar of 0.96 indicates effective cost management and competitive pricing, suggesting a balance between service availability and economic viability. These results collectively point to a more efficient and sustainable AMoD model, capable of meeting dynamic transportation needs while minimizing operational expenses and maximizing resource utilization.

Cumulative vehicle movements reveal net flows between regions, with red indicating areas receiving vehicles and blue showing those sending them.

The study meticulously details a competitive landscape within autonomous mobility-on-demand systems, demanding solutions rooted in provable strategies. This echoes John McCarthy’s assertion that “the best way to program is to write code that doesn’t need to be understood.” The paper’s approach to multi-operator reinforcement learning, where algorithms learn through competitive interaction, isn’t merely about achieving functional outcomes – it’s about deriving logically sound policies for pricing and fleet rebalancing. The elegance lies in the system’s ability to converge on optimal strategies, not through brute force, but through a mathematically defensible process. It is a system where correctness, rather than mere operationality, defines success.

What’s Next?

The demonstrated efficacy of multi-agent reinforcement learning in navigating competitive pricing and rebalancing within autonomous mobility-on-demand systems, while promising, merely scratches the surface of a fundamentally complex problem. The current formulations, reliant on Markov Decision Processes, implicitly assume a degree of observability that is unlikely to hold in genuinely dynamic, real-world deployments. A rigorous exploration of partially observable Markov decision processes, and the development of algorithms provably robust to imperfect information, remains a critical, and largely untouched, challenge.

Furthermore, the notion of ‘competition’ itself is often treated as a zero-sum game. However, consumer behavior is rarely so predictable. Future work should investigate mechanisms for modeling consumer elasticity, brand loyalty, and the emergence of cooperative behaviors, perhaps through game-theoretic frameworks extending beyond simple competition. A purely ‘optimal’ solution, devoid of an understanding of irrationality, is, at best, incomplete.

Ultimately, the true test will lie in moving beyond simulated environments. A formal verification of learned policies, demonstrating their safety and stability under adversarial conditions, is paramount. The pursuit of ‘working’ algorithms is insufficient; a proof of correctness, even an approximate one, is the only path towards truly trustworthy autonomous systems. Until then, these remain elegant exercises, not solutions.


Original article: https://arxiv.org/pdf/2603.05000.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

See also:

2026-03-07 08:32