Author: Denis Avetisyan
New research reveals how competitive experimentation in dynamic pricing can inadvertently push prices higher, even without explicit collusion.
Correlated experimentation among sellers introduces a learning bias that drives pricing towards a Conjectural Variations equilibrium, potentially resulting in supra-competitive outcomes.
While conventional models of competitive pricing assume rational actors with complete information, real-world markets increasingly rely on iterative experimentation and algorithmic adjustments. This paper, ‘Experimentation, Biased Learning, and Conjectural Variations in Competitive Dynamic Pricing’, investigates how such learning-based dynamic pricing strategies among multiple sellers converge to equilibrium. We demonstrate that correlated experimentation induces a systematic bias in demand learning, effectively selecting a Conjectural Variations equilibrium and potentially leading to supra-competitive prices, a result notably absent when experimentation is independent. Does this learning bias represent a novel market design lever, allowing sellers to strategically shape competitive outcomes through the design of their experimentation protocols?
The Algorithmic Imperative: Pricing and Gross Merchandise Value
For platforms operating in competitive marketplaces, effective pricing strategies are not merely a component of success, but a fundamental driver of Gross Merchandise Value (GMV). GMV, representing the total value of goods sold through the platform, is acutely sensitive to price fluctuations; even marginal adjustments can yield substantial changes in sales volume and overall revenue. This relationship is particularly pronounced in dynamic markets where consumer price sensitivity is high and competitor pricing is constantly evolving. Consequently, platforms invest heavily in pricing algorithms and data analytics, striving to pinpoint the optimal price point that maximizes both sales and profitability. A failure to accurately assess and respond to these market forces can quickly erode market share and diminish overall financial performance, underscoring the critical importance of sophisticated pricing mechanisms.
Historically, many platforms relied on static pricing – setting a fixed price for goods or services, adjusted infrequently. However, this approach often fails to account for the complex interplay of factors influencing consumer decisions. Nuances like time of day, competitor pricing, individual customer profiles, and even external events – such as weather or trending news – significantly impact willingness to pay. Static models treat all customers and all moments in time as identical, leading to missed revenue opportunities and potential inventory issues. This inability to respond to real-time market dynamics and individual preferences results in suboptimal pricing strategies, hindering a platform’s ability to maximize its Gross Merchandise Value and maintain a competitive edge. Consequently, a shift towards more responsive and data-driven pricing methods has become increasingly vital.
Precisely gauging consumer demand is foundational for any platform seeking to optimize revenue, yet this proves remarkably difficult within competitive marketplaces. The inherent complexity arises from the multitude of interacting factors influencing purchasing decisions – not only the price of a given product, but also competitor pricing, promotional activities, seasonal trends, and even broader economic conditions. Existing demand estimation techniques often struggle to disentangle these influences, leading to inaccurate forecasts and suboptimal pricing strategies. Consequently, platforms face a constant challenge in adapting to rapidly shifting market dynamics and discerning genuine shifts in consumer preference from temporary fluctuations, requiring increasingly sophisticated analytical tools and algorithms to achieve reliable demand insights.
Understanding the precise relationship between price adjustments and consumer demand is crucial for platforms aiming to maximize revenue. Consequently, a shift towards dynamic pricing strategies – where prices are algorithmically adjusted in real-time based on factors like competitor pricing, inventory levels, and individual customer behavior – has become increasingly prevalent. These strategies move beyond simple cost-plus models, instead leveraging data analysis and predictive modeling to identify price elasticity – the degree to which changes in price affect demand. Sophisticated algorithms can then test various price points, learn from consumer responses, and optimize pricing to capture the maximum possible revenue without deterring potential buyers. This continuous learning loop allows platforms to refine their understanding of demand curves and implement pricing that is both competitive and profitable, ultimately driving Gross Merchandise Value.
Switchback Learning: A Mathematically Rigorous Approach to Dynamic Pricing
Switchback Linear Demand Learning is a dynamic pricing strategy that iteratively refines price points to model customer demand. The method functions by sequentially testing different prices and observing the resulting sales quantities, using this data to estimate the parameters of a linear demand curve – typically expressed as Q = a - bP, where Q is quantity, P is price, and a and b are coefficients representing intercept and slope, respectively. This iterative process allows the algorithm to move towards a more accurate representation of the price-quantity relationship without requiring prior knowledge of the demand function. The “switchback” aspect refers to the exploration of different price levels in a systematic manner, balancing exploration of new price points with exploitation of those already known to yield positive results. This approach provides a practical means of estimating demand and optimizing pricing decisions in real-time, particularly beneficial in environments where demand is uncertain or changes frequently.
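The core estimation step can be sketched in a few lines. The demand parameters below (a = 100, b = 2) and the noise level are hypothetical, chosen only to illustrate the mechanics of alternating between price points and recovering the linear demand coefficients by ordinary least squares:

```python
import random

def simulate_demand(price, a=100.0, b=2.0, noise=1.0):
    # Hypothetical ground-truth linear demand Q = a - bP plus Gaussian noise.
    return a - b * price + random.gauss(0.0, noise)

def switchback_fit(prices, rounds=500, seed=0):
    """Alternate ("switch back") between candidate prices, then fit
    the linear demand model Q = a - bP by ordinary least squares."""
    random.seed(seed)
    obs = [(prices[t % len(prices)],
            simulate_demand(prices[t % len(prices)]))
           for t in range(rounds)]
    n = len(obs)
    mean_p = sum(p for p, _ in obs) / n
    mean_q = sum(q for _, q in obs) / n
    cov = sum((p - mean_p) * (q - mean_q) for p, q in obs)
    var = sum((p - mean_p) ** 2 for p, _ in obs)
    b_hat = -cov / var               # slope of Q = a - bP is -b
    a_hat = mean_q + b_hat * mean_p  # intercept recovered from the means
    return a_hat, b_hat

a_hat, b_hat = switchback_fit([8.0, 12.0])
```

With enough rounds, the estimates approach the true coefficients; in practice the algorithm would reprice using these estimates and continue collecting observations.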
Two-Point Price Randomization is employed to determine price elasticity by randomly assigning one of two prices to each customer or transaction. This allows the algorithm to observe the resulting purchase behavior – whether the customer buys at the higher or lower price – and collect data on price sensitivity. The resulting data is then treated as Bandit Feedback, specifically utilizing an Upper Confidence Bound (UCB) approach. This UCB mechanism balances exploration – testing prices with high uncertainty – and exploitation – choosing prices predicted to yield high revenue – to efficiently refine the algorithm’s understanding of the demand curve and converge on optimal pricing decisions. The algorithm iteratively updates its price selection based on the observed rewards (revenue) from each randomized price, favoring prices that have historically generated higher returns while still occasionally testing alternative prices to account for uncertainty.
The Switchback learning algorithm fundamentally operates on the principle of a linear demand model, positing a relationship between price and quantity demanded, expressed as Q = a - bP, where Q represents quantity, P represents price, and a and b are coefficients representing intercept and slope, respectively. The algorithm iteratively refines estimates of these coefficients through observed price-quantity pairs generated by its price randomization strategy. Each price adjustment and resulting sale provides data points used to update the values of a and b via regression or similar statistical methods. This continuous improvement of the linear demand model allows the algorithm to progressively converge on a more accurate representation of customer price sensitivity and ultimately, optimize pricing decisions.
The Switchback learning algorithm employs iterative price adjustments to identify a pricing policy that maximizes revenue or profit. This convergence is achieved through a continuous feedback loop: the algorithm tests different price points, observes the resulting demand, and updates its internal linear demand model. Each iteration refines the model’s parameters, progressively narrowing the range of prices considered and homing in on the price that yields the highest expected return. The algorithm doesn’t seek a static “optimal” price, but rather a dynamic policy that adapts to observed market conditions and minimizes the cumulative regret of suboptimal pricing decisions over time. This process relies on the principle that, given sufficient data, the algorithm will approach a policy that consistently delivers near-optimal results, although complete convergence isn’t guaranteed in all scenarios.
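Under the linear model, the revenue-maximizing price has a closed form, so each iteration's plug-in policy is cheap to compute. A one-line sketch (the coefficients are the same hypothetical values used above):

```python
def optimal_price(a_hat, b_hat):
    """Under demand Q = a - bP, revenue R(p) = p * (a - b*p) is maximized
    where dR/dp = a - 2*b*p = 0, i.e. at p* = a / (2b)."""
    return a_hat / (2.0 * b_hat)

# Plug-in policy: reprice using the latest coefficient estimates.
p_star = optimal_price(100.0, 2.0)
```

Because the policy only ever sees estimated coefficients, any systematic bias in a and b propagates directly into the chosen price, which is exactly the channel the next section examines.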
The Shadow of Bias: Correlated Experimentation and Algorithmic Accuracy
Correlated experimentation, occurring when multiple sellers adjust prices based on overlapping or shared data, introduces substantial learning bias into price optimization algorithms. This arises because the observed outcomes of one seller’s price change are not statistically independent of the price changes and resulting outcomes of other sellers. Consequently, estimations of demand response are systematically skewed; a price increase, for example, might appear more or less effective than it truly is due to concurrent actions by competitors. This interdependency violates the assumptions of many standard learning algorithms, leading to inaccurate demand curves and flawed pricing decisions. The resulting bias affects the reliability of observed data used for training and evaluation, ultimately hindering the accurate estimation of true consumer demand and the potential for achieving optimal pricing strategies.
Omitted Variable Bias arises in dynamic pricing experiments when unobserved factors influence both the price set by the seller and the resulting demand, creating a spurious correlation. Specifically, if a seller adjusts prices based on external conditions – such as competitor actions or seasonal trends – that are not explicitly accounted for in the demand model, the estimated price elasticity will be biased. This occurs because the observed demand is affected by both the price and these omitted variables, leading to an inaccurate representation of the true demand relationship. Consequently, the seller may misinterpret consumer responsiveness to price changes, resulting in suboptimal pricing decisions and an inability to accurately predict demand under different price points.
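Both mechanisms can be seen in a toy simulation. Below, seller 1's demand depends on both its own price and a rival's (all coefficients hypothetical); when the two sellers apply the same experimental perturbation each round, the rival's price is a perfectly correlated omitted variable, and the regression of quantity on own price absorbs the rival's effect:

```python
import random

def estimate_own_slope(correlated, rounds=5000, seed=2):
    """Seller 1's demand is q1 = 100 - 2*p1 + 1*p2 + noise (hypothetical).
    Both sellers experiment around a base price of 20. When experiments
    are correlated, both apply the SAME perturbation each round, so an
    OLS regression of q1 on p1 alone is biased by the omitted p2."""
    random.seed(seed)
    data = []
    for _ in range(rounds):
        eps1 = random.choice([-1.0, 1.0])
        eps2 = eps1 if correlated else random.choice([-1.0, 1.0])
        p1, p2 = 20.0 + eps1, 20.0 + eps2
        q1 = 100.0 - 2.0 * p1 + 1.0 * p2 + random.gauss(0.0, 1.0)
        data.append((p1, q1))
    n = len(data)
    mp = sum(p for p, _ in data) / n
    mq = sum(q for _, q in data) / n
    cov = sum((p - mp) * (q - mq) for p, q in data)
    var = sum((p - mp) ** 2 for p, _ in data)
    return cov / var  # OLS slope of q1 on p1

biased = estimate_own_slope(correlated=True)     # ~ -2 + 1 = -1
unbiased = estimate_own_slope(correlated=False)  # ~ -2
```

The correlated case recovers a slope near -1 rather than the true -2: demand looks half as price-sensitive as it is, because every own-price increase was accompanied by a rival-price increase that cushioned the drop in sales.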
Switchback learning, a reinforcement learning technique commonly used in dynamic pricing, is demonstrably affected by biases introduced through correlated experimentation. These biases impede the algorithm’s ability to accurately estimate the demand curve and, consequently, converge towards the true Conjectural Variations (CV) equilibrium – the stable outcome where each seller correctly anticipates competitors’ pricing responses. Instead, the algorithm converges to a suboptimal or inaccurate CV equilibrium, meaning the learned pricing strategy will not maximize profit given the true competitive landscape. Specifically, the paper establishes that under correlated experimentation, the mean squared error of convergence to this inaccurate CV equilibrium scales at a rate of Õ(T^{-1/2}), where T represents the number of time periods, indicating a slower and less reliable learning process compared to unbiased experimentation.
The research establishes that correlated experimentation systematically biases the convergence of switchback learning algorithms towards a Conjectural Variations (CV) equilibrium. Specifically, under conditions of appropriate parameter scaling, the mean squared error convergence rate is demonstrated to be Õ(T^{-1/2}), where T represents the time horizon. This rate indicates that the precision with which the algorithm converges to the CV equilibrium improves only with the square root of the time horizon, implying a relatively slow convergence speed and a persistent, quantifiable error bound. The analysis provides a theoretical characterization of the convergence behavior, highlighting the impact of correlated experimentation on the accuracy and efficiency of dynamic pricing strategies.
Strategic Implications: Towards a Stable and Predictable Market Equilibrium
The Conjectural Variations Equilibrium (CVE) framework offers a powerful lens through which to analyze the complex dance of strategy among sellers in a market. Unlike traditional models assuming perfect competition or complete collusion, CVE acknowledges that each seller forms beliefs about how its rivals will react to its own pricing decisions. This framework doesn’t predict a single outcome, but rather a range of possible equilibria dependent on these ‘conjectural variations’ – essentially, each seller’s anticipation of the marginal response of its competitors. By modeling these beliefs, CVE moves beyond simple supply and demand, revealing how expectations themselves can become self-fulfilling prophecies, driving market outcomes and shaping competitive landscapes. The resulting equilibrium isn’t necessarily the most efficient for all parties, but represents a stable state where, given the beliefs about rivals, no individual seller has an incentive to alter its pricing strategy.
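The logic can be made concrete with a standard two-seller linear-demand illustration (a textbook setup, not the paper's exact specification). Each seller's first-order condition carries an extra term weighted by its conjecture v about the rival's marginal price response:

```latex
% Seller i's profit with linear demand q_i = a - b p_i + c p_j (c > 0):
\pi_i(p_i, p_j) = p_i \,(a - b p_i + c p_j)

% First-order condition, with conjecture v = \partial p_j / \partial p_i:
\frac{d\pi_i}{dp_i} = a - 2b p_i + c p_j + c v\, p_i = 0

% Symmetric equilibrium (p_i = p_j = p^*):
p^* = \frac{a}{2b - c(1 + v)}
```

Setting v = 0 recovers the standard Bertrand-Nash price a/(2b - c); any v > 0 (each seller expecting rivals to partially match its price moves) shrinks the denominator and raises the equilibrium price, which is precisely the supra-competitive channel the paper identifies.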
The stability of any pricing equilibrium within a competitive market fundamentally rests upon a seller’s expectations regarding rivals’ reactions. This is encapsulated in the Conjecture Matrix, a crucial component of game-theoretic modeling. This matrix doesn’t simply predict competitor behavior; it maps out each seller’s beliefs about how much rivals will alter their prices in response to a given price change by the seller. A seller anticipating rivals will match price increases, for example, will behave differently than one expecting rivals to maintain existing prices or even undercut them. The accuracy of these conjectures – and therefore the resulting equilibrium – is heavily influenced by factors like market transparency, historical interactions, and the perceived credibility of competitors’ threats or promises. Consequently, understanding the Conjecture Matrix provides invaluable insight into the dynamics of price competition and the likelihood of sustained pricing strategies.
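To see how a single conjecture entry moves the market, a sketch with hypothetical linear-demand parameters (q_i = a - b·p_i + c·p_j, a symmetric two-seller setup, not the paper's model):

```python
def cv_price(a, b, c, v):
    """Symmetric Conjectural Variations price for linear demand
    q_i = a - b*p_i + c*p_j, where v is each seller's conjecture
    about the rival's marginal price response (illustrative model)."""
    return a / (2.0 * b - c * (1.0 + v))

nash = cv_price(100.0, 2.0, 1.0, 0.0)      # v = 0: Bertrand-Nash conjecture
matching = cv_price(100.0, 2.0, 1.0, 1.0)  # v = 1: rivals expected to fully match
```

Under these numbers the Nash conjecture yields a price of about 33.3, while the full-matching conjecture yields 50: the stronger the believed rival response, the higher the stable price.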
When goods or services are strategic complements, a fascinating dynamic unfolds in competitive markets. An increase in the price offered by one seller doesn’t necessarily trigger a price war; instead, it often prompts rivals to also raise their prices. This counterintuitive behavior stems from the belief that the price increase signals improved demand or cost conditions, making a higher price attainable for everyone. For example, if one luxury hotel raises rates, others might follow suit, anticipating that consumers willing to pay more for the initial increase are also likely to accept higher prices elsewhere. This positive feedback loop – where one seller’s action reinforces similar actions by competitors – distinguishes strategic complements from strategic substitutes, where price increases typically lead to price decreases. Understanding this relationship is crucial for accurately predicting market behavior and formulating effective pricing strategies.
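Strategic complementarity shows up formally as an upward-sloping best-response curve. A sketch under the same hypothetical linear-demand assumptions as above:

```python
def best_response(p_j, a=100.0, b=2.0, c=1.0):
    """With demand q_i = a - b*p_i + c*p_j and no conjectures (v = 0),
    seller i's profit-maximizing reply to a rival price p_j is
    p_i = (a + c*p_j) / (2b). A positive cross-price coefficient c
    makes the reply slope upward: strategic complements."""
    return (a + c * p_j) / (2.0 * b)

low, high = best_response(10.0), best_response(20.0)  # reply rises with rival price
```

Because the reply rises with the rival's price, one seller's increase rationalizes an increase by the other, which is the feedback loop described above.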
The pursuit of a stable market outcome benefits from strategies that limit correlated price exploration. When sellers’ price experiments are correlated – launched at the same moments or driven by shared market signals – the resulting learning bias can steer prices away from the Nash Equilibrium: the state in which each seller’s chosen price maximizes its profit given the prices of its competitors, and no seller finds it advantageous to alter its price unilaterally. By decorrelating their experiments – randomizing independently rather than reacting to the same cues – sellers can avoid this bias and converge more reliably to a stable equilibrium where strategic deviation offers no advantage. Ultimately, the design of the experimentation protocol, not just the pricing rule itself, shapes which market state the learning process reaches.
The study meticulously reveals how correlated experimentation in dynamic pricing scenarios doesn’t simply discover an equilibrium; it actively creates one, specifically aligning with the predictions of Conjectural Variations. This induced bias, where sellers learn from each other’s experiments, pushes the market toward a supra-competitive outcome. It echoes Blaise Pascal’s sentiment: “The eloquence of youth is that it knows nothing.” In this context, the ‘youth’ is the market, initially exploring possibilities, but the correlated learning swiftly imposes a structure – a revealed ‘invariant’ – limiting the range of possible pricing strategies. If it feels like magic, it’s merely the predictable consequence of a provable system, not a genuine discovery.
Where Do We Go From Here?
The demonstration that correlated experimentation in dynamic pricing systems reliably converges toward a Conjectural Variations equilibrium is… unsettling. It is not a triumph of algorithm design, but a formalization of a long-suspected truth: competition, when mediated by opaque learning processes, may not deliver outcomes predicated on classical economic assumptions. The paper does not solve the problem of supra-competitive pricing, but rather reveals its inherent mathematical inevitability under these conditions. The question, therefore, shifts from ‘how do we prevent it?’ to ‘how widely is this phenomenon already present?’
Future work must address the limitations of the current framework. The assumption of complete observability of competitor pricing, while useful for analytical tractability, is rarely met in practice. Relaxing this assumption introduces significant challenges, but is crucial for assessing real-world applicability. Furthermore, the analysis focuses on a relatively simple game structure. Extending the model to incorporate more complex strategic complementarities, or to account for consumer heterogeneity, will undoubtedly reveal further nuances and potential instabilities.
Ultimately, the most pressing question is not about refining the algorithms themselves, but about the design of the market. If learning is inherently biased, and that bias predictably leads to undesirable outcomes, then the focus should shift to mechanisms that counteract those biases – or, perhaps, to acknowledging that some degree of market inefficiency is an unavoidable consequence of allowing algorithms to learn from incomplete information. Optimization without analysis, after all, is merely self-deception.
Original article: https://arxiv.org/pdf/2602.12888.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-16 21:55