Author: Denis Avetisyan
Dominant tech companies are poised to control not just AI models, but the crucial process of inference, creating a new bottleneck for competition.

This paper proposes ‘Neutral Inference’ – auditable standards for API gateways – to ensure fair access and prevent self-preferencing in the age of cognitive infrastructure.
While competitive advantage in artificial intelligence is increasingly shifting from model training to deployment, the resulting concentration of inference capabilities risks becoming a new locus of market power. This paper, ‘The Inference Bottleneck: Antitrust and Neutrality Duties in the Age of Cognitive Infrastructure’, argues that large-scale inference can function as critical cognitive infrastructure, creating foreclosure opportunities beyond price discrimination through subtle controls over service quality and routing. We propose a framework for analyzing these harms using raising-rivals’-costs logic, and introduce ‘Neutral Inference’ – a set of auditable technical standards centered on quality-of-service parity and routing transparency – applied only where demonstrable gatekeeper status exists. Can targeted, auditable conduct remedies effectively address the emerging challenges to competition in this rapidly evolving landscape?
The Expanding Infrastructure of Inference
Inference services are rapidly solidifying their position as essential infrastructure for a diverse and expanding array of applications, from natural language processing and image recognition to personalized recommendations and autonomous systems. This reliance isn’t merely a convenience; it represents a critical bottleneck in deploying advanced artificial intelligence. Increasingly, developers aren’t building AI models from scratch, but rather leveraging pre-trained models and accessing their capabilities through these inference services. Consequently, the ability to efficiently and cost-effectively perform inference – the process of applying a trained model to new data – is now a fundamental requirement for innovation, making access to robust inference infrastructure a key determinant of success across numerous industries. The concentration of this vital capability within a limited number of providers presents both opportunities and challenges as the field matures.
As artificial intelligence models become integral to countless applications, a reliance on the services that run those models – known as inference – is creating potential for significant market power imbalances. Companies controlling access to these inference resources can act as gatekeepers, and this control opens avenues for ‘vertical foreclosure’. This occurs when a dominant firm leverages its position in one market – inference services – to disadvantage competitors in another, such as application development. For example, a provider of inference services could prioritize its own applications, offer unfavorable terms to rivals, or even restrict access entirely, effectively squeezing out competition before it has a chance to flourish. This dynamic presents a growing concern, as it moves beyond competition within AI model creation to control over the very infrastructure needed to deploy those models, potentially stifling innovation and limiting consumer choice.
The delivery of artificial intelligence capabilities increasingly relies on Application Programming Interfaces, or APIs, which serve as the crucial link between AI models and the applications that utilize them. However, control over these APIs is rapidly becoming concentrated in the hands of a few powerful companies. This consolidation presents a significant challenge to fair access, as these gatekeepers could potentially favor their own applications, impose discriminatory pricing, or even restrict access to competing developers. The centralization of inference APIs doesn’t just affect innovation; it threatens to stifle competition by creating barriers to entry for smaller AI firms and limiting consumer choice, ultimately hindering the broader advancement and equitable distribution of AI-driven technologies.
As machine learning models become deeply embedded within essential digital services, the process of inference – utilizing these models to generate outputs – is rapidly becoming a critical component of the technological landscape. This growing dependence demands careful scrutiny of potential anti-competitive practices. Control over inference infrastructure allows a limited number of providers to potentially disadvantage rivals by limiting access, increasing costs, or favoring their own downstream services. Such behavior could stifle innovation and ultimately harm consumers. A thorough examination of these emerging dynamics is therefore essential to ensure a fair and competitive market, safeguarding against the creation of undue market power and fostering continued advancement in artificial intelligence.
Neutral Inference: A Framework for Equitable Access
Neutral Inference addresses biased outcomes in machine learning by delivering inference as a service subject to defined, measurable obligations. This approach moves beyond simply providing access to models and instead focuses on the service of inference itself, requiring providers to operate in a non-discriminatory manner. Quantifiable obligations, as defined within our framework, establish specific performance criteria related to fairness, such as equalized odds or demographic parity, which are actively monitored and enforced. This necessitates transparent reporting of key metrics and mechanisms for redress when obligations are not met, shifting the responsibility for fairness from downstream application developers to the inference provider and creating a verifiable standard for equitable access to AI-driven predictions.
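As one concrete illustration of a quantifiable obligation of this kind, a provider might commit that positive-prediction rates do not diverge across demographic groups beyond a published tolerance (demographic parity). The sketch below is only an illustration of that check, not the paper's own metric; the function and record format are assumptions.

```python
from collections import defaultdict

def demographic_parity_gap(records):
    """Largest difference in positive-prediction rate across groups.

    `records` is an iterable of (group, prediction) pairs, where
    prediction is 1 for a positive outcome and 0 otherwise.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in records:
        totals[group] += 1
        positives[group] += prediction
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())

# Hypothetical audit sample: group "a" receives positive outcomes
# 75% of the time, group "b" only 25%, giving a gap of 0.5.
records = [("a", 1), ("a", 1), ("a", 1), ("a", 0),
           ("b", 1), ("b", 0), ("b", 0), ("b", 0)]
gap = demographic_parity_gap(records)
```

An enforced obligation would then compare `gap` against the tolerance stated in the provider's published terms and trigger reporting or redress when it is exceeded.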
The concept of cognitive infrastructure, traditionally focused on the underlying computational resources supporting artificial intelligence, is being extended to encompass principles of open and equitable access for all applications utilizing those resources. This broadened definition necessitates that inference services, a core component of cognitive infrastructure, be available to downstream applications without undue restriction or discrimination. Specifically, equitable access requires consistent performance characteristics and predictable availability, preventing scenarios where certain applications are systematically favored or disadvantaged in terms of response time, cost, or feature availability. This ensures a level playing field for innovation and prevents the concentration of power within providers of core AI services, fostering broader participation and deployment of AI technologies.
Preventing self-preferencing and biased service delivery within Neutral Inference necessitates the implementation of specific mechanisms. These include request prioritization protocols that operate independently of the requesting application or user, ensuring all queries receive equitable computational resources. Auditable logging of all inference requests, including timestamps, input data hashes, and service allocation metrics, is crucial for detecting and addressing potential bias. Furthermore, model providers must adhere to standardized input/output formats and avoid embedding application-specific logic that could favor certain downstream uses. Technical safeguards, such as differential privacy techniques and fairness-aware algorithms, can mitigate the propagation of bias present in training data or model architecture, and regular adversarial testing can proactively identify vulnerabilities to biased outputs.
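The auditable-logging requirement above can be sketched as a log entry that records a timestamp, a hash of the input (rather than the input itself, so logs can be shared with auditors without exposing user data), and the resources allocated to the request. This is a minimal sketch under assumed field names, not a prescribed schema.

```python
import hashlib
import json
import time

def audit_record(request_id, payload: bytes, model_id, latency_ms, tokens_allocated):
    """Build one tamper-evident audit log entry for an inference request.

    The raw input is hashed, not stored; the entry itself is digested
    so entries can be chained into an append-only log.
    """
    entry = {
        "request_id": request_id,
        "timestamp": time.time(),
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "model_id": model_id,
        "latency_ms": latency_ms,
        "tokens_allocated": tokens_allocated,
    }
    # Digest over the canonical JSON form, computed before it is added,
    # so any later modification of the entry is detectable.
    entry["entry_digest"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    return entry
```

Because the service-allocation metrics are logged per request, an auditor can compare entries across tenants to detect the biased allocation patterns the paragraph describes.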
Implementing Neutral Inference necessitates architectural designs that decouple inference service provision from model ownership and application specifics. This includes utilizing standardized interfaces for requests and responses, enabling consistent performance across diverse models and users. Governance frameworks must establish clear service level agreements (SLAs) defining acceptable latency, throughput, and error rates, alongside mechanisms for monitoring and auditing service delivery to detect and mitigate bias. Furthermore, these frameworks require protocols for dispute resolution and enforcement of neutrality obligations, potentially involving independent third-party verification of system behavior and performance metrics. A key component is establishing data provenance tracking to ensure equitable access to computational resources and prevent preferential treatment based on user or application characteristics.
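The SLA component described above can be made machine-checkable. The sketch below, with hypothetical target values and field names, shows one way a single SLA definition could be applied uniformly to every tenant's observed metrics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceSLA:
    """Service-level targets applied uniformly to every tenant."""
    p95_latency_ms: float
    min_throughput_rps: float
    max_error_rate: float

    def violations(self, p95_latency_ms, throughput_rps, error_rate):
        """Return the names of any breached targets for one tenant."""
        breached = []
        if p95_latency_ms > self.p95_latency_ms:
            breached.append("latency")
        if throughput_rps < self.min_throughput_rps:
            breached.append("throughput")
        if error_rate > self.max_error_rate:
            breached.append("error_rate")
        return breached

# Illustrative targets only; real SLAs would be set in the governance framework.
sla = InferenceSLA(p95_latency_ms=250.0, min_throughput_rps=50.0, max_error_rate=0.01)
```

Because the same frozen SLA object is evaluated against every tenant's metrics, a third-party verifier can re-run the check from logged data without trusting the provider's own reporting.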
Transparency and Auditability: The Regulatory Landscape
The European Union’s AI Act and Digital Markets Act (DMA) collectively establish a heightened regulatory environment for large technology companies operating as digital gatekeepers. The AI Act, focused on mitigating risks associated with artificial intelligence, mandates transparency regarding the datasets, algorithms, and decision-making processes of AI systems, particularly those deemed high-risk. Concurrently, the DMA aims to prevent anti-competitive practices by these same gatekeepers, requiring interoperability and data access for competitors. Both regulations necessitate detailed documentation and reporting of operations, including demonstrable compliance with fairness and non-discrimination principles, subjecting these companies to increased oversight from regulatory bodies and potential penalties for non-compliance. This dual regulatory pressure is driving significant investment in compliance infrastructure and transparency mechanisms within the technology sector.
The increasing regulatory pressure surrounding artificial intelligence, notably from legislation like the EU’s AI Act, necessitates robust auditing procedures for inference services. These audits are no longer solely focused on technical performance but critically assess compliance with fairness, accountability, and non-discrimination principles. Auditing involves the systematic examination of inference service components – including data inputs, model parameters, and output generation processes – to identify and mitigate potential biases or discriminatory outcomes. Successful audits require detailed logs, explainability tools, and established metrics for evaluating fairness across different demographic groups, ensuring that deployed models adhere to legal and ethical standards and preventing adverse impacts on protected characteristics.
Routing transparency within an inference system refers to the detailed tracking of a request’s path from its initial entry point through each processing stage – including model selection, data transformations, and any intermediary services – to the final output. This capability is critical for auditability because it allows the exact sequence of operations performed on a given input to be reconstructed. Detailed routing logs provide evidence of compliance with pre-defined rules and policies, enabling the identification of potential biases or errors introduced at any stage of the inference pipeline. Without routing transparency, verifying the fairness, accuracy, and reliability of an inference service becomes significantly more difficult, hindering effective regulatory oversight and accountability.
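The path reconstruction described above can be sketched as an append-only trace attached to each request. Stage names and the detail fields here are assumptions for illustration, not a proposed standard.

```python
import time
import uuid

class RoutingTrace:
    """Append-only record of every stage a request passes through,
    so the exact inference path can be reconstructed during an audit."""

    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.hops = []

    def record(self, stage, detail):
        """Append one hop; entries are never mutated or removed."""
        self.hops.append({"stage": stage, "detail": detail, "at": time.time()})

    def path(self):
        """The ordered sequence of stages, for comparison against policy."""
        return [h["stage"] for h in self.hops]

# A hypothetical request passing through gateway, model selection,
# and a post-processing filter:
trace = RoutingTrace()
trace.record("gateway", "api.example.com")
trace.record("model_select", "model-v2-large")
trace.record("postprocess", "safety-filter-v1")
```

An auditor can then check `trace.path()` against the routing policy the provider has declared, flagging any request silently diverted to a degraded model or an extra throttling stage.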
Current regulatory frameworks, notably the EU’s AI Act and Digital Markets Act, are establishing legally-binding requirements for fairness and accountability in the delivery of inference services. These regulations move beyond self-regulation by mandating demonstrable compliance with principles of non-discrimination, transparency, and explainability. Specifically, providers of high-risk AI systems and designated digital gatekeepers are now obligated to implement mechanisms for auditing, risk assessment, and documentation of their inference pipelines. Failure to adhere to these stipulations can result in substantial fines and restrictions on service operation, creating a clear economic incentive for the adoption of verifiable and auditable inference practices.
The Impact of Anti-Competitive Tactics
Dominant firms offering machine learning inference services possess the ability to strategically increase operational costs for competitors through techniques such as Quality of Service (QoS) Discrimination and Feature Gating. QoS Discrimination involves subtly degrading the performance – increasing latency or error rates – experienced by rival applications utilizing the same inference infrastructure, effectively making it more expensive for them to deliver a comparable user experience. Feature Gating, conversely, restricts access to crucial functionalities or datasets, compelling competitors to either develop costly alternatives or remain at a distinct disadvantage. These tactics, while potentially subtle, operate as a form of competitive sabotage, hindering innovation and reinforcing the market power of the controlling firm by creating significant barriers to entry and discouraging the development of alternative services.
Within intricate ecosystem markets, where interconnected services and dependencies are the norm, anti-competitive strategies can erect substantial barriers to entry for new businesses and severely hinder innovation. Established firms, by controlling vital components of the ecosystem – such as inference services – are uniquely positioned to disadvantage competitors, not through superior products, but through strategic manipulation of access and functionality. This creates a landscape where sustained competitive pressure is diminished, as potential entrants face not only the challenge of developing comparable technology, but also of overcoming artificially imposed hurdles designed to limit their reach and viability. The result is a stifling of dynamism, where the dominant players maintain their position not through innovation, but through the skillful exercise of market power, ultimately slowing the pace of progress and limiting consumer choice.
Platform economics reveals how dominance over foundational infrastructure can be strategically weaponized to disadvantage competitors. Unlike traditional markets, platform-based industries exhibit strong network effects and high switching costs, meaning control over essential components – such as inference services – grants significant leverage. A firm controlling this critical infrastructure isn’t simply competing on merit; it can actively raise the costs for rivals, limit their access to essential resources, or even selectively degrade their performance. This creates an uneven playing field, hindering innovation and ultimately reducing consumer choice as new entrants struggle to gain traction against an incumbent with the power to dictate terms. The result is a concentration of power, where competitive advantage stems not from superior products or services, but from control over the underlying platform itself.
Analysis of inference service performance reveals that sustained discrepancies in response times – specifically, a relative gap exceeding 15% at the 95th or 99th percentile latency, as quantified by the QoS Wedge Threshold – raise significant concerns about potential QoS Discrimination. This metric serves as an early warning signal, indicating that a dominant firm may be deliberately slowing down or denying service to competitors’ applications. Such practices effectively ‘raise rivals’ costs’ by degrading the user experience for those utilizing alternative services, even if those services are technically viable. The identification of these performance deltas relies on continuous monitoring of key metrics, enabling researchers and regulators to detect and address anti-competitive behavior within complex ecosystem markets before substantial harm occurs to innovation and consumer choice.
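The threshold check above can be sketched as a simple monitoring routine over logged latency samples for the provider's own traffic versus a rival's. The percentile calculation uses nearest rank, and all names are illustrative; only the 15% figure and the p95/p99 comparison come from the text.

```python
QOS_WEDGE_THRESHOLD = 0.15  # 15% relative gap at p95 or p99, per the text

def percentile(samples, q):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(q * (len(ordered) - 1))))
    return ordered[k]

def wedge(own_latencies, rival_latencies, q):
    """Relative gap between rival and own latency at percentile q."""
    own_p = percentile(own_latencies, q)
    rival_p = percentile(rival_latencies, q)
    return (rival_p - own_p) / own_p

def qos_wedge_flag(own_latencies, rival_latencies):
    """Flag a gap above threshold at either the 95th or 99th percentile."""
    return any(wedge(own_latencies, rival_latencies, q) > QOS_WEDGE_THRESHOLD
               for q in (0.95, 0.99))
```

In practice the flag would need to persist across monitoring windows before indicating discrimination, since transient load can produce one-off gaps that are not evidence of deliberate degradation.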
A Path Towards Sustainable Innovation
The implementation of Fair, Reasonable, and Non-Discriminatory (FRAND) licensing for access to inference services represents a pivotal step towards fostering a dynamic and competitive artificial intelligence landscape. By establishing clear and equitable terms for utilizing these powerful tools, FRAND licensing prevents any single entity from monopolizing access and stifling innovation. This approach encourages a wider range of developers, researchers, and businesses to participate in the creation and deployment of AI applications, ultimately accelerating the pace of progress. Rather than restricting access through prohibitive costs or discriminatory practices, FRAND licensing unlocks opportunities for smaller players and independent innovators, leveling the playing field and ensuring that the benefits of inference technology are broadly distributed. The result is a more resilient and vibrant ecosystem, driven by diverse contributions and healthy competition, rather than concentrated control.
An open and equitable ecosystem for inference services is cultivated by minimizing barriers to entry and promoting broad participation. When access isn’t controlled by a single entity, the risk of market foreclosure – where competitors are effectively shut out – diminishes significantly. This broadened access isn’t simply about allowing more companies to use inference; it encourages a diverse range of developers, researchers, and entrepreneurs to contribute to its advancement. Such participation fuels innovation by bringing forth a wider spectrum of ideas, applications, and improvements, creating a positive feedback loop where increased competition drives down costs and enhances the quality of inference services for everyone. The result is a more resilient and dynamic landscape, less susceptible to stagnation and better positioned to meet evolving needs.
A truly sustainable framework for inference services hinges not only on Fair, Reasonable, and Non-Discriminatory (FRAND) licensing, but also on the implementation of rigorous auditing and transparency protocols. These mechanisms are crucial for verifying adherence to FRAND terms, preventing exploitative practices, and fostering trust within the ecosystem. Regular audits can assess the fairness of pricing, the non-discriminatory nature of access, and the overall reasonableness of licensing conditions. Simultaneously, increased transparency – regarding licensing terms, usage data (where privacy-preserving), and the auditing process itself – empowers stakeholders to identify and address potential imbalances. By combining FRAND principles with these accountability measures, a durable and equitable system for accessing and utilizing inference services becomes achievable, encouraging ongoing investment and broad participation.
The continued advancement of inference technologies hinges on fostering an environment where innovation isn’t stifled by restricted access or unfair practices. A dedication to open access principles, coupled with robust fair competition, dismantles barriers to entry, enabling a diverse range of researchers and developers to contribute to the field. This broadened participation accelerates the pace of discovery, allowing for novel applications and refinements to emerge that might otherwise remain unexplored. By prioritizing inclusivity and equitable access, the full potential of inference can be realized, moving beyond limited implementations to a landscape of widespread benefit and transformative progress across numerous disciplines. Ultimately, a commitment to these principles ensures inference remains a dynamic engine for future innovation, rather than a tool concentrated within a select few.
The pursuit of ‘Neutral Inference’ as detailed in the paper necessitates a rigorous distillation of complex systems. It echoes Donald Knuth’s observation: “Premature optimization is the root of all evil.” The paper doesn’t advocate for simply adding layers of auditability, but rather for a foundational clarity in the design of cognitive infrastructure. The core argument – that control over inference represents a competitive bottleneck – demands a stripping away of unnecessary complexity to reveal the underlying mechanics of AI model access and usage. Only through such reduction can true routing transparency and non-price discrimination be effectively addressed, ensuring fair access and fostering competition within this emerging landscape.
The Road Ahead
The proposition of ‘Neutral Inference’ addresses a symptom, not the disease. Control over inference engines is merely the current manifestation of a perennial problem: asymmetric information and the leveraging of structural position. The paper correctly identifies the potential for non-price discrimination, but fails to fully grapple with the inevitable opacity of complex systems. Auditable standards, while valuable, presuppose a level of observability that is increasingly illusory, even as models grow more intricate and data flows more diffuse.
Future work must shift focus from the mechanics of inference to the underlying architecture of cognitive infrastructure itself. The question isn’t simply how inference is performed, but who controls the essential building blocks – the datasets, the compute, and the algorithmic primitives. A more fruitful line of inquiry lies in examining mechanisms for distributed cognition, where inference is not centralized within a few dominant entities, but rather fragmented and democratized across a network.
The pursuit of ‘Neutral Inference’ is, at best, a temporary reprieve. The fundamental challenge remains: how to reconcile the benefits of scale with the imperatives of competition in a world where intelligence is increasingly commoditized. Emotion is, after all, a side effect of structure; and clarity, a necessary condition for any meaningful progress.
Original article: https://arxiv.org/pdf/2602.22750.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-28 08:22