Unlocking E-Commerce Data with AI Agents

Author: Denis Avetisyan

A new system leverages the power of large language models to deliver personalized and actionable insights for online sellers.

A tiered intelligence architecture distributes agency, with a managing agent coordinating the functions of two subordinate agents – one dedicated to data presentation and the other to insight generation – anticipating eventual systemic brittleness inherent in any hierarchical design.

This paper introduces Insight Agents, a hierarchical multi-agent system using LLMs and RAG for high-accuracy, low-latency data analysis in e-commerce.

E-commerce sellers often struggle to leverage the wealth of data available to them, hindered by complex tools and inefficient information retrieval. This paper introduces ‘Insight Agents: An LLM-Based Multi-Agent System for Data Insights’, a novel hierarchical multi-agent system designed to address this challenge by delivering personalized, data-driven insights. The system achieves high accuracy (90% based on human evaluation) and low latency (P90 < 15s) through a plan-and-execute paradigm and strategic integration of domain knowledge. Could such an agentic system represent a scalable solution for empowering e-commerce businesses to make faster, more informed decisions?

The Illusion of Insight in E-Commerce Data

The modern e-commerce landscape generates a vast and ever-increasing volume of data – sales figures, customer demographics, website traffic, marketing campaign results, and more. However, this abundance often proves overwhelming for sellers, who find themselves drowning in information yet starved for actionable insights. Simply possessing the data isn’t enough; the crucial challenge lies in effectively processing, interpreting, and applying it to improve business performance. Many sellers lack the specialized tools or expertise to sift through the noise and identify key trends, hindering their ability to make informed decisions regarding inventory management, pricing strategies, or targeted marketing efforts. This disconnect between data availability and practical application represents a significant obstacle to growth and competitiveness in the rapidly evolving e-commerce sector, pushing the need for more sophisticated analytical solutions.

Current e-commerce data analysis frequently relies on pre-defined reports and dashboards, proving inadequate for sellers seeking answers to nuanced or specific questions. These conventional approaches often deliver generalized insights, lagging behind the rapid shifts in consumer behavior and market trends. Consequently, sellers frequently encounter delays in obtaining the precise information needed to optimize listings, adjust pricing, or address emerging issues, hindering their ability to react swiftly to competitive pressures. The limitations of these traditional methods stem from their inability to interpret the context of a seller’s inquiry and pinpoint relevant data within vast, complex datasets, resulting in frustratingly generic or untimely responses that fail to drive meaningful improvements in performance.

The sheer volume of data generated by modern e-commerce platforms presents a significant hurdle for sellers seeking to optimize their businesses. A crucial requirement, therefore, is the development of systems capable of interpreting nuanced, complex questions, moving beyond simple keyword searches. These systems must not only understand the intent behind a seller’s inquiry, but also efficiently retrieve and synthesize relevant data from disparate sources. Such a capability promises to unlock actionable insights, enabling sellers to quickly address challenges, identify emerging trends, and personalize strategies for enhanced performance. The ability to transform raw data into readily understandable answers represents a fundamental shift from passive data collection to proactive, informed decision-making, and is increasingly vital for maintaining a competitive edge in the dynamic e-commerce landscape.

This architecture integrates a data presenter and insight generator to facilitate comprehensive data analysis and visualization.

Orchestrating Intelligence: A Hierarchical System

Insight Agents utilize a hierarchical multi-agent system architecture to facilitate data insight extraction. This design incorporates Large Language Models (LLMs) as core components, enabling complex data analysis through the delegation of tasks to specialized agents organized in a tiered structure. The hierarchy allows for decomposition of intricate queries into smaller, more manageable sub-tasks, with higher-level agents coordinating the activities of lower-level agents. This approach improves efficiency and scalability in processing large datasets and delivering relevant insights, moving beyond the limitations of single-agent LLM applications when faced with complex data retrieval and analytical challenges.

The Insight Agent system utilizes a Plan-and-Execute paradigm to optimize data retrieval and analysis. This involves an initial planning stage where the agent constructs a specific retrieval strategy – detailing the data sources, filters, and sequence of access – before interacting with seller data. This pre-execution planning allows the system to decompose complex queries into a series of ordered steps, improving the efficiency and accuracy of data acquisition and reducing the likelihood of irrelevant or erroneous results. The planned strategy dictates precisely how seller data is accessed and processed, ensuring a targeted and systematic approach to information retrieval.

The hierarchical structure of Insight Agents facilitates the decomposition of complex queries into a series of discrete, sequential steps. This modular approach allows the system to first define a retrieval plan (specifying data sources and filtering criteria) before executing that plan to access seller data. By breaking down large requests, the system minimizes the potential for errors inherent in processing extensive datasets at once and focuses computational resources on smaller, more readily verifiable sub-tasks. This staged process directly contributes to improved accuracy and relevance of the final responses, as each step can be validated before proceeding to the next.
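The plan-then-execute flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PlanStep`, `plan_query`, `execute_plan`, and the in-memory `SELLER_DATA` table are hypothetical names, and the planner here is a hard-coded keyword rule standing in for the LLM-driven planner the paper describes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory stand-in for seller data; not taken from the paper.
SELLER_DATA = {
    "sales": [
        {"month": "2024-01", "units": 120},
        {"month": "2024-02", "units": 95},
        {"month": "2024-03", "units": 140},
    ],
}


@dataclass
class PlanStep:
    """One ordered step of a retrieval plan: a data source plus a row filter."""
    source: str
    row_filter: Callable[[dict], bool]
    description: str


def plan_query(question: str) -> list[PlanStep]:
    """Build the full plan before touching any data (keyword rule, not an LLM)."""
    steps = []
    if "sales" in question.lower():
        steps.append(
            PlanStep(
                source="sales",
                row_filter=lambda row: row["month"] >= "2024-02",
                description="sales rows from 2024-02 onward",
            )
        )
    return steps


def execute_plan(steps: list[PlanStep]) -> list[dict]:
    """Run each step in order, validating its output before moving on."""
    rows: list[dict] = []
    for step in steps:
        if step.source not in SELLER_DATA:
            raise KeyError(f"unknown source: {step.source}")
        result = [r for r in SELLER_DATA[step.source] if step.row_filter(r)]
        if not result:
            raise ValueError(f"step returned no rows: {step.description}")
        rows.extend(result)
    return rows


plan = plan_query("How did my sales change since February?")
rows = execute_plan(plan)
print([r["units"] for r in rows])  # [95, 140]
```

Keeping planning and execution as separate functions means a plan can be inspected or rejected before any seller data is read, which is the property the paragraph above attributes to the paradigm.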

Routing the Flow: Intelligent Query Access

The Manager Agent functions as the primary control point for all incoming queries. Its initial responsibility involves receiving and parsing user input before directing it to the most suitable worker agent for fulfillment. This orchestration process includes determining the intent of the query and classifying it according to pre-defined categories. By acting as a central router, the Manager Agent streamlines the query handling process, preventing direct access to specialized agents and ensuring efficient resource allocation. This design promotes scalability and maintainability by isolating the complexities of individual agent functionalities behind a unified interface.

The system employs an Agent Router, utilizing a BERT model, to categorize incoming queries and direct them to the appropriate worker agent for processing. This approach achieves 83% accuracy in routing queries to the correct branch, as measured by internal testing. Comparative analysis demonstrates a 23% performance advantage over a standard Large Language Model (LLM)-based classifier performing the same task, indicating improved efficiency and precision in query distribution. The BERT model’s architecture allows for a more nuanced understanding of query intent, contributing to the increased routing accuracy.
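As a rough sketch of the routing pattern, the Python below shows a manager that classifies a query and delegates to one of two workers. The keyword scorer is a deliberately simple stand-in for the BERT classifier, and every name in it (`KEYWORDS`, `classify`, `manager`, the worker functions) is hypothetical rather than taken from the paper.

```python
from typing import Callable

# Stand-in for the BERT classifier: a keyword-overlap scorer.
# The two labels mirror the worker agents named in the article;
# the keyword lists are invented for illustration.
KEYWORDS = {
    "data_presenter": {"show", "list", "total", "display", "report"},
    "insight_generator": {"why", "improve", "recommend", "compare", "trend"},
}


def classify(query: str) -> str:
    """Return the branch whose keywords overlap the query the most.

    Ties fall back to the first label in KEYWORDS.
    """
    tokens = set(query.lower().replace("?", "").split())
    scores = {label: len(tokens & words) for label, words in KEYWORDS.items()}
    return max(scores, key=scores.get)


def data_presenter(query: str) -> str:
    return f"[data presenter] table for: {query}"


def insight_generator(query: str) -> str:
    return f"[insight generator] analysis for: {query}"


WORKERS: dict[str, Callable[[str], str]] = {
    "data_presenter": data_presenter,
    "insight_generator": insight_generator,
}


def manager(query: str) -> str:
    """Single entry point: classify the query, then delegate to one worker."""
    return WORKERS[classify(query)](query)


print(manager("Show total units sold last week"))
print(manager("Why did my sales drop?"))
```

Because callers only ever see `manager`, the classifier behind `classify` could be swapped for a trained model without changing the workers or the calling code.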

The system incorporates Out-of-Domain (OOD) detection to limit responses to queries within its defined knowledge scope. This is achieved using an Auto-encoder, which analyzes incoming queries and identifies those falling outside of the trained data distribution. Implementation with the Auto-encoder results in an OOD detection latency of less than 0.01 seconds. This represents a significant performance improvement compared to latency figures achieved with Large Language Model (LLM)-based OOD detection methods.
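The idea behind auto-encoder OOD detection is that in-domain inputs reconstruct well while unfamiliar ones do not, so a threshold on reconstruction error separates the two. The Python sketch below illustrates that thresholding step only. It assumes queries have already been turned into embedding vectors, and it replaces the trained auto-encoder with a toy rank-1 linear projection; the vectors, the 1.5 safety margin, and all function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional embeddings of in-domain queries.
IN_DOMAIN = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.1, 0.1, 0.1]]


def fit_direction(vectors: list[list[float]]) -> list[float]:
    """Toy 'training': unit vector along the mean in-domain embedding."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


def reconstruction_error(vec: list[float], direction: list[float]) -> float:
    """Encode to a single number, decode back, and measure what was lost."""
    code = sum(a * b for a, b in zip(vec, direction))  # encoder
    recon = [code * d for d in direction]              # decoder
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, recon)))


direction = fit_direction(IN_DOMAIN)
# Calibrate the threshold on in-domain data; the 1.5 margin is arbitrary.
threshold = 1.5 * max(reconstruction_error(v, direction) for v in IN_DOMAIN)


def is_out_of_domain(vec: list[float]) -> bool:
    """Flag a query whose embedding reconstructs worse than the threshold."""
    return reconstruction_error(vec, direction) > threshold


print(is_out_of_domain([1.0, 0.05, 0.05]))  # resembles the in-domain set
print(is_out_of_domain([0.0, 1.0, 0.0]))    # points in an unfamiliar direction
```

A check like this is a handful of arithmetic operations per query, which is consistent with the sub-0.01-second latency reported above, although the real system's encoder is a trained model rather than this projection.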

Following query routing, worker agents – including the Data Presenter Agent and Insight Generator Agent – utilize the Data Workflow Planner for data retrieval and processing. The Data Workflow Planner functions as an intermediary, constructing and executing a series of operations through the Data API to fulfill the query’s requirements. This modular approach allows for complex data requests to be broken down into manageable steps, ensuring efficient data access and manipulation. The Data API provides a standardized interface for interacting with underlying data sources, abstracting the complexities of data storage and retrieval from the worker agents and enabling scalability and maintainability.

This data workflow planner utilizes a data presenter to illustrate the process.

Reasoning from Data: Generating Actionable Insights

The Insight Generator Agent utilizes In-Context Learning (ICL) and Chain of Thought (CoT) prompting techniques to generate detailed responses. ICL enables the agent to learn from a limited set of provided examples, adapting to specific data characteristics without requiring extensive retraining. CoT prompting involves structuring the agent’s reasoning process by explicitly requesting a step-by-step explanation before delivering a final answer. This methodology improves response accuracy and allows the agent to articulate the logic behind its conclusions, facilitating a clearer understanding of the derived insights for the user. The combination of ICL and CoT results in nuanced outputs that go beyond simple data retrieval, offering a reasoned analysis of the information.

The Insight Generator Agent’s capacity extends beyond simple data retrieval; it provides explanatory reasoning alongside presented information. This is achieved through Chain of Thought prompting, which compels the agent to articulate the steps taken to arrive at a particular insight. Rather than solely presenting a result – such as a decline in sales – the agent details the data points considered, the calculations performed, and the logical connections made to reach that conclusion. This transparency allows sellers to validate the insight, understand its context, and assess its implications for their specific business, fostering greater confidence in data-driven decision-making.
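To make the combination of In-Context Learning and Chain of Thought concrete, here is a small Python sketch that assembles such a prompt. The article does not reproduce the paper's actual prompts, so the worked example, the instruction wording, and the `build_prompt` helper are all hypothetical.

```python
# Hypothetical few-shot example (ICL); invented for illustration only.
EXAMPLES = [
    {
        "question": "Why did units sold fall in February?",
        "data": "Jan units=120, Feb units=95, Feb page views down 20%",
        "reasoning": (
            "Units fell by 25 (120 -> 95), about 21%. Page views fell over "
            "the same period, so lower traffic likely contributed."
        ),
        "answer": "February sales fell about 21%, in line with lower traffic.",
    },
]


def build_prompt(question: str, data: str) -> str:
    """Assemble worked examples (ICL) plus a step-by-step cue (CoT)."""
    parts = [
        "You are an analyst for an e-commerce seller.",
        "Reason step by step from the data before giving the answer.",
        "",
    ]
    for ex in EXAMPLES:
        parts += [
            f"Question: {ex['question']}",
            f"Data: {ex['data']}",
            f"Reasoning: {ex['reasoning']}",
            f"Answer: {ex['answer']}",
            "",
        ]
    # End on an open "Reasoning:" slot so the model explains before answering.
    parts += [f"Question: {question}", f"Data: {data}", "Reasoning:"]
    return "\n".join(parts)


prompt = build_prompt(
    "Why did March revenue rise?",
    "Feb revenue=950, Mar revenue=1400",
)
print(prompt)
```

Ending the prompt on an open reasoning slot is one common way to elicit the explanation before the conclusion; the resulting text would then be sent to whichever LLM the agent uses.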

The Data Workflow Executor and Data Workflow Planner function in concert to deliver reliable data to the Insight Generator Agent. The Planner component is responsible for defining the sequence of data retrieval steps, determining the necessary data sources and filters to address a given query. Following this plan, the Executor component efficiently accesses those sources, extracts the requested data, and validates its accuracy before passing it along. This division of labor ensures both the completeness and correctness of the data used in generating insights, minimizing errors and maximizing the efficiency of the overall process.

The integration of advanced reasoning capabilities with efficient data access provides sellers with a functional basis for data-driven decision-making. By not simply presenting data, but also articulating the rationale behind insights, sellers gain a more comprehensive understanding of factors influencing performance. This allows for more informed strategic adjustments, such as optimizing listings, refining pricing strategies, or identifying emerging market trends. The ability to quickly retrieve and interpret relevant data reduces reliance on manual analysis and subjective judgment, leading to faster and more effective responses to changing market conditions and ultimately, improved sales outcomes.

Measuring Success and Charting the Future

Rigorous evaluation of the Insight Agents relies on a suite of quantitative metrics designed to assess the quality of delivered insights. Beyond simple accuracy, the system’s performance is dissected through measures of relevance – determining if the provided information directly addresses the seller’s query – and correctness, which validates the factual accuracy of the response. Crucially, completeness is also assessed, ensuring the agent doesn’t omit critical details needed for informed decision-making. These metrics, taken together, offer a nuanced understanding of the system’s capabilities and pinpoint areas for refinement, ultimately driving improvements in the quality and usefulness of the data-driven insights provided to e-commerce sellers.

The newly developed Insight Agents (IA) system demonstrates a high degree of efficacy in delivering actionable intelligence to e-commerce sellers. Achieving 89.5% question-level accuracy, the system reliably provides relevant and correct answers to seller inquiries, enabling data-driven decision-making. Crucially, this performance is delivered with a P90 latency of under 15 seconds, ensuring a responsive user experience and facilitating timely insights. This combination of accuracy and speed positions the IA system as a valuable tool for optimizing business strategies and enhancing seller performance within dynamic e-commerce environments.

Question-level accuracy represents a critical benchmark for evaluating the efficacy of the Insight Agents system, directly quantifying its ability to deliver precise and reliable answers to e-commerce sellers’ inquiries. This metric assesses whether the system’s responses fully and correctly address the specific question posed, moving beyond simple keyword matching to demonstrate genuine understanding and information retrieval. A high question-level accuracy, as demonstrated by the system’s performance, indicates a robust capability to synthesize data and present actionable insights, fostering trust and enabling sellers to make informed business decisions. Consequently, improvements in this metric directly translate to enhanced user experience and greater value derived from the personalized data provided by the multi-agent system.
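For readers unfamiliar with the two headline metrics, the Python sketch below shows how question-level accuracy and P90 latency are typically computed. The judgements and latencies are made-up sample values, not the paper's evaluation data, and the nearest-rank percentile is one of several common definitions.

```python
def question_level_accuracy(judgements: list[bool]) -> float:
    """Share of questions whose whole answer was judged correct."""
    return sum(judgements) / len(judgements)


def p90(latencies_s: list[float]) -> float:
    """90th-percentile latency using the nearest-rank method."""
    ordered = sorted(latencies_s)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n), 1-based rank
    return ordered[rank - 1]


# Hypothetical evaluation run: 10 questions, 9 judged correct.
judgements = [True] * 9 + [False]
latencies = [4.2, 5.1, 6.0, 6.3, 7.8, 8.4, 9.9, 11.2, 13.5, 21.0]

print(question_level_accuracy(judgements))  # 0.9
print(p90(latencies))                       # 13.5
```

Note that the sample run satisfies a "P90 under 15 seconds" target even though its slowest query took 21 seconds: the percentile bounds 90% of requests, not the worst case.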

The developed Insight Agents system isn’t simply a fixed solution, but rather a scalable and adaptable framework designed to deliver personalized data insights. Its multi-agent architecture allows for the easy integration of new data sources and analytical tools, enabling it to evolve alongside the changing needs of e-commerce sellers. This modular design facilitates horizontal scaling – adding more agents to handle increased query volume – and vertical scaling – enhancing the capabilities of individual agents with more sophisticated reasoning algorithms. Consequently, the system can be readily customized to address specific business challenges and can be expanded to support a wider range of data types and analytical tasks, ensuring long-term value and relevance in a dynamic market.

Continued development of the Insight Agents system prioritizes a more comprehensive understanding of e-commerce data and improved analytical skills. Researchers intend to broaden the system’s knowledge base by incorporating a wider range of data sources, including evolving market trends and competitor analysis. Simultaneously, efforts are directed toward refining the agents’ reasoning capabilities, enabling them to not only retrieve information but also synthesize it into more nuanced and actionable insights for sellers. This includes exploring advanced techniques in natural language processing and machine learning to allow the agents to handle complex queries, identify subtle patterns, and ultimately provide more strategic recommendations beyond simple data reporting.

The pursuit of insight, as demonstrated by Insight Agents, echoes a sentiment held by those who first charted the unknown. Carl Friedrich Gauss observed, “If others would think as hard as I do, they would not have so many questions.” This system, built not as a rigid structure but as a growing ecosystem of agents, accepts that complete answers are asymptotic. The architecture isn’t a solution; it’s a preparation for further inquiry. The core of Insight Agents – the plan-and-execute methodology coupled with robust information retrieval – acknowledges that data’s true value isn’t in its immediate revelation, but in the system’s capacity to continually refine its questions and deepen its understanding. It isn’t about finding insights, but cultivating the conditions for their emergence.

Gardens to Grow

The pursuit of automated insight, as demonstrated by systems like Insight Agents, isn’t about building a perfect machine for knowing. It’s about cultivating a garden. Each agent, each retrieval step, is a plant: some will thrive, others will wither. The system’s architecture isn’t a blueprint, but a prophecy of eventual decay. The current emphasis on accuracy and latency, while practical, risks mistaking the map for the territory. True resilience doesn’t lie in isolating components against failure, but in forgiveness – in the system’s capacity to absorb errors and continue to bloom, even imperfectly.

The limitations inherent in retrieval-augmented generation are not bugs to be fixed, but features of the ecosystem. The ‘truth’ isn’t held within the data, but emerges from the interaction between the agent and its environment. Future work should focus less on optimizing for a static notion of ‘correctness’ and more on fostering adaptability. How does the system learn not just what is true today, but how truth itself changes?

The application to e-commerce is a convenient starting point, but the real challenge lies in scaling beyond narrow domains. A truly intelligent system won’t merely answer questions; it will anticipate them, and even, perhaps, formulate questions its user hasn’t yet considered. This isn’t about building a better tool, but about growing a more thoughtful companion.

Original article: https://arxiv.org/pdf/2601.20048.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/