Following the Money: AI-Powered Paths to Smarter Venture Capital

Author: Denis Avetisyan


New research demonstrates how combining graph-based knowledge with large language models can significantly improve the accuracy of venture capital investment predictions.

The path selector systematically evaluates the investment graph to identify and retrieve the most informative trajectory, navigating a complex network to pinpoint the path that best supports prediction.

This study introduces MIRAGE-VC, a framework leveraging information gain and adaptive evidence weighting within a retrieval-augmented generation system for enhanced investment analysis.

Predicting venture capital success remains a challenge despite the wealth of relational data available, as traditional machine learning struggles with complex reasoning. This is addressed in ‘The Gaining Paths to Investment Success: Information-Driven LLM Graph Reasoning for Venture Capital Prediction’, which introduces MIRAGE-VC, a novel framework combining graph retrieval with large language models to distill investment networks into focused, interpretable chains. By adaptively weighting diverse evidence and prioritizing information gain, MIRAGE-VC significantly improves prediction accuracy. But could this approach unlock similar gains in other off-graph prediction tasks, such as recommendation systems or risk assessment?


Deconstructing the VC Black Box: Beyond Gut Feelings

For decades, venture capital firms have navigated investment landscapes guided by experienced partners and gut feelings, a reliance stemming from historically limited access to comprehensive data. This approach, while valuing human judgment, often overlooks subtle yet critical signals embedded within the broader investment ecosystem. Crucially, traditional methods struggle to quantify the influence of network effects – the ways in which connections between companies, investors, and even previous funding rounds impact future success. Consequently, potentially valuable opportunities are frequently missed, and risk assessment remains hampered by an incomplete understanding of the intricate dynamics at play within the venture capital world. The challenge lies not in dismissing expertise, but in augmenting it with data-driven insights that can reveal patterns previously obscured by the limitations of conventional analysis.

The venture capital landscape isn’t simply a collection of individual investments; it’s a deeply interconnected network where relationships between companies, investors, and subsequent funding rounds hold significant, yet largely unexamined, predictive power. Each investment isn’t isolated; it’s influenced by prior connections, shared investors, and patterns of capital flow. In this web, information about a company’s potential isn’t contained solely in its pitch deck or financial statements; it also resides in the affiliations and investment histories of those involved. Analyzing these relationships – identifying influential investors, common co-investors, and the pathways capital takes – offers a novel approach to assessing risk and predicting future success, potentially revealing hidden opportunities and mitigating losses beyond what traditional due diligence can achieve. It suggests that understanding who invests is as crucial as understanding what they invest in.

Predicting venture capital success isn’t simply about tallying investment amounts or counting connections; it demands a careful analysis of how capital flows through the ecosystem. Traditional methods treat networks as mere collections of nodes, overlooking the influence of investment pathways: the specific sequences of funding that signal promising ventures. A more nuanced analysis may show, for example, that companies backed by investors with a history of successful follow-on funding, or connected to firms specializing in particular sectors, have higher probabilities of future success. Capturing such signals calls for techniques like graph embeddings and path-based feature engineering, which go beyond superficial data aggregation to surface the patterns that distinguish thriving startups from those destined to falter.
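To make “path-based feature engineering” concrete, here is a minimal sketch that walks two-hop company → investor → company paths in a toy investment graph and computes, for a target company, the share of co-portfolio companies that went on to raise a follow-on round. The graph, the company and fund names, and the feature itself are hypothetical illustrations, not features taken from the MIRAGE-VC paper.

```python
from collections import defaultdict

# Hypothetical toy data: (investor, company) investment edges.
INVESTMENTS = [
    ("Fund A", "Alpha"), ("Fund A", "Beta"), ("Fund A", "Gamma"),
    ("Fund B", "Alpha"), ("Fund B", "Delta"),
    ("Fund C", "Epsilon"),
]
# Hypothetical outcomes: did each company go on to raise a follow-on round?
RAISED_FOLLOW_ON = {"Beta": True, "Gamma": False, "Delta": True, "Epsilon": False}


def follow_on_rate(target: str) -> float:
    """Share of target -> investor -> peer paths whose peer raised a follow-on round.

    Returns 0.0 when the target has no such paths (e.g. no co-invested peers).
    """
    investors_of = defaultdict(set)  # company -> set of investors
    portfolio_of = defaultdict(set)  # investor -> set of portfolio companies
    for investor, company in INVESTMENTS:
        investors_of[company].add(investor)
        portfolio_of[investor].add(company)

    outcomes = []
    for investor in investors_of[target]:    # hop 1: company -> investor
        for peer in portfolio_of[investor]:  # hop 2: investor -> company
            if peer != target and peer in RAISED_FOLLOW_ON:
                outcomes.append(RAISED_FOLLOW_ON[peer])
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


print(round(follow_on_rate("Alpha"), 3))  # 0.667
```

Each path is counted separately, so a peer reached through two shared investors contributes twice; deduplicating peers would be an equally reasonable design choice.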

MIRAGE-VC employs a framework integrating graph and text retrieval, multi-agent analysis, and dynamic weighted fusion to adaptively combine information for comprehensive analysis.

MIRAGE-VC: A System for Dissecting the Investment Landscape

MIRAGE-VC is a retrieval-augmented generation (RAG) framework developed to assess the likelihood of Series A funding success for venture-backed companies. The system operates by retrieving relevant information from a network representing the venture capital investment landscape, and then utilizing this data to inform a generative model. This approach allows MIRAGE-VC to move beyond simple feature-based predictions, incorporating contextual information about a company’s peers, investor profiles, and investment history to provide a more nuanced evaluation of funding potential. The framework’s core innovation lies in its ability to synthesize both structured, graph-based data regarding investor relationships and unstructured textual data from sources like news articles and company descriptions.

MIRAGE-VC utilizes a dual-input data strategy, combining graph-structured data representing the venture capital investment network with unstructured textual data from company descriptions and investor profiles. The graph data encodes relationships between companies, investors, and funding rounds, enabling the identification of influential network connections and investment patterns. The textual data, in turn, supplies details about company business models, market positioning, and investor specializations. Together, the two inputs capture contextual information relevant to Series A funding success that neither source provides on its own, resulting in a more holistic assessment of investment opportunities.

MIRAGE-VC employs three distinct agent types to facilitate multi-perspective analysis of venture capital investment opportunities. The Peer-Company agent identifies comparable companies based on industry, stage, and other relevant characteristics to establish benchmarks and assess the target company’s relative positioning. The Investor Profile agent analyzes the investment history, preferences, and portfolio composition of potential investors to determine alignment with the target company. Finally, the Investment Chain agent traces the flow of capital through prior investment rounds and identifies key investors and signaling effects within the venture capital network. Each agent independently retrieves and analyzes evidence, contributing to a holistic assessment of Series A funding success probability.
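The division of labour among the three agents can be sketched as follows. The agent names mirror the roles described above, but everything else is an assumption: in MIRAGE-VC each agent reasons over retrieved evidence with an LLM, whereas this toy version replaces that reasoning with simple hypothetical heuristics and a hypothetical `Verdict` record, just to show that each agent independently produces a score plus a justification.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    agent: str
    score: float   # estimated probability of Series A success, in [0, 1]
    evidence: str  # short human-readable justification


def peer_company_agent(peer_outcomes: list[bool]) -> Verdict:
    # Benchmark against comparable companies: share of peers that reached Series A.
    n = len(peer_outcomes)
    score = sum(peer_outcomes) / n if n else 0.5  # 0.5 = uninformative fallback
    return Verdict("peer_company", score, f"{sum(peer_outcomes)} of {n} peers raised a Series A")


def investor_profile_agent(prior_hit_rate: float, sector_match: bool) -> Verdict:
    # Investor track record, nudged up or down by fit with the target's sector.
    score = min(1.0, max(0.0, prior_hit_rate + (0.1 if sector_match else -0.1)))
    return Verdict("investor_profile", score, f"hit rate {prior_hit_rate:.2f}, sector match: {sector_match}")


def investment_chain_agent(chain_outcomes: list[bool]) -> Verdict:
    # Signal from companies along the retrieved investment chain.
    n = len(chain_outcomes)
    score = sum(chain_outcomes) / n if n else 0.5
    return Verdict("investment_chain", score, f"{sum(chain_outcomes)} of {n} chain companies succeeded")


# Each agent works independently on its own (hypothetical) evidence.
verdicts = [
    peer_company_agent([True, False, True, True]),
    investor_profile_agent(prior_hit_rate=0.4, sector_match=True),
    investment_chain_agent([True, False]),
]
for v in verdicts:
    print(f"{v.agent}: {v.score:.2f} ({v.evidence})")
```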

Agent performance, measured by F1 score, improves with a larger number of similar companies and resume breadth for text retrieval, and with increased search depth and path count for graph retrieval.

Mapping the Capital Flow: Uncovering Hidden Investment Chains

The identification of high-value Investment Chains within the VC Investment Network is achieved through an iterative path retriever. This retriever starts at a target company and systematically expands its search to neighboring nodes. Expansion is not random; instead, the retriever prioritizes neighbors based on their potential to improve the accuracy of a large language model (LLM) predictor. This improvement is quantified as “information gain”: the retriever selects the paths that yield the most predictive power for the LLM. The process repeats iteratively, building Investment Chains composed of companies that collectively maximize the LLM’s predictive performance, rather than simply prioritizing chain length or network centrality.
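One plausible way to operationalize “information gain” is as the reduction in the predictor’s cross-entropy on the true outcome when a candidate node is appended to the current path. The sketch below assumes that definition; `toy_predict` is a hypothetical stand-in for the LLM predictor, and the exact quantity used in the paper may differ. Because the gain is measured against the true label, it can only be computed where outcomes are known, such as on training data.

```python
import math
from typing import Callable, Sequence

# A predictor maps a path of evidence nodes to P(success) for the target company.
# In MIRAGE-VC an LLM plays this role; here any callable will do (assumption).
Predictor = Callable[[Sequence[str]], float]


def log_likelihood(predict: Predictor, path: Sequence[str], label: int) -> float:
    """Log-probability that the predictor assigns to the true label (1 or 0)."""
    p = min(max(predict(path), 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
    return math.log(p if label == 1 else 1 - p)


def information_gain(predict: Predictor, path: Sequence[str], candidate: str, label: int) -> float:
    """Drop in cross-entropy on the true label when `candidate` is appended to `path`."""
    return log_likelihood(predict, [*path, candidate], label) - log_likelihood(predict, path, label)


def toy_predict(path: Sequence[str]) -> float:
    # Hypothetical predictor: every node named "strong..." adds 0.2 to P(success).
    return 0.5 + 0.2 * sum(node.startswith("strong") for node in path)


print(round(information_gain(toy_predict, ["target"], "strong_investor", label=1), 4))  # 0.3365
print(information_gain(toy_predict, ["target"], "weak_news", label=1))  # 0.0
```

A positive gain means the candidate made the predictor more confident in the correct outcome; a negative gain means it pushed the prediction the wrong way.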

The path retriever operates by iteratively expanding from an initial target company within the VC Investment Network. At each step, potential neighboring companies are evaluated based on their contribution to the accuracy of the underlying Large Language Model (LLM) predictor. Specifically, the retriever selects the neighbor that yields the greatest measurable improvement in the LLM’s predictive performance, as determined by a defined accuracy metric. This process prioritizes information gain; nodes are not simply added based on network proximity, but rather on their capacity to enhance the LLM’s ability to make correct predictions regarding the target company or related investment outcomes. This selection strategy continues until a pre-defined path length is reached or further expansion fails to demonstrably improve prediction accuracy.
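The expansion loop and its two stopping rules (a hop budget and a minimum improvement) can be sketched as a greedy search. This is an illustrative assumption rather than the paper’s implementation: the graph, node names, and `toy_score` are hypothetical, and `toy_score` stands in for whatever accuracy signal the LLM predictor provides.

```python
from typing import Callable, Dict, List, Sequence

Graph = Dict[str, List[str]]               # adjacency list: node -> neighbours
Scorer = Callable[[Sequence[str]], float]  # higher = more useful path (assumption)


def greedy_path(graph: Graph, start: str, score: Scorer,
                max_hops: int = 3, min_gain: float = 1e-6) -> List[str]:
    """Greedily extend a chain from `start`, one hop at a time.

    At each hop every unvisited neighbour of the chain's last node is scored and
    the best one is kept. Expansion stops when the hop budget is spent, there are
    no unvisited neighbours, or the best gain falls below `min_gain`.
    """
    path = [start]
    current = score(path)
    for _ in range(max_hops):
        candidates = [n for n in graph.get(path[-1], []) if n not in path]
        if not candidates:
            break
        best = max(candidates, key=lambda n: score(path + [n]))
        best_score = score(path + [best])
        if best_score - current < min_gain:
            break  # no neighbour improves the prediction enough
        path.append(best)
        current = best_score
    return path


# Hypothetical toy network and per-node usefulness (a real system would query the predictor).
TOY_GRAPH = {
    "Target": ["Fund A", "Fund B"],
    "Fund A": ["Alpha", "Beta"],
    "Fund B": ["Gamma"],
    "Alpha": ["Fund C"],
}
USEFULNESS = {"Fund A": 0.3, "Fund B": 0.1, "Alpha": 0.2, "Beta": 0.05, "Gamma": 0.4, "Fund C": 0.0}


def toy_score(path: Sequence[str]) -> float:
    return sum(USEFULNESS.get(node, 0.0) for node in path)


print(greedy_path(TOY_GRAPH, "Target", toy_score))  # ['Target', 'Fund A', 'Alpha']
```

Being greedy, the search commits to the locally best hop: here it never explores Fund B → Gamma. Keeping several partial paths (a beam) would be one way to retrieve more than one candidate chain.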

The prediction model benefits from data provided by both Graph Retrieval and Text Retrieval modules operating in parallel. Graph Retrieval identifies relevant entities and relationships within the Venture Capital (VC) Investment Network, supplying structural information about connections between companies, investors, and funding rounds. Simultaneously, Text Retrieval accesses and processes textual data – including news articles, company descriptions, and investment memos – to provide semantic context and details not captured in the graph structure. These two modules operate independently, delivering complementary information sets that, when combined, provide a more comprehensive understanding of the investment landscape than either could achieve alone.
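Because the two modules are independent, they can be queried concurrently and their outputs collected side by side. The retriever functions and their toy indexes below are hypothetical placeholders; a real graph retriever would return investment paths, and a real text retriever would return passages from news, descriptions, or memos.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy indexes standing in for the real graph and text stores.
GRAPH_INDEX = {"Acme": ["Acme <- Fund A", "Fund A -> Beta (reached Series A)"]}
TEXT_INDEX = {"Acme": ["Acme builds logistics software for small retailers."]}


def graph_retrieval(company: str) -> list[str]:
    # Structural evidence: edges and paths around the company in the VC network.
    return GRAPH_INDEX.get(company, [])


def text_retrieval(company: str) -> list[str]:
    # Semantic evidence: passages from descriptions, news, or memos.
    return TEXT_INDEX.get(company, [])


def retrieve_evidence(company: str) -> dict[str, list[str]]:
    # The modules share no state, so both queries can run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        graph_future = pool.submit(graph_retrieval, company)
        text_future = pool.submit(text_retrieval, company)
        return {"graph": graph_future.result(), "text": text_future.result()}


print(retrieve_evidence("Acme"))
```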

The Learnable Gating Network functions as a weighted aggregation mechanism for the outputs of the Graph Retrieval and Text Retrieval modules. This network employs learned weights to dynamically prioritize information from each source, optimizing the overall prediction accuracy. Specifically, the gating network receives the verdicts – or prediction scores – from both retrieval pathways and calculates a weighted sum, where the weights are determined through training. This allows the model to adaptively emphasize the more relevant information stream – either structural insights from the graph or semantic details from the text – depending on the specific target company and the characteristics of the Investment Chain being evaluated. The learned weights are adjusted during training to minimize prediction error, effectively learning which information source is most reliable and informative in different contexts.
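A minimal sketch of a learnable gate, assuming a single sigmoid unit over hypothetical context features: the gate outputs a weight g for the graph verdict and 1 - g for the text verdict, and gradient descent on binary cross-entropy adjusts it so that the more reliable source dominates in each context. The paper’s gating network may use a different architecture, inputs, and number of sources; the feature vectors and verdicts here are invented toy data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class Gate:
    """Context-dependent gate: g = sigmoid(w . x + b) weights the graph verdict
    and 1 - g weights the text verdict (a sketch, not the paper's network)."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def weight(self, x: list[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fuse(self, x: list[float], s_graph: float, s_text: float) -> float:
        g = self.weight(x)
        return g * s_graph + (1.0 - g) * s_text

    def train_step(self, x, s_graph, s_text, y, lr=0.5):
        # One gradient-descent step on binary cross-entropy of the fused score.
        g = self.weight(x)
        p = min(max(self.fuse(x, s_graph, s_text), 1e-9), 1 - 1e-9)
        # Chain rule: dL/dp * dp/dg * dg/dz, where z = w . x + b.
        dz = (p - y) / (p * (1 - p)) * (s_graph - s_text) * g * (1 - g)
        self.w = [wi - lr * dz * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * dz


# Hypothetical data: (context features, graph verdict, text verdict, outcome).
# Context [1, 0]: the graph verdict is the reliable one; context [0, 1]: the text verdict is.
DATA = [
    ([1.0, 0.0], 0.9, 0.2, 1),
    ([1.0, 0.0], 0.1, 0.8, 0),
    ([0.0, 1.0], 0.3, 0.9, 1),
    ([0.0, 1.0], 0.7, 0.1, 0),
]

gate = Gate(n_features=2)
for _ in range(200):
    for x, s_graph, s_text, y in DATA:
        gate.train_step(x, s_graph, s_text, y)

print(f"graph weight, context A: {gate.weight([1.0, 0.0]):.2f}")
print(f"graph weight, context B: {gate.weight([0.0, 1.0]):.2f}")
```

After training, the gate leans toward the graph verdict in the first context and toward the text verdict in the second, which is the kind of adaptive emphasis described above.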

The distribution of maximum hop lengths used by the path retriever differs significantly between successful and unsuccessful predictions.

Validating the System: Performance and Real-World Impact

Evaluations demonstrate that MIRAGE-VC consistently surpasses existing venture capital prediction models, establishing new benchmarks in performance. Specifically, the framework achieves a +5.0% improvement in F1 score (the harmonic mean of precision and recall) and a +16.6% increase in Precision@5 (the fraction of the five top-ranked startups that actually succeed), indicating a markedly better ability to place the most promising startups at the top of a ranked list. These gains are more than incremental; they suggest that MIRAGE-VC offers a more reliable method for assessing investment potential and, ultimately, for optimizing venture capital outcomes.
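For readers less familiar with these metrics, the sketch below computes F1 from binary predictions and Precision@5 from a ranking by model score. The labels and scores are made up for illustration and have no connection to the paper’s dataset.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    # F1 = harmonic mean of precision and recall on the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_at_k(scores: list[float], y_true: list[int], k: int = 5) -> float:
    # Fraction of the k highest-scored startups that are actually successful.
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(y_true[i] for i in top_k) / k


# Hypothetical outcomes, hard predictions, and model scores for eight startups.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.7, 0.1, 0.6, 0.3]

print(f1_score(y_true, y_pred))        # 0.75
print(precision_at_k(scores, y_true))  # 0.8
```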

The capacity of MIRAGE-VC to pinpoint high-potential startups with greater accuracy is a meaningful advance for venture capital investment strategies. By discerning promising ventures more effectively, the framework can support better resource allocation and, potentially, improved returns on investment. A sharper selection process could also reduce the risk associated with early-stage funding, letting investors concentrate on opportunities with a higher probability of success and substantial growth. MIRAGE-VC may therefore streamline due diligence and, more broadly, help make venture capital decisions better informed, which in turn could support innovation and economic development.

Rigorous evaluation reveals that MIRAGE-VC significantly surpasses random prediction benchmarks across multiple key metrics. Specifically, the framework achieves a +16.6% relative improvement in Average Precision at k=5 (AP@5), indicating a substantial increase in the ranking of truly promising startups within generated lists. Gains extend to measures of initial ranking accuracy, with a +0.0879 improvement in Hit@1 – the probability the top-ranked startup is relevant – and a +0.1851 increase in Normalized Discounted Cumulative Gain at k=1 (NDCG@1), demonstrating enhanced ability to prioritize the most valuable opportunities early in the assessment process. These results collectively suggest that MIRAGE-VC doesn’t just identify relevant startups, but effectively elevates their position within the predicted rankings, offering a powerful advantage for venture capital firms.
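The ranking metrics above can be computed as in the following sketch for a single ranked list of candidate startups with binary relevance labels (1 = successful). The article does not specify how the ranked lists are built or how AP@k is normalized; this version divides by min(number of relevant items, k), which is one common convention, and the data are hypothetical.

```python
import math


def ranked_labels(scores: list[float], labels: list[int]) -> list[int]:
    # Relevance labels reordered by descending model score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def hit_at_k(scores, labels, k=1):
    # 1.0 if any of the top-k items is relevant, else 0.0.
    return 1.0 if any(ranked_labels(scores, labels)[:k]) else 0.0


def ndcg_at_k(scores, labels, k):
    # DCG of the model's top-k divided by DCG of the ideal top-k ordering.
    dcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ranked_labels(scores, labels)[:k]))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


def average_precision_at_k(scores, labels, k):
    # Sum of precision@i over ranks i <= k holding a relevant item,
    # normalised by min(#relevant, k) (one common convention).
    hits, total = 0, 0.0
    for pos, rel in enumerate(ranked_labels(scores, labels)[:k], start=1):
        if rel:
            hits += 1
            total += hits / pos
    n_rel = min(sum(labels), k)
    return total / n_rel if n_rel else 0.0


# Hypothetical ranked candidates: model scores and binary outcomes (1 = successful).
scores = [0.9, 0.2, 0.4, 0.8, 0.7, 0.1]
labels = [0, 0, 1, 1, 1, 0]

print(hit_at_k(scores, labels, 1))                          # 0.0
print(round(ndcg_at_k(scores, labels, 3), 4))
print(round(average_precision_at_k(scores, labels, 5), 4))  # 0.6389
```

In practice each metric would be averaged over many such lists.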

MIRAGE-VC addresses the complexities of venture capital prediction through a combination of graph-based reasoning and multi-agent analysis. Rather than treating startups in isolation, it represents the investment ecosystem as a graph whose nodes (companies, investors, and funding rounds) are linked by investment relationships. By retrieving informative paths through this network, the system identifies promising ventures with greater accuracy. The multi-agent component then examines each startup from several perspectives (peer companies, investor profiles, and investment chains), and dynamic weighted fusion combines their evidence into a single assessment. This combined approach not only improves predictive performance but also exposes, through the retrieved chains, some of the factors behind investment success, offering a robust solution to a notoriously challenging prediction problem.

The pursuit within MIRAGE-VC, to discern predictive investment chains, echoes a fundamental tenet of understanding any complex system: reverse engineering its logic. It’s akin to tracing the execution of code to uncover hidden functionality. As John McCarthy famously stated, “Every worthwhile thing is, at bottom, a problem in formal logic.” This framework doesn’t merely accept data; it actively seeks the underlying reasoning, the paths of information gain that connect initial investments to ultimate success. The adaptive weighting of evidence isn’t about finding the answer, but rather understanding how the system arrives at its conclusions – essentially, reading the source code of venture capital itself.

Beyond the Horizon

The framework presented isn’t about predicting the future, a demonstrably foolish exercise. It’s about systematically deconstructing the narratives that already exist around investment, exposing the underlying assumptions, and then stress-testing them. MIRAGE-VC isn’t a crystal ball; it’s a controlled demolition of conventional wisdom. The immediate challenge lies in quantifying the “relevance” of those investment chains – the current reliance on heuristic weighting feels… provisional. True understanding will require a more fundamental metric, perhaps rooted in information-theoretic limits of the data itself.

Further refinement demands a willingness to embrace failure. The system, as presented, prioritizes evidence corroboration. But genuine insight often emerges from anomaly detection: from identifying the chains that shouldn’t work, yet somehow do. Building a mechanism to actively seek out and analyze these contradictions, deliberately breaking the expected patterns, will be crucial. It’s not about building a more accurate predictor; it’s about creating a system that’s maximally sensitive to the unexpected.

Ultimately, the most interesting path forward isn’t better prediction, but better diagnostics. Can this approach be inverted to identify the source of investment bubbles, or to pinpoint the critical vulnerabilities within a portfolio? The true value lies not in knowing what will happen, but in understanding why things happen the way they do: a reverse-engineering of capital itself.


Original article: https://arxiv.org/pdf/2512.23489.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-12-31 20:43