East Meets West: Predicting China’s Markets with U.S. Data

Author: Denis Avetisyan


New research reveals that U.S. market signals offer stronger predictive power for Chinese stock returns than Chinese signals do for U.S. returns, and demonstrates a novel approach to cross-market forecasting.

Sector-level Sharpe Ratios demonstrate the potential for cross-market forecasting, specifically using U.S. previous close-to-close (pvCLCL) returns to predict Chinese open-to-close (OPCL) returns, a strategy that reveals opportunities beyond domestic market analysis.

A bipartite graph framework combined with machine learning improves the accuracy of predicting Chinese stock returns using U.S. market information.

Despite increasing global market integration, identifying reliable cross-market predictive relationships remains a persistent challenge in financial forecasting. This study, ‘A Bipartite Graph Approach to U.S.-China Cross-Market Return Forecasting’, addresses this by constructing a directed bipartite graph to model time-ordered predictive linkages between U.S. and Chinese equities. The analysis reveals a pronounced asymmetry, with U.S. market information substantially more informative for forecasting Chinese intraday returns than vice versa. Can this structured machine learning framework, leveraging economically interpretable graph representations, unlock further insights into asymmetric cross-market dependencies and improve global portfolio strategies?


Decoding Market Whispers: The Illusion of Predictability

Predicting stock market behavior remains a formidable challenge, largely because financial time series are fundamentally non-stationary – their statistical properties, such as mean and variance, shift over time, rendering historical data a less reliable guide to future performance. This instability is compounded by the intricate web of dependencies within and between markets; stock prices aren’t simply determined by a company’s intrinsic value, but are influenced by a multitude of interacting factors – economic indicators, investor sentiment, geopolitical events, and the actions of other market participants. These complex, often non-linear relationships mean that even sophisticated models struggle to accurately capture the dynamic interplay driving price fluctuations, frequently leading to forecast errors and highlighting the limitations of relying solely on past data to anticipate future trends.

Conventional financial forecasting models frequently operate under the assumption of relative isolation, treating individual markets as largely independent entities; however, this approach overlooks the pervasive influence of global interconnectedness. These models, often calibrated using historical data from a single market, struggle to account for shocks originating elsewhere and the complex transmission mechanisms that propagate them across borders. Consequently, predictions generated by such systems can be significantly impaired, especially during periods of heightened global volatility or when faced with unforeseen events in distant economies. The inability to effectively incorporate cross-market relationships leads to an underestimation of risk and a systematic bias in forecasts, limiting their practical utility for investors and policymakers alike.

Financial markets are rarely isolated entities; instead, they function as a complex, globally interwoven system where events in one market inevitably ripple outwards, influencing others. Consequently, achieving robust predictive power requires a deliberate shift in focus towards understanding these cross-market spillovers – the transmission of shocks, trends, and information across international borders. Traditional models, often built on the assumption of localized dynamics, frequently underestimate the significance of these interdependencies, leading to inaccurate forecasts. Research increasingly demonstrates that incorporating data on cross-market correlations, volatility contagion, and information flow significantly enhances the ability to anticipate market movements, suggesting that a holistic, globally-integrated approach is essential for navigating the complexities of modern finance and improving predictive accuracy.

A directed bipartite graph reveals significant predictive relationships between source and target market stocks.

Mapping the Interconnected Web: A Bipartite Framework

A bipartite graph is utilized to represent the relationships between U.S. and Chinese stock markets, with one node set comprising U.S. stocks and the other representing Chinese stocks. Edges connect nodes between the two sets, weighted by a correlation coefficient calculated from historical return data. This structure allows for the modeling of cross-market dependencies; the weight of an edge indicates the strength and direction of the linear relationship between the returns of a specific U.S. stock and a specific Chinese stock. By representing these inter-market connections explicitly, the framework enables the identification of U.S. stocks with predictive power for Chinese market movements, forming the basis for a cross-market forecasting model.

Traditional directed graphs represent relationships with a defined direction, such as stock A influencing stock B. This framework builds upon this by explicitly modeling connections between the U.S. and Chinese stock markets as distinct node sets within a bipartite graph. This allows for the identification of predictive signals originating from U.S. stocks that correlate with subsequent returns in the Chinese market, and vice versa. By representing these inter-market relationships as edges connecting the two node sets, the model can quantify the strength and direction of influence, enabling a more nuanced understanding of cross-market dependencies than is possible with solely intra-market analysis. The bipartite structure facilitates the application of graph-based algorithms to isolate U.S. stocks with the highest predictive power for Chinese market movements.
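To make the graph construction concrete, here is a minimal sketch that builds directed U.S.-to-China edges weighted by the Pearson correlation of aligned return series. The tickers, the toy data, and the `threshold` cut-off are hypothetical illustrations, not values from the study, and the series are assumed to be pre-aligned so that each U.S. observation precedes the Chinese observation it is paired with.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # treat a constant series as uninformative
    return cov / sqrt(vx * vy)


def build_bipartite_edges(us_returns, cn_returns, threshold=0.8):
    """Return {(us_ticker, cn_ticker): weight} for sufficiently strong links.

    Edges only ever run from the U.S. node set to the Chinese node set,
    which is what makes the graph bipartite and directed.
    `threshold` is an assumed cut-off used purely for illustration.
    """
    edges = {}
    for us_ticker, x in us_returns.items():
        for cn_ticker, y in cn_returns.items():
            weight = pearson(x, y)
            if abs(weight) >= threshold:
                edges[(us_ticker, cn_ticker)] = weight
    return edges


# Hypothetical toy data: index t of a U.S. series is assumed to be
# observed before index t of a Chinese series.
us = {"US_A": [0.01, -0.02, 0.03, 0.00, -0.01],
      "US_B": [0.00, 0.01, -0.01, 0.02, 0.00]}
cn = {"CN_X": [0.02, -0.03, 0.05, 0.01, -0.02],
      "CN_Y": [0.01, 0.02, -0.01, -0.02, 0.00]}

edges = build_bipartite_edges(us, cn)
for (src, dst), w in edges.items():
    print(f"{src} -> {dst}: {w:.3f}")
```

Storing the edges as a dictionary keyed by (source, target) pairs is equivalent to a sparse biadjacency matrix; a dense matrix would be the more natural choice once the stock universe is fixed.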

A rolling window screening technique is utilized to identify U.S. stocks exhibiting the strongest predictive correlation with Chinese market returns over a defined historical period. This involves iteratively calculating correlation coefficients between the daily returns of U.S. stocks and a Chinese market index, using a window of n days. The window slides forward one day at a time, recalculating the correlations and selecting the k U.S. stocks with the highest correlation values for each window position. This dynamic selection process allows the model to adapt to changing market conditions and focus on the U.S. stocks that are currently most indicative of future Chinese market movements, improving forecast accuracy compared to static stock selections. The parameters n and k are determined through optimization on a held-out validation set.
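The screening step described above can be sketched as follows. This is an illustrative outline rather than the study's implementation: `n` and `k` follow the article's notation, while the tickers and return values are made up, and ranking uses the raw correlation as the paragraph states (ranking by absolute value would be a plausible alternative).

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rolling_top_k(us_returns, cn_index, n, k):
    """Slide an n-day window one day at a time and keep the k best U.S. stocks.

    Returns a list of (last_day_in_window, [tickers]) pairs. To avoid
    look-ahead, a selection made on day t should only be used from day t+1.
    """
    selections = []
    for end in range(n, len(cn_index) + 1):
        window = slice(end - n, end)
        scores = {ticker: pearson(series[window], cn_index[window])
                  for ticker, series in us_returns.items()}
        ranked = sorted(scores, key=lambda ticker: scores[ticker], reverse=True)
        selections.append((end - 1, ranked[:k]))
    return selections


# Hypothetical data: "A" tracks the index, "B" mirrors it, "C" is noise.
cn_index = [0.01, -0.01, 0.02, -0.02, 0.01, 0.03]
us = {"A": [0.01, -0.01, 0.02, -0.02, 0.01, 0.03],
      "B": [-0.01, 0.01, -0.02, 0.02, -0.01, -0.03],
      "C": [0.00, 0.01, 0.00, 0.01, 0.00, 0.00]}

selections = rolling_top_k(us, cn_index, n=4, k=1)
for day, tickers in selections:
    print(day, tickers)
```

Recomputing every correlation from scratch at each step is the simplest formulation; for a large universe, incremental updates of the window sums would be cheaper.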

A heatmap reveals statistically significant predictive relationships between Chinese and U.S. stocks, grouped by sector, where colour intensity indicates the strength and direction of influence as measured by the t-statistic from a rolling-window regression.

Unveiling Predictive Power: Machine Learning in a Connected World

A suite of machine learning models was implemented to forecast open-to-close returns (OPCLReturns). These included linear models such as Ordinary Least Squares (OLS), LASSO, and Ridge Regression, alongside more complex algorithms like Support Vector Machines (SVM), Gradient Boosting Machines (XGBoost and LGBM), and ensemble methods including Random Forest (RF) and AdaBoost. The implementation of this diverse set of models allowed for a comparative analysis of predictive performance and identification of the most effective techniques for forecasting short-term price movements. Each model was trained and validated using historical data to optimize its parameters and assess its generalization ability.

Analysis reveals that previous close-to-close returns (pvCLCLReturns) consistently function as significant predictors of future open-to-close returns. This finding underscores the inherent value of historical price data in forecasting market movements. Specifically, the models implemented demonstrate a strong correlation between past close-to-close performance and subsequent returns, indicating a degree of persistence in market behavior. The predictive power derived from pvCLCLReturns is not isolated; it contributes to Sharpe Ratios exceeding 1.0 across multiple quantiles when integrated within the graph-based framework, confirming the quantifiable benefit of incorporating historical data into the predictive models.
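The two return definitions can be illustrated with a short sketch. The alignment used here is an assumption made for illustration: the U.S. close-to-close return of calendar day t-1 is paired with the Chinese open-to-close return of day t, since the U.S. close precedes the next Chinese open. The function names and prices are hypothetical, and the study's exact alignment may differ.

```python
def opcl_returns(opens, closes):
    """Open-to-close return for each day: close_t / open_t - 1."""
    return [c / o - 1.0 for o, c in zip(opens, closes)]


def clcl_returns(closes):
    """Close-to-close return: close_t / close_{t-1} - 1, defined from day 1."""
    return [closes[t] / closes[t - 1] - 1.0 for t in range(1, len(closes))]


def make_pairs(us_closes, cn_opens, cn_closes):
    """Pair each Chinese OPCL return with the previous day's U.S. CLCL return.

    All three lists are assumed to share the same calendar-day index.
    """
    us_clcl = clcl_returns(us_closes)   # us_clcl[i] is the return of day i + 1
    cn_opcl = opcl_returns(cn_opens, cn_closes)
    # The feature for Chinese day t is the U.S. return of day t - 1,
    # which sits at position t - 2 in us_clcl.
    return [(us_clcl[t - 2], cn_opcl[t]) for t in range(2, len(cn_closes))]


# Hypothetical prices for four calendar days.
us_closes = [100.0, 110.0, 99.0, 99.0]
cn_opens = [10.0, 10.0, 20.0, 20.0]
cn_closes = [10.0, 11.0, 22.0, 19.0]

pairs = make_pairs(us_closes, cn_opens, cn_closes)
for feature, target in pairs:
    print(f"pvCLCL={feature:+.3f}  OPCL={target:+.3f}")
```

Each (feature, target) pair is one training example for the forecasting models; in practice holidays and non-overlapping trading days would also need to be handled before pairing.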

The implementation of machine learning models, leveraging U.S. returns within a graph-based framework, consistently generates Sharpe Ratios exceeding 1.0 across the majority of performance quantiles. This metric indicates a substantial enhancement in risk-adjusted returns compared to benchmarks lacking this integrated data. Specifically, Sharpe Ratios above 1.0 suggest that the models are delivering returns that compensate investors adequately for the level of risk taken, and consistent achievement across multiple quantiles demonstrates the robustness and reliability of the predictive improvement gained through the incorporation of U.S. return data.
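One way to read "performance quantiles" is as nested portfolios ranked by the absolute value of the predicted return, as the accompanying PnL figure describes. The sketch below computes the one-day PnL of such portfolios under that reading. The `level` and `n_levels` parameters are hypothetical stand-ins and are not claimed to match the article's qr labels, and equal weighting is an assumption.

```python
def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def quantile_pnl(preds, realized, level, n_levels):
    """Equal-weighted one-day PnL of the stocks with the largest |prediction|.

    `level` runs from 1 (strongest signals only) to n_levels (all stocks),
    so each portfolio contains the one before it, which makes them nested.
    Positions are long for positive predictions and short for negative ones.
    """
    order = sorted(range(len(preds)), key=lambda i: abs(preds[i]), reverse=True)
    size = max(1, len(preds) * level // n_levels)
    chosen = order[:size]
    return sum(sign(preds[i]) * realized[i] for i in chosen) / size


# Hypothetical predictions and realized open-to-close returns for one day.
preds = [0.03, -0.02, 0.01, -0.005, 0.001]
realized = [0.02, -0.01, -0.01, 0.002, 0.0]

for level in range(1, 6):
    print(level, round(quantile_pnl(preds, realized, level, 5), 4))
```

Summing these daily values over time gives a cumulative PnL curve per level, and the daily series is what a Sharpe Ratio would be computed from.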

Model performance is significantly influenced by the configuration of the bipartite graph used to represent inter-market relationships. The structure of this graph, defining connections between assets across different markets, dictates how information flows and is utilized by the machine learning algorithms. Specifically, the density and pattern of connections within the bipartite graph affect the models’ ability to capture complex dependencies and propagate predictive signals. Variations in graph structure, reflecting different assumptions about market interconnectedness, directly translate to measurable differences in Sharpe Ratios and overall predictive accuracy, indicating that the representation of inter-market relationships is a critical component of model efficacy.

Using U.S. pvCLCL returns to forecast Chinese OPCL returns, machine learning models generate cumulative daily profit and loss (PnL) curves representing nested quantile portfolios ranked by the absolute value of predicted returns.

Deciphering the Signals: Sectoral Insights and the Value of Connection

The efficacy of the predictive model is determined through rigorous evaluation using the Sharpe Ratio, a widely accepted metric in finance that quantifies risk-adjusted returns. This ratio assesses the excess return earned per unit of risk taken, allowing for a standardized comparison of investment performance, irrespective of varying levels of volatility. A higher Sharpe Ratio indicates superior performance, as it suggests a greater return for the level of risk assumed. By employing the Sharpe Ratio, the model’s ability to generate profits relative to its inherent risks can be objectively measured, providing crucial insights into its practical applicability and potential for successful implementation within investment strategies. The ratio is defined as $\text{Sharpe Ratio} = \frac{R_p - R_f}{\sigma_p}$, where $R_p$ is the portfolio return, $R_f$ is the risk-free rate, and $\sigma_p$ is the standard deviation of the portfolio’s returns.
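As a worked illustration of the Sharpe Ratio, the sketch below computes an annualized value from a series of daily returns. The annualization factor of 252 trading days and the zero default risk-free rate are assumptions made for the example. The article does not state which conventions the study uses, and the return values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(daily_returns, daily_rf=0.0, periods_per_year=252):
    """Annualized Sharpe Ratio: mean excess return over its sample std dev.

    With a constant risk-free rate, the standard deviation of excess
    returns equals the standard deviation of the portfolio returns.
    """
    excess = [r - daily_rf for r in daily_returns]
    volatility = stdev(excess)  # sample standard deviation (n - 1)
    if volatility == 0:
        raise ValueError("Sharpe Ratio is undefined for zero volatility")
    return mean(excess) / volatility * sqrt(periods_per_year)


# Hypothetical daily portfolio returns.
returns = [0.01, -0.005, 0.007, 0.002, -0.001]

annualized = sharpe_ratio(returns)
per_period = sharpe_ratio(returns, periods_per_year=1)
print(f"annualized: {annualized:.2f}, per period: {per_period:.3f}")
```

Because the ratio divides by volatility, scaling every return by the same positive constant leaves it unchanged when the risk-free rate is zero, which is why it allows comparison across strategies with different leverage.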

Rigorous evaluation of the model demonstrates a noteworthy correlation between market volatility and predictive accuracy, consistently achieving Sharpe Ratios above 1.0 within the highest performance quantiles – specifically the top 40% to 60% of results (qr4-qr6). This indicates that as U.S. market activity intensifies, and fluctuations increase, the model’s capacity to generate risk-adjusted returns also strengthens. The consistent outperformance during periods of heightened volatility suggests the model effectively captures and leverages the increased informational content present in dynamic market conditions, offering a potentially valuable tool for investors seeking to capitalize on, or mitigate the impact of, market swings. This heightened predictive power under stress underscores the model’s robustness and its ability to deliver consistent results beyond calmer market phases.

Analysis of interconnected markets, visualized through a bipartite graph, illuminates crucial transmission channels where information propagates and influences asset pricing. This approach moves beyond isolated market assessments, revealing how signals originating in one sector can cascade into others, creating predictive relationships. The structure of the bipartite graph, which connects assets across different markets, allows researchers to trace these informational flows, identifying which sectors act as key conduits and which exhibit heightened sensitivity to external influences. By mapping these connections, it becomes possible to understand not just what predicts market movement, but how information travels to create those predictive patterns, offering a more nuanced and potentially profitable understanding of market dynamics.

Analysis reveals that predictive signals originating in one market exhibit particularly strong effects within the technology and consumer defensive sectors. These segments consistently demonstrate Sharpe Ratios exceeding 1.0, a benchmark indicating superior risk-adjusted returns, when leveraging cross-market information. This suggests that patterns observed in one market can be reliably used to forecast performance in these specific sectors, potentially due to shared underlying economic drivers or investor behaviors. The strength of these predictive links highlights opportunities for strategic portfolio construction and refined forecasting models focused on these key areas of the market, enabling investors to capitalize on inter-market dynamics.

The study’s results suggest that leveraging relationships between markets, that is, cross-market analysis, holds significant promise for refining predictive models and bolstering investment approaches. By examining the interconnectedness of diverse sectors, researchers identified transmission channels that amplify forecasting accuracy, particularly during periods of increased market volatility. This inter-market approach consistently yielded Sharpe Ratios exceeding 1.0 in key sectors like technology and consumer defensive goods, indicating a substantial improvement in risk-adjusted returns. Consequently, these findings support the development of more robust investment strategies capable of capitalizing on subtle, yet crucial, information flows between markets and potentially generating superior performance.

The heatmap displays the sector-level predictive strength between Chinese and U.S. sectors, revealing the median absolute t-statistic derived from the time-averaged biadjacency matrix of their cross-market relationships.

The study’s exploration of directed bipartite graphs to enhance cross-market return forecasting aligns with a fundamental principle of understanding any complex system: deconstruction. It isn’t enough to simply observe the connections between U.S. and Chinese markets; one must actively dissect them to reveal the inherent asymmetries, as demonstrated by the finding that U.S. market information proves more predictive of Chinese returns. As Donald Davies once stated, “If you can’t break it, you don’t understand it.” This isn’t advocating for malicious disruption, but rather a rigorous, analytical approach, a deliberate ‘breaking down’ of the system into its constituent parts, to truly grasp the directional flow of information and improve predictive modeling. The research exemplifies this by effectively ‘breaking down’ the market relationship into a directed graph, revealing the underlying structure and enabling more accurate forecasting.

Beyond Prediction: Unraveling the Signal

The demonstrated asymmetry – the U.S. market’s predictive power over China exceeding the reverse – isn’t merely a statistical observation. It raises a deeper question: is this an inherent characteristic of the systems themselves, or a temporary condition sculpted by specific market phases? The model successfully maps relationships, but understanding why those relationships exist remains a challenge. Future work should move beyond correlation to explore the underlying causal mechanisms, perhaps by integrating macroeconomic indicators or sentiment analysis into the bipartite graph framework. The current approach treats the graph as a static representation; however, markets are dynamic systems. Allowing the graph structure itself to evolve over time, adapting to shifting dependencies, could unlock further predictive gains.

The reliance on return forecasting, while practical, also represents a limitation. It assumes markets should be predictable, a premise worth questioning. Perhaps the true value of this bipartite graph approach lies not in predicting the next tick, but in identifying systemic vulnerabilities – the points of leverage where information flow dictates stability, or instigates cascades. This necessitates expanding the model beyond financial instruments to encompass broader economic and geopolitical data, effectively turning the graph into a stress-test for interconnected systems.

Ultimately, this work opens a path toward reverse-engineering market behavior. The graph isn’t the territory; it’s a map constructed from observed data. The next step isn’t refining the map, but questioning the landscape itself – dismantling assumptions about market efficiency and exploring the inherent chaos that underpins it all.


Original article: https://arxiv.org/pdf/2603.10559.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-03-12 06:30