Beyond the Stats: How Player Networks Impact NBA Salaries

Author: Denis Avetisyan

New research shows that accounting for a player’s relationships and career progression, rather than just on-court performance, can significantly improve salary predictions.

The heterogeneous NBA knowledge graph interlinks player season data with entities like teams, agents, awards, and injuries, and employs a strict admissibility function <span class="katex-eq" data-katex-display="false">A(e,s)</span> to mask temporal edges-such as prior wins or injury history-thereby preventing look-ahead bias during analysis. — The heterogeneous NBA knowledge graph interlinks player season data with entities like teams, agents, awards, and injuries, and employs a strict admissibility function $A(e,s)$ to mask temporal edges-such as prior wins or injury history-thereby preventing look-ahead bias during analysis.

Graph neural networks demonstrate improved valuation of established NBA players by capturing relational capital, though traditional methods suffice for rookies.

Predicting the market value of professional athletes remains a complex challenge, particularly given the dynamic and often unpredictable nature of performance. This is the core issue addressed in ‘The Value of Graph-based Encoding in NBA Salary Prediction’, which investigates the utility of graph neural networks for improved player valuation. The research demonstrates that encoding player relationships and accumulated “relational capital” as a knowledge graph enhances salary prediction models, especially for established veterans where traditional tabular data proves insufficient. But can these graph-based approaches ultimately unlock a more nuanced and accurate understanding of player worth in a league driven by both individual skill and team dynamics?

Beyond Readily Quantifiable Metrics: Understanding Relational Value

Conventional approaches to assessing player value disproportionately emphasize readily quantifiable metrics – explicit metadata like points, rebounds, and assists, along with tabular data detailing game statistics. However, this reliance frequently obscures the intricate web of interactions that define true on-court impact. Players don’t operate in isolation; their effectiveness is deeply interwoven with the quality of their connections – who they pass to, who defends with them, and how these relationships influence team performance. By prioritizing individual statistics over these crucial network effects, current valuation models often fail to recognize the subtle yet significant contributions of players who excel at facilitating play, creating opportunities for others, or providing critical defensive support, ultimately hindering a complete understanding of their overall worth.

Player valuation often fixates on readily quantifiable metrics, yet a player’s impact extends far beyond points scored or tackles made. True worth resides in relational capital – the complex web of connections forged with teammates, opponents, and even referees. This isn’t simply about who a player knows, but the quality of those relationships: the trust, coordination, and synergistic effects they generate on the field. Strong relational capital facilitates seamless teamwork, anticipates opponent strategies, and can even influence crucial in-game decisions. Consequently, a player with robust connections may elevate the performance of an entire team, exceeding what statistical analysis alone would predict, highlighting the limitations of solely relying on explicit data for comprehensive assessment.

Evaluating players entering a league – often termed the ‘cold-start problem’ – presents a unique challenge for predictive models, as traditional methods relying on past performance inherently struggle with limited data. Recent analysis demonstrates that incorporating relational capital – the value derived from a player’s network of connections – significantly improves valuation in these instances, achieving an R-squared of 0.53 when applied to rookie subsets. This indicates that understanding who a player interacts with on the field, and the quality of those interactions, can account for over half of the variance in their projected value, even before substantial statistical data is available. The ability to accurately assess a player’s network effects therefore offers a powerful tool for identifying potential talent and making informed decisions regarding acquisitions and development, circumventing the limitations of purely performance-based evaluations.

Conventional player evaluation systems often fail to distinguish between a seasoned veteran and a promising rookie exhibiting comparable statistics, a limitation stemming from an inability to quantify structural maturity within the team’s network. Research indicates that a player’s value isn’t merely a function of individual performance, but also how effectively they leverage their experience within established relational structures. Graph-based modeling reveals a significant correlation – as measured by a Cliff’s Delta of 0.38 – between a veteran player’s age and their capacity to positively influence critical in-game ‘rescue’ scenarios, suggesting older players possess a nuanced understanding of team dynamics and can anticipate or mitigate negative outcomes with greater proficiency than their less experienced counterparts. This highlights the importance of incorporating network-based metrics to accurately assess a player’s true contribution, particularly when comparing individuals across different career stages.

Modeling Player Networks: A Graph-Based Solution

A heterogeneous graph is proposed as a means of representing the player network, encompassing players, teams, and agents as distinct node types connected by various edge types. This approach moves beyond traditional statistical analysis by explicitly modeling relationships such as player-team affiliations, agent-player representation, and team-team competitive history. By representing these entities and their connections as a graph, the model captures a richer set of contextual information than is possible through isolated player or team statistics. The heterogeneity of the graph-different node and edge types-allows for nuanced analysis of how relationships influence player value and team performance, facilitating a more comprehensive understanding of the complex interactions within the network.

The defined network topology, representing players, teams, and agents as nodes and their relationships as edges, facilitates the application of Graph Neural Networks (GNNs) for value inference. GNNs operate directly on the graph structure, allowing them to aggregate information from a node’s neighbors to learn node embeddings that capture both individual attributes and relational context. Unlike traditional machine learning methods requiring feature engineering, GNNs automatically learn relevant features from the network. This approach enables the model to identify players whose value is not solely determined by their individual statistics, but also by their connections to other valuable players or influential agents, effectively quantifying the impact of network effects on player assessment. The learned embeddings can then be utilized as inputs to downstream tasks such as player valuation, performance prediction, and team building optimization.

Static graph embedding methods, including Node2Vec and RotatE, generate vector representations of nodes within a network based solely on the graph’s structure. Node2Vec utilizes biased random walks to learn node embeddings that preserve neighborhood similarities, effectively capturing homophily within the network. RotatE, conversely, models relations as rotations in a complex vector space, allowing for the representation of asymmetric relationships and knowledge graph completion. Both methods output fixed-length vectors for each player, team, and agent, encoding inherent network properties such as centrality, connectivity, and role within the player network, and providing a foundational input for downstream machine learning tasks. These embeddings are generated through offline processes, meaning they do not adapt to changes in the network over time.

Traditional static graph embedding methods, while effective at capturing inherent network properties, are limited in their ability to represent the evolving relationships between players, teams, and agents over time. Player interactions, such as trades, free agent signings, and on-field performance, constantly reshape the network structure. Consequently, embeddings generated from a static snapshot fail to reflect these dynamic changes, potentially leading to inaccurate valuations or predictions. To address this limitation, methodologies incorporating temporal information, such as dynamic graph embeddings or recurrent graph neural networks, are required to model the time-varying aspects of the player network and capture the full complexity of player relationships.

Evaluating outlier performance reveals that static embeddings offer a beneficial balance between rescue and misguidance, while dynamic architectures demonstrate a generalization penalty due to their reliance on past network states.

Dynamic Embeddings and Robust Valuation: Evidence in Action

Dynamic graph embedding techniques, specifically GraphSAGE and R-GCN, are utilized to represent player relationships as evolving data points. These methods move beyond static network analysis by incorporating temporal information reflecting real-world events such as player trades and injuries. GraphSAGE employs a neighborhood sampling approach, enabling generalization to unseen nodes and efficient computation on large graphs, while R-GCN (Relational Graph Convolutional Network) explicitly models the different types of relationships between players, improving representation accuracy. By continuously updating these embeddings, the models capture shifts in player value influenced by changes in team dynamics and individual player status, thereby providing a more current and relevant basis for valuation.

Player valuation models were constructed utilizing machine learning algorithms, specifically XGBoost and Random Forest, and were trained on the dynamic graph embeddings. These embeddings served as feature inputs representing each player’s network position and evolving relationships. The resulting models predict a player’s economic worth, quantifying their contribution to team success based on both individual statistics and the contextual information captured within the graph structure. Model performance was assessed through out-of-sample prediction accuracy and comparison to existing market valuations, with a focus on identifying instances where the models accurately capture previously unquantified value.

The Tri-State Evaluation protocol moves beyond simple accuracy metrics by categorizing model corrections into three distinct states: rescue, indicating the model successfully corrected an initial underestimation; misguidance, where the model initially overstated a player’s value before correcting downwards; and neutral, representing instances where the model’s initial prediction was close enough to the final value that no significant correction was needed. This granular categorization allows for a more detailed analysis of model behavior, identifying specific areas where the model excels or requires improvement beyond overall error reduction. By differentiating between successful “saves” and problematic overestimations, the protocol provides a nuanced understanding of model reliability and potential biases.

The models demonstrate a strong Structural Proxy, reconstructing missing institutional context through analysis of network connectivity. Specifically, player valuation error was reduced by over $9 million for Fred VanVleet when utilizing the RotatE embedding method, as compared to a baseline model lacking network-based contextualization. This indicates the model effectively infers relevant, but unobserved, factors influencing player worth-such as team dynamics or coaching impact-by leveraging relationships within the player network. This capability extends beyond individual cases, suggesting the network embeddings provide valuable information not captured by traditional statistical features alone.

Mitigating Bias and Ensuring Fair Valuation: A Broader Impact

The persistent overvaluation of veteran players, termed the ‘Legacy Hangover’, presents a significant challenge to accurate player valuation in professional sports. This tendency, rooted in past performance and reputation, can skew assessments of current ability and future potential. To counteract this, a sophisticated approach to model calibration and feature engineering was implemented. This involved meticulously adjusting the weighting of historical data and prioritizing metrics that reflect present-day performance, such as advanced statistics and quantifiable on-court impact. By diminishing the influence of past accolades and focusing on current contributions, the model aims to provide a more objective and data-driven assessment of player worth, ultimately fostering fairer evaluations and more efficient resource allocation within teams.

A central challenge in evaluating athletes lies in disentangling past achievements from present capabilities; simply relying on historical status can inflate valuations, while focusing solely on current performance risks undervaluing established players. This work addresses this complexity by explicitly modeling the interaction between an athlete’s legacy and their contemporary contributions. The methodology doesn’t treat these as independent factors, but rather assesses how past reputation influences perceptions of current skill, and vice versa. This allows for a more nuanced and objective valuation, moving beyond a simple summation of accolades and statistics to reveal a player’s true worth based on a holistic understanding of their trajectory and present form. The result is a system designed to minimize the impact of cognitive biases and provide a fairer, data-driven assessment of player value.

A refined valuation model moves beyond simplistic metrics to capture the complex contributions of athletes, providing benefits across the professional sports landscape. This approach doesn’t merely assign a dollar figure; it dissects performance, accounting for both established reputation and current on-court impact, ultimately offering a more accurate reflection of a player’s true worth. Teams can leverage this detailed analysis for strategic roster construction and trade negotiations, while player agents gain a stronger foundation for advocating on behalf of their clients. Crucially, athletes themselves benefit from a system that more fairly recognizes their contributions, fostering a more equitable and sustainable environment within the league and potentially maximizing their earning potential through data-driven contract discussions.

The valuation method demonstrably fosters a more equitable landscape within professional sports, promising fairer player transactions and strategically informed contract discussions that benefit all stakeholders. However, initial model iterations, specifically Version 1, revealed instances of ‘structural inertia’, where rapidly improving players like Desmond Bane experienced a valuation error of $22.2 million. This discrepancy underscores the challenge of accurately capturing emergent talent and the need for continuous refinement of algorithms to avoid underestimating players experiencing unexpected performance leaps, ultimately contributing to a more sustainable and realistic ecosystem for assessing athlete worth.

The research meticulously presented underscores a preference for parsimony in model complexity, aligning with the spirit of elegant solutions. It reveals that while graph neural networks excel at capturing the nuanced relational capital of veteran players – the accumulated value of their connections – they offer little advantage over simpler tabular methods for evaluating rookies. This resonates with Andrey Kolmogorov’s observation: “The most important things are the simplest things.” The study demonstrates that unnecessary complexity diminishes returns; in player valuation, as in mathematics, focusing on fundamental relationships-captured effectively by tabular data for those newly entering the league-yields the most meaningful results. The pursuit of perfection, in this context, isn’t about adding layers of intricacy, but about distilling the essential information.

What’s Next?

The exercise, predictably, reveals diminishing returns. They called it ‘relational capital,’ but it often feels like simply accounting for the inertia of established reputations. This work demonstrates a capacity to quantify the value of a player’s network, certainly. But the real story resides in the fact that those networks matter less for those just arriving on the scene. Tabular data, that most humble of sources, proves sufficient for the truly unknown. Perhaps this is a lesson for all valuation exercises: the past explains surprisingly little of the future, and the effort spent retrofitting complexity onto simple truths is often a distraction.

The next iteration will likely involve attempts to bridge this gap. To force the graph networks to ‘learn’ the genesis of those relationships, rather than simply observing their mature form. This will demand cleverer feature engineering, more sophisticated architectures, and, inevitably, more computational expense. A simpler approach – focusing the graph networks on the early careers of players, and then transitioning to tabular methods as reputations solidify – might prove more fruitful. Or, conceivably, it might reveal that some problems are best addressed with Occam’s razor, not a hyperparameter search.

Ultimately, the value lies not in predicting salary with ever-increasing precision, but in understanding the limits of prediction itself. The market, after all, is not a rational actor. It is a collection of biases, heuristics, and outright irrationality. A perfect model will always be elusive, and the pursuit of perfection may, ironically, obscure the more interesting questions.

Original article: https://arxiv.org/pdf/2603.05671.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

Beyond Readily Quantifiable Metrics: Understanding Relational Value

Modeling Player Networks: A Graph-Based Solution

Dynamic Embeddings and Robust Valuation: Evidence in Action

Mitigating Bias and Ensuring Fair Valuation: A Broader Impact

What’s Next?

See also: