Beyond Message Passing: Can Simple Features Match Graph Neural Networks?

Author: Denis Avetisyan


A new approach transforms graph data into tabular features, achieving surprisingly competitive results on node classification tasks.

The comparative performance evaluation demonstrates that a feedforward network (FAF+MLP) achieves comparable accuracy to a graph convolutional network (GCN) across train, validation, and test datasets, suggesting an alternative architectural approach without sacrificing predictive capability.

Fixed Aggregation Features offer a compelling alternative to complex graph neural networks by leveraging simplified data representations.

Despite the prevailing belief in the power of trainable message passing, graph neural networks (GNNs) may not always be essential for effective node representation learning, a notion challenged by the work ‘Fixed Aggregation Features Can Rival GNNs’. This paper introduces Fixed Aggregation Features (FAFs), a training-free approach that transforms graph data into a tabular format, enabling the application of well-established tabular methods and achieving competitive performance on node classification tasks, rivaling or outperforming state-of-the-art GNNs on 12 of 14 benchmarks. By demonstrating that simple mean aggregation can often suffice, these findings raise the question of whether richer benchmarks and a renewed focus on tabular models are needed to fully unlock the potential of graph data.


The Scaling Challenge in Graph Neural Networks

Graph Neural Networks (GNNs) have emerged as a potent tool for analyzing data represented as graphs, excelling in tasks like node classification and link prediction where relationships between entities are crucial. However, their effectiveness diminishes when confronted with increasingly complex and large-scale datasets. This scaling challenge arises from the computational demands of processing interconnected nodes and edges, as the number of parameters and operations grows rapidly with dataset size. Consequently, training and inference become prohibitively expensive, hindering the deployment of GNNs on real-world graphs found in social networks, knowledge graphs, and molecular biology. Addressing this bottleneck requires innovative approaches to model design and optimization, enabling GNNs to efficiently handle the complexities of modern data while preserving their predictive power.

Graph Neural Networks, despite their promise, frequently encounter performance bottlenecks stemming from two primary issues: over-smoothing and over-squashing. Over-smoothing occurs as information propagates through multiple layers: node features become increasingly averaged, diminishing their ability to uniquely identify each node and hindering effective classification. Simultaneously, over-squashing restricts the flow of information, particularly from nodes located several hops away, so that critical connections for understanding global graph structure are effectively lost as messages are compressed during aggregation. This creates a dilemma: increasing network depth to capture distant relationships exacerbates over-smoothing, while shallow networks lack the capacity to model complex, long-range dependencies. Consequently, a key challenge in GNN research centers on mitigating these effects to enable the robust and scalable analysis of increasingly complex graph-structured data.

Conventional Graph Neural Networks (GNNs), while adept at processing relational data, frequently encounter performance limitations due to a fundamental trade-off between their ability to learn complex patterns – expressive power – and the computational resources required to do so. As GNNs deepen, the need for increasingly complex computations to capture nuanced relationships escalates rapidly, often exceeding practical limits. This creates a bottleneck where adding more layers – a common approach to enhancing expressive power – yields diminishing returns and ultimately leads to performance plateaus. The issue stems from the inherent difficulty in efficiently propagating information across the graph while simultaneously avoiding the problems of over-smoothing and over-squashing, restricting the network’s capacity to discern meaningful distinctions between nodes and effectively leverage information from distant parts of the graph structure.

Addressing the constraints of current Graph Neural Networks requires a significant shift in how information is processed and relayed across graph structures. Researchers are actively investigating novel aggregation functions that move beyond simple mean or sum operations, seeking methods to preserve distinct node features and prevent the signal loss characteristic of over-smoothing. Simultaneously, architectural innovations – including techniques like skip connections, hierarchical pooling, and attention mechanisms – aim to alleviate the over-squashing problem by enabling more efficient long-range information propagation. These efforts aren’t merely about incremental improvements; they represent a fundamental re-evaluation of how GNNs can be designed to effectively capture complex relationships within increasingly large and intricate datasets, ultimately striving for models that are both expressive and scalable.

Streamlined Aggregation: Fixed Aggregation Features

Fixed Aggregation Features convert graph-structured data into a tabular representation suitable for traditional machine learning models. This transformation is achieved by applying predefined aggregation functions – such as mean, sum, or maximum – to the features of each node’s neighborhood. The output of these functions becomes the new feature vector for each node, effectively summarizing the local graph structure. This tabular data can then be directly input into a classifier, commonly a Multi-Layer Perceptron (MLP), without requiring specialized graph neural network layers to handle the graph structure directly. The process allows for leveraging existing, well-established machine learning techniques on graph data by pre-processing it into a format they can readily process.
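
A minimal end-to-end sketch of that idea, using a synthetic graph and an off-the-shelf MLP (an assumed setup for illustration, not the paper’s pipeline):

```python
# Sketch: turn a graph into a tabular dataset with a fixed mean aggregation,
# then train a plain MLP on the resulting table. Graph, features, and labels
# below are random placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_nodes, n_feats = 200, 16
X = rng.normal(size=(n_nodes, n_feats))           # raw node features
A = (rng.random((n_nodes, n_nodes)) < 0.05)       # random adjacency (placeholder)
A = np.logical_or(A, A.T).astype(float)           # make it symmetric
np.fill_diagonal(A, 0.0)
y = rng.integers(0, 3, size=n_nodes)              # placeholder labels

# Fixed (training-free) aggregation: mean of neighbour features per node.
deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
X_neigh_mean = (A @ X) / deg

# Tabular representation: each node's own features plus the aggregated ones.
X_tab = np.hstack([X, X_neigh_mean])

# Any off-the-shelf tabular model can now be used; an MLP is one choice.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_tab, y)
print("train accuracy:", clf.score(X_tab, y))
```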

Traditional Graph Neural Networks (GNNs) require learning optimal weights for aggregating information from neighboring nodes during the message-passing phase, which introduces additional parameters and computational cost. Fixed Aggregation Features circumvent this requirement by employing pre-defined, static aggregation functions – such as mean, sum, or max – eliminating the need for learnable aggregation weights. This simplification reduces the overall parameter count, potentially decreasing training time and memory usage. Furthermore, decoupling aggregation from the learning process can improve model stability and generalization performance, particularly in scenarios with limited training data or noisy graph structures, as the model is less susceptible to overfitting the aggregation process itself.

Fixed aggregation features utilize a variety of statistical functions to condense neighborhood information into a fixed-size vector representation. The mean provides an average value of neighboring features, while the sum calculates the total contribution. Maximum and minimum values identify the most prominent and least prominent features within the neighborhood, respectively. Standard deviation quantifies the dispersion or variability of feature values, providing insight into the heterogeneity of the neighborhood. The selection of an appropriate aggregation function depends on the specific graph structure and the characteristics of the node features, influencing how effectively neighborhood information is summarized for downstream classification tasks.
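
One way such a summary might look in code (an assumed sketch, not taken from the paper): for a single node, the five statistics over its neighbours’ feature matrix are stacked into one fixed-size vector.

```python
# neigh_feats has shape (num_neighbours, num_features) for one node.
import numpy as np

def fixed_aggregation(neigh_feats: np.ndarray) -> np.ndarray:
    if neigh_feats.shape[0] == 0:        # isolated node: return an all-zero summary
        return np.zeros(5 * neigh_feats.shape[1])
    stats = [
        neigh_feats.mean(axis=0),        # average neighbour value per feature
        neigh_feats.sum(axis=0),         # total contribution per feature
        neigh_feats.max(axis=0),         # most prominent value per feature
        neigh_feats.min(axis=0),         # least prominent value per feature
        neigh_feats.std(axis=0),         # spread / heterogeneity of the neighbourhood
    ]
    return np.concatenate(stats)         # fixed-size vector: 5 * num_features
```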

The fixed aggregation feature approach enables a modular investigation of aggregation strategies by separating the initial feature extraction process from the subsequent neighborhood aggregation. This decoupling allows researchers and practitioners to systematically evaluate the impact of different aggregation functions – such as mean, sum, max, min, and standard deviation – on model performance without altering the feature extraction component. By isolating the aggregation step, experimentation can focus specifically on how various methods of summarizing neighborhood information affect classification accuracy and computational efficiency, facilitating a more targeted optimization process and a deeper understanding of feature representation within graph neural networks.
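
The decoupling can be made concrete with a small registry of interchangeable aggregators; the graph representation (`X`, `neighbours`) and the downstream classifier are placeholders here, not the paper’s code.

```python
# Sketch of the modular ablation: swap the aggregation function while keeping
# feature extraction and the classifier fixed.
import numpy as np

AGGREGATORS = {
    "mean": lambda m: m.mean(axis=0),
    "sum":  lambda m: m.sum(axis=0),
    "max":  lambda m: m.max(axis=0),
    "min":  lambda m: m.min(axis=0),
    "std":  lambda m: m.std(axis=0),
}

def build_faf_table(X, neighbours, agg):
    """Apply one fixed aggregator to every node's neighbourhood and
    concatenate the result with the node's own features."""
    rows = []
    for i in range(X.shape[0]):
        nbrs = neighbours[i]                                   # neighbour indices of node i
        agg_vec = agg(X[nbrs]) if len(nbrs) else np.zeros(X.shape[1])
        rows.append(np.concatenate([X[i], agg_vec]))
    return np.vstack(rows)

# Usage idea: hold the classifier fixed, sweep over AGGREGATORS.items(),
# build one table per aggregator, and compare validation accuracy.
```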

The FAF+MLP model demonstrates comparable accuracy to the GCN model across training, validation, and test datasets.

Preserving Information: Lossless Aggregation and the Kolmogorov-Arnold Theorem

Effective fixed aggregation relies on the construction of functions designed to be ‘lossless’ with respect to a node’s neighborhood. This means the aggregation function must fully capture and represent all relevant information contained within the input features of neighboring nodes without discarding any data. Specifically, a lossless function ensures that a complete reconstruction of the neighborhood’s state is theoretically possible from the aggregated output. The preservation of this complete information is crucial for maintaining the expressiveness of the aggregation process and avoiding a loss of discriminatory power in subsequent analyses or model training. Failure to achieve lossless aggregation can introduce irreversible information loss, limiting the potential performance of fixed aggregation features.

The Kolmogorov-Arnold Representation, which states that any continuous function of several real variables can be written as a finite superposition of continuous functions of a single variable combined through addition, provides a rigorous framework for constructing lossless aggregation functions. This representation guarantees that no information is lost during aggregation because it demonstrates the capacity to perfectly reconstruct the original input data from the aggregated output. Specifically, by decomposing complex aggregation operations into a series of one-dimensional functions, the representation enables the systematic design of functions that preserve all relevant information from a node’s neighborhood, ensuring complete data retention throughout the fixed aggregation process. The theorem’s utility lies in its ability to mathematically prove the existence of lossless aggregations and to guide the creation of practical, computationally feasible implementations.
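
For reference, the standard textbook form of the theorem (written in its usual notation, not necessarily the paper’s) is:

```latex
% Kolmogorov–Arnold representation theorem (standard statement):
% every continuous f on [0,1]^n is an exact superposition of
% continuous univariate functions \Phi_q and \phi_{q,p} plus addition.
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

Read in the aggregation setting, the inner univariate maps act as per-feature transformations and the outer sums collect them into a fixed-size summary, which is how the theorem can certify that a suitably chosen fixed aggregation loses no information in principle.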

The Kolmogorov-Arnold Representation allows researchers to systematically construct aggregation functions by providing a framework to balance expressiveness and computational cost. Because the theorem expresses a continuous multivariate function as a finite sum of compositions of univariate functions, the complexity of the function is directly linked to the number and form of the terms required for its representation. By controlling the number of terms, and therefore the computational burden, researchers can design aggregation functions capable of representing complex relationships within the input data while remaining computationally tractable. This approach enables the creation of aggregation functions optimized for specific datasets and computational constraints, maximizing the information retained during the aggregation process without exceeding practical limitations.

The performance of fixed aggregation features is directly linked to the expressiveness of the aggregation function, which is, in turn, fundamentally determined by whether the aggregation is lossless. Lossless aggregation, guaranteed by the Kolmogorov-Arnold Representation, ensures that all relevant information from a node’s receptive field is preserved during the aggregation process. Increased expressiveness, achieved through lossless aggregation, allows the feature to represent a wider range of input patterns without information loss, leading to improved discrimination and ultimately, better overall performance in tasks utilizing these features. Conversely, lossy aggregation limits expressiveness and can hinder the feature’s ability to accurately represent the input data, negatively impacting performance metrics.

Expanding the GNN Landscape: The Impact of Fixed Aggregation

Fixed Aggregation Features represent a compelling shift in graph neural network design, offering a potentially more efficient alternative to established architectures like Graph Convolutional Networks (GCN), Graph Attention Networks (GATv2), and GraphSAGE. Unlike these methods which rely on learned aggregation weights and complex propagation schemes, Fixed Aggregation Features utilize a streamlined, parameter-free aggregation process. This simplicity translates directly into reduced computational overhead, making them particularly attractive when dealing with large graphs or constrained computing environments. The approach retains representational power while significantly decreasing the number of trainable parameters, allowing for faster training and inference times – a crucial benefit in resource-limited scenarios and enabling broader deployment possibilities for graph-based machine learning.

The design of Fixed Aggregation Features prioritizes architectural flexibility, enabling seamless integration into a wide variety of machine learning pipelines. Unlike many graph neural networks with tightly coupled components, FAFs generate node embeddings independently of the final prediction layer, functioning as a versatile building block. This modularity allows researchers and practitioners to readily combine FAFs with diverse downstream tasks – from standard node classification and link prediction to more complex applications like graph clustering and anomaly detection. Furthermore, the method’s simplicity facilitates its incorporation into existing architectures; FAF-derived node embeddings can be easily fed into any tabular learning model, including multi-layer perceptrons, gradient boosting machines, or even more sophisticated neural networks, offering a plug-and-play solution for leveraging graph structure within broader machine learning systems.

The streamlined aggregation within Fixed Aggregation Features not only boosts computational efficiency but also facilitates a deeper understanding of how graph information is processed. Unlike the complex parameter sharing and attention mechanisms of conventional Graph Neural Networks, FAFs offer a more transparent pathway for tracing feature propagation. This simplicity allows researchers to dissect the learned node representations with greater ease, identifying which neighboring features most strongly influence a given node’s embedding. Consequently, techniques like feature importance analysis and visualization become more effective, potentially revealing critical patterns within the graph structure and the underlying data. This enhanced interpretability is a significant advantage, particularly in applications where understanding the reasoning behind a model’s predictions is as crucial as the predictions themselves, such as in drug discovery or social network analysis.

Recent evaluations demonstrate that Fixed Aggregation Features (FAFs) offer a compelling alternative to established Graph Neural Networks (GNNs) in node classification tasks. Across a comprehensive suite of fourteen benchmark datasets, FAFs achieved performance levels that either matched or surpassed those of traditional GNN architectures – including Graph Convolutional Networks, Graph Attention Networks, and GraphSAGE – on a significant majority of the tests, specifically twelve out of fourteen. This consistent ability to compete with, and often exceed, the performance of more complex GNN models underscores the effectiveness of the simplified aggregation approach employed by FAFs, suggesting that sophisticated message-passing schemes are not always necessary to achieve strong results in graph-based learning.

A key advantage of Fixed Aggregation Features lies in their computational efficiency, demonstrated by their ability to achieve comparable performance to state-of-the-art Graph Neural Networks with significantly fewer message passing hops. While many powerful GNN architectures require 10 to 15 layers to effectively propagate information across a graph, FAFs consistently deliver competitive results with just 2 to 4 hops. This reduction in computational depth translates directly to faster training and inference times, and reduced memory requirements – crucial benefits when dealing with large-scale graph datasets or resource-constrained environments. The streamlined propagation process allows for quicker learning without sacrificing representational power, offering a compelling alternative for applications prioritizing speed and scalability.
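
As a rough illustration (a sketch under assumed conventions, not the paper’s implementation), the entire multi-hop aggregation can be precomputed once with a handful of sparse matrix products and no trainable parameters:

```python
# Precompute [X, A_hat X, A_hat^2 X, ...] as a tabular feature block,
# where A_hat is the row-normalised adjacency (mean aggregation per hop).
import numpy as np
import scipy.sparse as sp

def multi_hop_features(A: sp.csr_matrix, X: np.ndarray, hops: int = 3) -> np.ndarray:
    deg = np.asarray(A.sum(axis=1)).ravel().clip(min=1.0)
    A_hat = sp.diags(1.0 / deg) @ A            # row-normalised adjacency
    blocks, H = [X], X
    for _ in range(hops):
        H = A_hat @ H                          # one more hop of mean aggregation
        blocks.append(H)
    return np.hstack(blocks)                   # computed once, no learned weights

if __name__ == "__main__":
    A = sp.random(100, 100, density=0.05, format="csr", random_state=0)
    A = (A + A.T).sign().tocsr()               # symmetric 0/1 adjacency (toy graph)
    X = np.random.default_rng(0).normal(size=(100, 8))
    print(multi_hop_features(A, X, hops=3).shape)   # (100, 32)
```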

Performance evaluations reveal that Fixed Aggregation Features (FAFs) come close to, but do not match, more complex Graph Neural Networks on the Roman-Empire and Minesweeper datasets, the two benchmarks where the simplified approach falls short. This suggests that these specific tasks require the capacity to leverage information from nodes further afield, or benefit from specialized connection types not inherently present in the simplified FAF architecture. While FAFs excel in scenarios where local neighborhood information is sufficient, the Roman-Empire and Minesweeper datasets imply a greater need for either extended receptive fields, achievable with deeper architectures or increased hop counts, or the inclusion of residual connections that facilitate the propagation of signals across longer distances within the graph structure.

Investigations into the classification capabilities of Fixed Aggregation Features revealed a significant dependence on classifier complexity. Ablation studies consistently demonstrated that employing a multi-layer perceptron (MLP) as the classifier yielded substantially improved performance compared to a linear classifier when applied to features generated through fixed aggregation. This finding underscores the importance of non-linear capacity in effectively extracting and utilizing information embedded within the tabular representations produced by the model; linear classifiers, while computationally efficient, appear limited in their ability to capture the complex relationships present in the aggregated node features, highlighting the need for more expressive models to fully leverage the potential of this approach.
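
A minimal sketch of that kind of ablation on synthetic placeholder data (not the paper’s benchmarks or exact configuration):

```python
# Compare a linear classifier against an MLP on the same fixed-aggregation
# feature table. The data below is random and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_tab = rng.normal(size=(400, 32))                    # placeholder FAF table
y = (X_tab[:, :4].sum(axis=1) > 0).astype(int)        # placeholder labels with some signal

X_tr, X_te, y_tr, y_te = train_test_split(X_tab, y, test_size=0.25, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)

print("linear test accuracy:", linear.score(X_te, y_te))
print("MLP    test accuracy:", mlp.score(X_te, y_te))
```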

The FAF+MLP model demonstrates comparable accuracy to the GCN model across training, validation, and testing datasets.

The pursuit of effective graph representation, as demonstrated by Fixed Aggregation Features, echoes a fundamental principle of system design: elegance stems from simplicity. This work champions transforming complex graph structures into tabular data, bypassing the intricacies of learned message passing, a deliberate streamlining of the process. It recalls Marvin Minsky’s observation: “The more general a rule is, the less information it conveys.” FAFs, by focusing on a fixed aggregation, prioritize conveying the information essential for node classification, demonstrating that a less complex model can often achieve comparable, and sometimes superior, results. The efficiency gained by avoiding learned parameters highlights how structure dictates behavior, and a well-considered structure, even a simplified one, can unlock significant performance gains.

Where Do We Go From Here?

The demonstrated equivalence between fixed aggregation and the expressive power of graph neural networks invites a re-evaluation of what constitutes ‘learning’ in these systems. The pursuit of increasingly complex message-passing schemes may, in some instances, be an exercise in over-engineering. This work suggests that the structure of the graph itself often contains sufficient information for effective node classification, diminishing the need for elaborate, learned transformations. The question becomes not simply ‘how do nodes communicate?’, but ‘what is the minimal, sufficient representation of the graph’s inherent organization?’.

A natural progression involves exploring the limitations of fixed aggregation. The current approach relies on a defined feature space; extending this to automatically discover relevant fixed features, potentially through techniques inspired by Kolmogorov-Arnold representation, could unlock greater adaptability. Furthermore, understanding why certain fixed features prove effective – what structural properties they capture – will be crucial. This isn’t merely about achieving performance; it’s about gaining insight into the underlying principles governing information flow within graphs.

Ultimately, this line of inquiry points towards a broader shift. The field has become enamored with the ‘neural’ aspect of graph networks; perhaps a more fruitful path lies in a renewed focus on graph simplification and feature engineering. Elegance, after all, rarely resides in complexity, but in finding the essential core.


Original article: https://arxiv.org/pdf/2601.19449.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
