Author: Denis Avetisyan
Researchers have released a new dataset and derived empirical scaling laws for accelerating aerodynamic simulations with graph neural networks, enabling efficient design even with limited data.

This work presents a multi-fidelity dataset for double-delta wing aerodynamics and investigates data scaling relationships for graph neural network-based surrogate models.
Accelerating vehicle design necessitates efficient aerodynamic prediction, yet open, multi-fidelity datasets coupled with clear data scaling guidelines remain scarce. This work, ‘A Multi-fidelity Double-Delta Wing Dataset and Empirical Scaling Laws for GNN-based Aerodynamic Field Surrogate’, addresses this gap by releasing a new dataset of double-delta wing flows and investigating the relationship between training data size and the performance of graph neural network-based surrogate models. Results demonstrate efficient data utilization with a power-law scaling exponent of -0.6122, suggesting an optimal sampling density of approximately eight samples per dimension. Could these findings unlock more cost-effective aerodynamic optimization strategies by balancing dataset generation with model complexity?
The Computational Bottleneck in Aerospace Design
The SCALOS program, dedicated to advancing aerospace designs, has historically depended on computational fluid dynamics (CFD) simulations for precise aerodynamic prediction. These simulations, while capable of detailing airflow and forces acting on a craft, demand substantial computational resources and time. Accurately modeling complex phenomena – such as turbulent flow and shockwave interactions – requires extremely fine meshes and iterative solving procedures. Consequently, each design iteration, and the thorough analysis it necessitates, can take days or even weeks to complete on high-performance computing clusters. This reliance on computationally expensive methods creates a significant bottleneck, limiting the number of designs that can be explored and hindering the overall pace of innovation within the program.
Computational Fluid Dynamics (CFD) simulations, employing Reynolds-Averaged Navier-Stokes (RANS) equations and turbulence models such as the Spalart-Allmaras (SA-R) model, represent a significant computational burden in aerodynamic design. While these methods offer a pragmatic approach to modeling turbulent flows, their inherent complexity demands substantial processing time, even with modern high-performance computing. This lengthy computation cycle restricts the number of design iterations that can be realistically explored within a given timeframe. Consequently, engineers face a limitation in thoroughly investigating the vast design space, potentially hindering the identification of truly optimal aerodynamic configurations and slowing the pace of innovation within programs like SCALOS. The need for methods that retain accuracy while drastically reducing computational cost is therefore critical to unlock a more expansive and efficient design process.
The pace of aerospace innovation is fundamentally constrained by the limitations of current predictive capabilities. While detailed computational fluid dynamics offers valuable insight, its intensive demands on processing power and time restrict the number of designs that can be thoroughly evaluated. This bottleneck hinders the exploration of potentially groundbreaking aerodynamic configurations, slowing the development of more efficient and higher-performing aircraft. Consequently, a pressing need exists for predictive methods that maintain a high degree of accuracy – essential for safety and performance – while dramatically reducing computational cost. Such advancements would unlock the capacity to rapidly iterate through designs, accelerating discovery and ultimately fostering a new era of aerospace engineering.
MF-VortexNet: A Graph-Based Predictive Surrogate
MF-VortexNet utilizes a graph neural network (GNN) architecture designed to predict high-fidelity flow field data from lower-fidelity simulations. This is achieved by representing the flow domain as a graph, where nodes correspond to discrete points and edges represent relationships between them. The GNN learns a mapping function that transforms the low-fidelity data associated with each node into a high-fidelity prediction. Crucially, the network is “physics-informed,” meaning its architecture and training process incorporate known physical constraints, such as conservation of mass and momentum, to ensure the predicted fields are physically plausible and generalize effectively to unseen conditions. This approach allows for substantial reductions in computational cost compared to traditional high-fidelity computational fluid dynamics (CFD) simulations while maintaining a high degree of accuracy.
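As a minimal sketch of this idea, the snippet below assumes a PyTorch Geometric-style interface; the class name `FieldSurrogate`, the feature layout, and the layer sizes are illustrative assumptions rather than the actual MF-VortexNet architecture.

```python
# Minimal sketch of a GNN field surrogate (assumption: PyTorch Geometric-style
# interface; layer sizes and feature layout are illustrative, not the paper's model).
import torch
from torch import nn
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data


class FieldSurrogate(nn.Module):
    """Maps low-fidelity node features to high-fidelity field predictions."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, data: Data) -> torch.Tensor:
        # data.x: per-node inputs (e.g. coordinates + low-fidelity field values)
        # data.edge_index: mesh connectivity between neighbouring nodes
        h = torch.relu(self.conv1(data.x, data.edge_index))
        h = torch.relu(self.conv2(h, data.edge_index))
        return self.head(h)  # per-node high-fidelity field prediction


# Toy graph: 4 nodes, 3 input features each, 1 predicted field per node.
x = torch.randn(4, 3)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])  # directed mesh edges
model = FieldSurrogate(in_dim=3, hidden_dim=32, out_dim=1)
pred = model(Data(x=x, edge_index=edge_index))
print(pred.shape)  # torch.Size([4, 1])
```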
The MF-VortexNet architecture utilizes graph neural networks (GNNs) due to their computational efficiency in processing irregular data structures common in fluid dynamics simulations. Traditional methods often require high-resolution meshes, leading to substantial computational costs. GNNs, however, operate directly on graph-structured data, reducing the need for extensive mesh refinement. Furthermore, the model incorporates physical constraints – specifically, the principles governing fluid flow – directly into the network’s learning process. This physics-informed approach improves both the accuracy of predictions and the model’s ability to generalize to unseen flow conditions, mitigating the risk of unphysical or unstable outputs that can arise from purely data-driven methods.
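One common way to realise such a physics-informed objective, shown here purely as an illustrative assumption since this summary does not spell out the actual loss terms, is to add a penalty on a discretised conservation residual to the data-fit loss:

```python
import torch


def physics_informed_loss(pred, target, divergence_residual, weight=0.1):
    """Data-fit MSE plus a penalty on a discretised conservation residual.

    divergence_residual: per-node estimate of div(velocity), computed from the
    predicted field on the mesh (assumed available from the graph geometry).
    The weighting of the physics term is an illustrative choice.
    """
    data_loss = torch.mean((pred - target) ** 2)
    physics_loss = torch.mean(divergence_residual ** 2)  # ~0 for mass-conserving fields
    return data_loss + weight * physics_loss
```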
MF-VortexNet achieves significant computational cost reduction by utilizing existing Computational Fluid Dynamics (CFD) data as a training dataset. This allows the model to learn the complex relationships between low-fidelity inputs and high-fidelity outputs, effectively acting as a surrogate model. By training on pre-computed CFD simulations, MF-VortexNet bypasses the need for repeated, expensive high-fidelity simulations during inference. Evaluations demonstrate that the model maintains predictive power comparable to full CFD simulations while reducing computational time by several orders of magnitude, enabling rapid exploration of design spaces and real-time predictions.
Dataset Generation and Validation Protocol
A dataset of up to 1280 computational fluid dynamics (CFD) and Vortex Lattice Method (VLM) simulations was generated to represent a range of Double-Delta Wing configurations. This dataset consists of graphical representations of aerodynamic properties derived from simulations performed across a variety of geometric parameters. The combination of VLM and CFD methods allows for a balance between computational cost and solution accuracy, facilitating the creation of a large and diverse dataset suitable for machine learning applications. Each graph within the dataset represents a unique wing configuration, defined by specific geometric inputs, and the corresponding predicted aerodynamic characteristics.
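A plausible layout for a single sample in such a dataset, assuming each configuration pairs its geometric parameters with low-fidelity (VLM) and high-fidelity (CFD) nodal fields on a shared surface mesh; the field names below are illustrative, not the dataset's actual schema.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class WingSample:
    """One double-delta wing configuration in a multi-fidelity graph dataset (illustrative)."""

    geometry: dict            # e.g. {"inboard_sweep": 76.0, "outboard_sweep": 40.0, ...}
    node_coords: np.ndarray   # (num_nodes, 3) surface mesh coordinates
    edges: np.ndarray         # (2, num_edges) mesh connectivity
    vlm_fields: np.ndarray    # (num_nodes, k) low-fidelity (VLM) quantities
    cfd_fields: np.ndarray    # (num_nodes, m) high-fidelity (RANS) targets
```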
To efficiently explore the design space of Double-Delta Wing configurations, a Design of Experiments (DOE) methodology utilizing Saltelli Sampling was implemented. This variance-based sensitivity analysis technique systematically generates combinations of geometric parameters, ensuring a diverse and representative dataset with minimal redundant simulations. Saltelli Sampling, a form of quasi-Monte Carlo method, achieves improved space-filling properties compared to random sampling, thereby maximizing the information gained from each simulation run. The method defines a set of input parameters and systematically varies them across their defined ranges, creating a distribution of configurations optimized for identifying key geometric influences on aerodynamic performance.
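A short sketch of how such a Saltelli design could be generated with the SALib package; the parameter names, bounds, and dimensionality below are placeholders rather than the study's actual design variables.

```python
# Saltelli sampling of a geometric design space (SALib; names/bounds are placeholders).
from SALib.sample import saltelli

problem = {
    "num_vars": 4,
    "names": ["inboard_sweep", "outboard_sweep", "taper_ratio", "angle_of_attack"],
    "bounds": [[60.0, 80.0], [30.0, 50.0], [0.1, 0.5], [0.0, 20.0]],
}

# With calc_second_order=True (the default), Saltelli generates N * (2 * num_vars + 2)
# rows; for this illustrative 4-variable setup, N = 128 gives 1280 configurations.
param_values = saltelli.sample(problem, 128)
print(param_values.shape)  # (1280, 4)
```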
The generated dataset of aerodynamic characteristics, comprising up to 1280 simulations for varied Double-Delta Wing geometries, is integral to the development and assessment of MF-VortexNet. This dataset is partitioned into training and validation subsets; the training set is used to optimize the MF-VortexNet’s internal parameters, enabling it to learn the complex relationship between wing geometry and aerodynamic performance. Subsequently, the independent validation set is used to evaluate the trained model’s ability to generalize to unseen configurations, providing a quantitative measure of its predictive accuracy and overall reliability. This rigorous training and validation process is crucial for ensuring MF-VortexNet’s performance across the design space and its suitability for aerodynamic prediction tasks.
Mean Squared Error (MSE) was selected as the primary metric for evaluating the predictive accuracy of the model, calculated as the average of the squared differences between the predicted and actual aerodynamic coefficients: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ represents the actual value, $\hat{y}_i$ represents the predicted value, and $n$ is the total number of data points. Lower MSE values indicate a better model fit and higher predictive accuracy across the dataset, providing a quantifiable assessment of the model’s performance for various Double-Delta Wing configurations. This metric allows for direct comparison of different model iterations and validation against the CFD and VLM simulation data.
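In code, the metric is a single reduction over the validation set, for example with NumPy:

```python
import numpy as np


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error over all predicted quantities in the validation set."""
    return float(np.mean((y_true - y_pred) ** 2))
```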

Data Scaling Laws and Optimization Strategies
Investigations into MF-VortexNet’s performance have revealed a predictable relationship between dataset size and predictive accuracy, establishing a clear Data Scaling Law. This finding indicates that, as the quantity of training data increases, the model’s ability to generalize and make accurate predictions also improves – a principle fundamental to machine learning. The observed scaling isn’t linear; rather, it suggests that gains in accuracy diminish as the dataset grows, but continue to be significant even with substantial increases in data volume. This understanding is crucial for guiding future development, allowing researchers to strategically allocate computational resources and prioritize data collection efforts to maximize the model’s performance potential. The predictable nature of this scaling law allows for informed estimations of accuracy gains before substantial investments in data acquisition, optimizing the return on computational resources.
The performance gains observed with increasing dataset size in MF-VortexNet adhere to a predictable pattern, accurately described by a power law. Analysis reveals a scaling exponent, denoted β, of approximately 0.6122 for the Medium model within a specific training configuration: the validation error follows $\mathrm{MSE} \propto N^{-\beta}$, meaning the error shrinks in proportion to the dataset size raised to the power of -0.6122. The robustness of this fit is confirmed by a remarkably high R-squared value of 0.9985, indicating an extremely strong correlation between the fitted power law and the observed performance data, and establishing a quantifiable link between data quantity and model accuracy.
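The fit behind these numbers can be reproduced with an ordinary least-squares line in log-log space; the arrays below use synthetic placeholder values that follow the reported exponent, not the paper's measured errors.

```python
import numpy as np

# Placeholder data: synthetic errors following MSE ~ C * N^(-beta) with beta = 0.6122,
# standing in for the measured validation errors (not the paper's actual numbers).
n_samples = np.array([80.0, 160.0, 320.0, 640.0, 1280.0])
val_mse = 1.0e-2 * n_samples ** -0.6122

# Fit log(MSE) = log(C) - beta * log(N) by ordinary least squares.
slope, intercept = np.polyfit(np.log(n_samples), np.log(val_mse), 1)
beta = -slope

# R^2 of the fit in log space.
pred = slope * np.log(n_samples) + intercept
ss_res = np.sum((np.log(val_mse) - pred) ** 2)
ss_tot = np.sum((np.log(val_mse) - np.log(val_mse).mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"beta = {beta:.4f}, R^2 = {r2:.4f}")
```

Fitting in log space keeps the estimate well conditioned across the order-of-magnitude spread in dataset sizes.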
While increasing model size demonstrably improves performance in many machine learning contexts, analysis of MF-VortexNet reveals a compelling prioritization of data quantity. Though larger models initially exhibit gains, the observed data scaling law – a power law relationship between dataset size and accuracy – indicates that expanding the training dataset yields a more substantial and consistent improvement. This suggests that, beyond a certain point, investing in acquiring more data delivers a greater return on computational resources than simply increasing model complexity. The scaling exponent β of approximately 0.6122 quantifies this effect, highlighting the disproportionate impact of data on overall accuracy and offering a pathway for optimizing future development efforts by strategically prioritizing data collection.
Analysis of the data scaling law revealed a crucial insight for optimizing future data collection: an estimated optimal sampling distance of 0.34. This value, derived from the scaling exponent of approximately 0.6122, indicates the ideal granularity at which to gather new data points for the MF-VortexNet model. By focusing data acquisition efforts around this specific distance, researchers can achieve the most significant gains in model accuracy for each unit of computational resource invested. This targeted approach represents a shift from simply increasing dataset size to strategically curating data, promising a more efficient path toward enhanced performance and a maximized return on investment in future studies.

The pursuit of aerodynamic efficiency, as detailed in this study of graph neural networks for surrogate modeling, echoes a fundamental principle of mathematical rigor. The work demonstrates how sparse sampling, combined with multi-fidelity data, can yield surprisingly accurate results – a testament to the power of disciplined methodology. This aligns with Bertrand Russell’s observation that “The whole of mathematics is symbolic logic.” The authors, through careful data scaling and model validation, effectively construct a ‘logical’ representation of aerodynamic forces, mirroring the elegance Russell championed. The reliance on provable relationships, rather than purely empirical observation, is key to achieving robust and generalizable performance in complex aerodynamic design.
Beyond Approximation
The presented work, while demonstrating the efficacy of graph neural networks for aerodynamic field reconstruction, merely skirts the fundamental question of representation. The observed scaling behavior, while promising, is ultimately empirical. A rigorous derivation – a mathematical guarantee of convergence with decreasing sample size – remains elusive. To truly advance this field, the focus must shift from simply achieving accuracy on benchmarks to establishing provable bounds on the error introduced by the surrogate model itself. Every parameter saved, every computational cycle gained, is rendered meaningless if the underlying approximation lacks a demonstrable foundation.
Furthermore, the inherent limitations of multi-fidelity modeling are consistently underestimated. The assumption of smoothly decaying error across fidelity levels is a convenience, not a certainty. Future work should explore adaptive fidelity selection strategies, guided not by heuristic rules, but by a formal analysis of information content and error propagation. The elegance of a minimal implementation is not merely aesthetic; it directly correlates with the reduction of potential abstraction leaks and the increased likelihood of a correct, verifiable solution.
Finally, the exclusive focus on aerodynamic fields obscures a broader truth. This methodology, at its core, is a general framework for function approximation. The true test will lie in its application to problems where the cost of a single high-fidelity evaluation is astronomical, and where the consequences of approximation are correspondingly severe. Only then will the true mettle of this approach be revealed.
Original article: https://arxiv.org/pdf/2512.20941.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/