Author: Denis Avetisyan
Researchers are combining deterministic estimation with generative modeling to create a more accurate and reliable method for filling in missing data in time series.

This paper introduces Bridge-TS, a framework leveraging Schrödinger Bridges and informative priors within score-based generative models for improved time series imputation.
Despite advances in generative time series imputation, existing methods often struggle with accuracy due to uninformative priors that burden the generation process. This work, ‘Exploiting the Prior of Generative Time Series Imputation’, introduces Bridge-TS, a novel framework leveraging data-to-data generation via a Schrödinger Bridge to enhance imputation performance. By incorporating both expert-driven deterministic estimation and compositional priors, Bridge-TS establishes a new state-of-the-art in imputation accuracy across benchmark datasets. Could this approach to informative priors unlock further improvements in generative modeling for complex time series data?
Deconstructing Time: The Illusion of Complete Data
The prevalence of time series data – sequences of information collected over regular intervals – is undeniable, extending from financial markets and weather patterns to physiological signals and industrial sensor readings. However, the very nature of real-world data collection introduces inevitable gaps; missing values arise from sensor malfunctions, communication failures, or simply periods where data wasn’t recorded. These omissions pose a significant challenge to data analysis and forecasting because most algorithms require complete datasets to function optimally. Ignoring missing data can lead to biased estimates and inaccurate predictions, while naive imputation methods often distort the underlying temporal dependencies. Consequently, addressing missing values in time series isn’t merely a data cleaning step, but a critical component of ensuring reliable and meaningful insights from these increasingly important datasets.
While seemingly straightforward, the application of linear interpolation to time series data often introduces significant distortions. This technique assumes a constant rate of change between known data points, a premise frequently violated by the dynamic and often non-linear nature of real-world temporal patterns. Consequently, imputed values can systematically underestimate or overestimate true values, particularly when dealing with seasonal trends, cyclical fluctuations, or abrupt shifts in the underlying process. This bias propagates through subsequent analyses, impacting the reliability of forecasts, trend identification, and statistical inferences. The method fails to account for autocorrelation, the dependence of a value on its own past, and thus overlooks crucial information embedded within the series, leading to a simplified and potentially misleading representation of the data’s true behavior.
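To make this bias concrete, the short sketch below (using numpy and pandas on a synthetic weekly-cycle series invented for illustration, not data from the paper) masks a stretch containing a seasonal peak and shows how linear interpolation flattens it.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a weekly cycle (illustrative only).
t = np.arange(60)
true_values = 10 + 5 * np.sin(2 * np.pi * t / 7)
series = pd.Series(true_values, index=pd.date_range("2024-01-01", periods=60, freq="D"))

# Mask a full week that contains a seasonal peak.
masked = series.copy()
masked.iloc[20:27] = np.nan

# Linear interpolation draws a straight line across the gap,
# so the imputed values badly underestimate the peak inside it.
imputed = masked.interpolate(method="linear")
print((imputed - series).iloc[20:27].round(2))  # large negative bias near the peak
```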
As time series data proliferates across fields like finance, healthcare, and environmental monitoring, the sheer volume and intricate dependencies within these datasets necessitate imputation techniques beyond simple methods. Traditional approaches, while computationally efficient, often struggle to accurately reconstruct missing values in the face of non-linear trends, seasonality, or complex correlations. Consequently, researchers are developing advanced imputation strategies – including those leveraging machine learning models and deep learning architectures – designed to capture these underlying patterns and provide more reliable data for analysis and forecasting. These methods aim not merely to fill gaps, but to realistically recreate the expected values, preserving the integrity of the time series and minimizing the risk of biased conclusions. The ability to effectively handle missing data is becoming increasingly critical for extracting meaningful insights from the ever-growing flood of temporal information.

Beyond Patchwork: Generative Models and the Reconstruction of Reality
Traditional imputation methods often rely on statistical measures like mean or median, or simple interpolation techniques, which can distort the underlying characteristics of time series data. Generative models, conversely, learn the complex, probabilistic distribution governing the observed time series. This allows the model to not simply fill in missing values, but to generate plausible values consistent with the learned distribution. By capturing dependencies and patterns within the data, generative approaches produce imputed time series that more accurately reflect the original data’s characteristics, leading to improved downstream analysis and modeling performance. The efficacy of this approach is particularly pronounced in complex, non-linear time series where simple methods falter.
Denoising Diffusion Probabilistic Models (DDPMs) establish a robust framework for time series imputation by iteratively refining imputed values through a learned denoising process. However, the computational demands of DDPMs are significant: each imputation requires running the full reverse (denoising) chain, with a separate network evaluation at every step, resulting in substantially higher processing times than traditional imputation methods. Furthermore, performance is highly sensitive to hyperparameter selection, including the noise schedule, model architecture, and sampling strategy; careful tuning and validation are therefore crucial to achieve optimal imputation accuracy and prevent the introduction of unrealistic or biased values.
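As a minimal sketch of what such an iterative denoising imputation loop looks like, the code below assumes a trained noise-prediction network `eps_model` and a standard DDPM beta schedule, and conditions on the observed entries by overwriting them at every step; this is a generic illustration of the approach, not the implementation used in the paper.

```python
import torch

def ddpm_impute(x_obs, mask, eps_model, betas):
    """Generic reverse-diffusion imputation sketch (illustrative, not the paper's code).
    x_obs: series with arbitrary filler values at missing positions.
    mask:  1 where observed, 0 where missing.
    eps_model(x, t) is assumed to predict the noise added at step t."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    n_steps = betas.shape[0]

    x = torch.randn_like(x_obs)  # start the missing part from pure noise
    for t in reversed(range(n_steps)):
        t_idx = torch.full((x.shape[0],), t, dtype=torch.long)
        eps = eps_model(x, t_idx)  # one network evaluation per step
        mean = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise  # one denoising step

        # Keep observed entries consistent with the data: re-noise them to the
        # current level and overwrite, so only the gaps are generated.
        if t > 0:
            x_known = torch.sqrt(alpha_bar[t - 1]) * x_obs + torch.sqrt(1 - alpha_bar[t - 1]) * torch.randn_like(x_obs)
        else:
            x_known = x_obs
        x = mask * x_known + (1 - mask) * x
    return x
```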
The integration of deterministic models as expert priors within generative imputation frameworks represents a significant advancement in time series analysis. These deterministic models, often based on established domain knowledge or physical constraints, provide initial estimations or ‘priors’ that constrain the solution space for the generative model – such as a Diffusion Probabilistic Model (DPM). This guidance reduces the ambiguity inherent in purely data-driven generative approaches, leading to more accurate and plausible imputations, particularly in scenarios with limited or noisy data. By incorporating these priors, the generative process is effectively steered towards solutions consistent with known system behavior, improving both the quality and reliability of the imputed time series.
Bridge-TS: Sculpting Probability with Prior Knowledge
Bridge-TS addresses time series imputation by leveraging pre-trained, deterministic models – specifically TimesNet, Non-stationary Transformer, and FEDformer – to generate initial “prior” estimates for missing data. These models, each possessing unique strengths in capturing temporal dependencies, provide distinct perspectives on the underlying time series. Rather than selecting a single model, Bridge-TS integrates the outputs of these models as a compositional prior, effectively combining expert opinions. This approach aims to improve imputation accuracy and robustness by mitigating the limitations inherent in any single deterministic model and capitalizing on their complementary strengths. The resulting prior then serves as the foundation for a probabilistic refinement process using a Schrödinger Bridge.
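A rough sketch of how such a compositional prior could be assembled is shown below; the expert objects and the uniform weighting are placeholders (hypothetical wrappers around pre-trained imputers such as TimesNet, the Non-stationary Transformer, and FEDformer), not the paper’s actual combination rule.

```python
import torch

def compositional_prior(x_obs, mask, experts, weights=None):
    """Combine deterministic estimates from several pre-trained 'experts' into a
    single prior for the missing entries (illustrative sketch only)."""
    preds = torch.stack([expert(x_obs, mask) for expert in experts], dim=0)
    if weights is None:
        weights = torch.full((len(experts),), 1.0 / len(experts))  # uniform by default
    weights = weights.view(-1, *([1] * (preds.dim() - 1)))
    prior = (weights * preds).sum(dim=0)
    # Observed positions keep their ground-truth values; only gaps use the prior.
    return mask * x_obs + (1 - mask) * prior

# Hypothetical usage with wrappers around the three pre-trained models:
# prior = compositional_prior(x_obs, mask, [timesnet, ns_transformer, fedformer])
```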
The Schrödinger Bridge is employed as a continuous-time probabilistic mechanism to refine initial imputation priors. This method defines a diffusion process that transitions the prior distribution – representing initial beliefs about the missing data – towards the observed data distribution. Specifically, it formulates a stochastic differential equation (SDE) where the solution represents the imputed time series. The SDE is designed such that its drift and diffusion terms guide the prior towards the empirical data distribution, effectively smoothing the transition and minimizing the discrepancy between the imputed values and the observed data. This process allows for a principled integration of prior knowledge with data-driven learning, resulting in more accurate and robust imputation compared to methods relying solely on observed values.
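The data-to-data transition described above can be illustrated with a toy Euler-Maruyama simulation of a simple bridge SDE that pins a trajectory starting at the prior estimate to the observed target; Bridge-TS learns a score-based drift rather than using this fixed pull-toward-target drift, so treat the snippet as a sketch of the idea, not the paper’s solver.

```python
import math
import torch

def toy_bridge(x_prior, x_target, n_steps=50, sigma=0.1):
    """Euler-Maruyama simulation of the bridge SDE
        dx_t = (x_target - x_t) / (1 - t) dt + sigma dW_t,  t in [0, 1),
    which transports x_0 = x_prior towards x_1 = x_target.
    Illustrative only: the actual Schrödinger Bridge drift is learned from data."""
    dt = 1.0 / n_steps
    x = x_prior.clone()
    for k in range(n_steps):
        t = k * dt
        drift = (x_target - x) / max(1.0 - t, dt)  # pull toward the target
        x = x + drift * dt + sigma * math.sqrt(dt) * torch.randn_like(x)
    return x  # ends close to x_target, having started at the prior
```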
Utilizing compositional priors in time series imputation involves synthesizing predictions from multiple pre-trained models, each representing a distinct ‘expert’ perspective on the data. This approach mitigates the risk of relying on a single model which may exhibit biases or limitations in specific data regimes. By combining these expert opinions – such as those derived from TimesNet, Non-stationary Transformer, and FEDformer – the imputation process becomes more robust to noisy or incomplete data. The final imputation is not simply an average of the individual predictions, but rather a weighted combination determined by the Schrödinger Bridge, allowing the model to prioritize more reliable expert opinions and reduce the impact of less accurate ones, ultimately improving imputation accuracy and generalization performance.

Unveiling Performance: Validation Across Temporal Landscapes
Rigorous experimentation across established time series benchmarks – encompassing energy (ETT), financial (Exchange), and meteorological (Weather) datasets – reveals a consistent performance advantage for Bridge-TS. The model was subjected to diverse masking ratios and data complexities, consistently exceeding the accuracy of traditional imputation techniques and competing generative models. These findings suggest that Bridge-TS’s architecture effectively captures the underlying dynamics of time series data, enabling more reliable reconstruction of missing values and offering a robust solution for applications where data completeness is critical. This consistent outperformance across varied datasets solidifies Bridge-TS as a promising advancement in time series data handling.
Rigorous quantitative analysis confirms Bridge-TS’s proficiency in time series reconstruction. Evaluations utilizing standard metrics – specifically, Mean Squared Error (MSE) and Mean Absolute Error (MAE) – consistently demonstrate that Bridge-TS outperforms existing state-of-the-art imputation techniques. These metrics provide a precise measure of the difference between the reconstructed and actual values, and the achieved lower MSE and MAE scores validate the high accuracy and reliability of Bridge-TS in handling missing data. This superior performance suggests a robust capability to restore data integrity, leading to more dependable time series analysis and forecasting outcomes.
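For reference, these metrics are conventionally computed only over the entries that were artificially masked out; a minimal sketch, assuming tensors holding the imputed series, the ground truth, and a binary evaluation mask:

```python
import torch

def masked_mse_mae(x_imputed, x_true, eval_mask):
    """MSE and MAE restricted to the masked-out positions (eval_mask == 1),
    the usual convention in imputation benchmarks."""
    diff = (x_imputed - x_true) * eval_mask
    n = eval_mask.sum().clamp(min=1)
    mse = (diff ** 2).sum() / n
    mae = diff.abs().sum() / n
    return mse.item(), mae.item()
```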
Rigorous evaluation demonstrates that Bridge-TS consistently minimizes both MSE and MAE across a diverse suite of benchmark datasets – encompassing energy, financial, and weather data – and under varying degrees of data occlusion. Achieving the lowest or comparable error rates even when a substantial fraction of the data is masked, the model exhibits inherent robustness and a capacity for accurate reconstruction. These consistently low error metrics indicate a substantial improvement over existing time series imputation techniques, promising enhanced accuracy and reliability in downstream analytical tasks and forecasting applications across numerous scientific and industrial domains, and a significant potential to mitigate the impact of missing data on predictive capabilities.
Beyond the Horizon: Expanding the Framework’s Reach
Ongoing research aims to bolster the accuracy and computational speed of the Schrödinger Bridge by investigating advanced probabilistic techniques for refining the initial expert prior. Current methods rely on assumptions about the prior distribution, and exploring alternatives – such as variational inference or more sophisticated Markov Chain Monte Carlo algorithms – promises to create a more robust and reliable bridge between initial and target distributions. This refinement isn’t merely about statistical precision; it directly impacts the efficiency of the Bridge-TS algorithm, potentially reducing the computational burden associated with complex time series data. By carefully calibrating the expert prior, researchers hope to minimize the ‘drift’ during the Schrödinger Bridge process, leading to faster convergence and more accurate time series transformations, ultimately broadening the applicability of this powerful technique.
The performance of Bridge-TS could be significantly improved by implementing adaptive priors, which represent a departure from static, pre-defined assumptions about the generated time series. Instead of relying on a fixed prior distribution, these adaptive priors would dynamically adjust their parameters based on the inherent characteristics of the input data – such as its volatility, trend, or seasonality. This allows the model to better capture the nuances of each individual time series, leading to more accurate and realistic extrapolations. By intelligently tailoring the prior to the specific data, the Schrödinger Bridge can more efficiently navigate the space of possible future trajectories, resulting in enhanced predictive capabilities and a more robust framework for time series forecasting.
The current framework demonstrates promise with univariate time series, but its true potential lies in extending its capabilities to analyze multivariate data – systems where multiple, interdependent variables evolve over time. Real-world phenomena, from financial markets to climate patterns and biological systems, are rarely governed by a single variable; they are complex webs of interaction. Adapting Bridge-TS to handle these complexities requires not only algorithmic advancements but also the incorporation of domain-specific knowledge. Integrating expert insights – such as known physical constraints, causal relationships, or established patterns – as priors within the Schrödinger Bridge could dramatically improve both the accuracy and interpretability of the generated time series, ultimately unlocking powerful new applications in forecasting, simulation, and decision-making across a wide range of scientific and industrial fields.
The pursuit of Bridge-TS, as detailed in this work, embodies a fundamentally disruptive approach to time series imputation. It isn’t merely about filling gaps; it’s about constructing a plausible trajectory, a generative path from known data points. This aligns perfectly with Turing’s assertion: “Sometimes people who are unhappy tend to look at the world as if there is something wrong with it.” The framework doesn’t accept the limitations of incomplete data as a given; instead, it actively challenges those constraints through the imposition of informative priors and a novel generative process. Each refinement of the Schrödinger Bridge, each adjustment to the compositional priors, is a testament to the notion that the best hack is understanding why something worked, or, in this case, why an imputation failed and how to correct it. Every patch is a philosophical confession of imperfection.
Beyond the Bridge
The framework presented here, while demonstrating improved imputation, inevitably reveals the comfortable constraints of current generative modeling. The Schrödinger Bridge, elegant as it is, still relies on a defined trajectory – a pre-ordained path, even if probabilistically informed. Future work should challenge this linearity, exploring generative processes that fundamentally resist simple mapping between observed and imputed data. A truly robust system might deliberately introduce controlled distortions, forcing the model to reconstruct not just missing values, but the underlying generative rules of the time series itself.
Furthermore, the emphasis on compositional priors, while effective, raises the question of whether these priors are truly representative of the systems being modeled, or merely convenient approximations. The pursuit of ‘informative’ priors risks becoming a self-fulfilling prophecy, reinforcing existing biases within the data. A more radical approach might involve learning the prior from unlabeled time series data – allowing the model to discover patterns independent of the imputation task itself.
Ultimately, the goal isn’t simply to fill gaps, but to reverse-engineer the processes that create time series. The limitations of Bridge-TS aren’t failures, but invitations. They point towards a future where imputation isn’t about reconstruction, but about intelligent, generative exploration of the possible.
Original article: https://arxiv.org/pdf/2512.23832.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/