03. How It Works
Decomposition
Most forecasting methods begin by decomposing a series into components:
- Trend: The long-term direction (upward, downward, or flat).
- Seasonality: Repeating patterns at fixed periods (weekly, monthly, yearly).
- Noise (residual): Random variation unexplained by trend or seasonality.
Additive decomposition assumes the components sum: y(t) = trend + seasonality + noise. Multiplicative decomposition assumes they multiply: y(t) = trend × seasonality × noise. The multiplicative form is appropriate when the amplitude of the seasonal swings grows with the trend level; taking logarithms of a strictly positive series turns it into an additive one.
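As an illustration of the additive case, the sketch below estimates the trend with a centered moving average, averages the detrended values at each seasonal position, and treats what is left as noise. It is a minimal, assumption-laden version of classical decomposition: the function names are made up for this example, the series is assumed to cover at least two full seasonal cycles, and the first and last few trend values are left undefined (None).

```python
def centered_moving_average(y, period):
    """Trend estimate: moving average spanning one full seasonal period."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points each get half weight.
            inner = sum(window[1:-1])
            trend[t] = (0.5 * window[0] + inner + 0.5 * window[-1]) / period
    return trend


def additive_decompose(y, period):
    """Split y into trend, seasonal, and residual lists (additive model)."""
    trend = centered_moving_average(y, period)

    # Average the detrended values observed at each seasonal position.
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]

    # Center the seasonal effects so they sum to zero over one period.
    mean_raw = sum(raw) / period
    pattern = [s - mean_raw for s in raw]
    seasonal = [pattern[t % period] for t in range(len(y))]

    residual = [
        None if tr is None else obs - tr - s
        for obs, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, residual
```

For a multiplicative series, the same function can be applied to the logarithm of the values, provided they are all positive.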
Classical statistical methods
ARIMA (Autoregressive Integrated Moving Average):
ARIMA models a series as a linear combination of its own past values (the AR component) and past forecast errors (the MA component), applied after a differencing operation that removes trend to achieve stationarity (the I component). SARIMA extends this with seasonal AR, differencing, and MA terms. The analyst must specify three orders (p, d, q): the number of autoregressive lags, the number of differencing passes, and the number of moving-average lags. ARIMA works well on univariate series that are stationary or become stationary after differencing.
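To make the roles of the AR and I components concrete, here is a hand-rolled sketch of the simplest non-trivial case, ARIMA(1, 1, 0): difference once, fit one autoregressive coefficient by least squares, forecast the differences, and add them back up. This is only an illustration. Production libraries typically estimate the parameters by maximum likelihood and also handle the MA terms, which are omitted here; the function names are invented for the example.

```python
def difference(y):
    """First difference: the 'I' step with d = 1."""
    return [b - a for a, b in zip(y, y[1:])]


def fit_ar1(z):
    """Least-squares fit of z[t] = c + phi * z[t-1] + error."""
    x, target = z[:-1], z[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_t = sum(target) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_t) for a, b in zip(x, target))
    phi = sxy / sxx
    c = mean_t - phi * mean_x
    return c, phi


def forecast_arima_110(y, steps):
    """Forecast `steps` values ahead with a hand-rolled ARIMA(1, 1, 0)."""
    z = difference(y)
    c, phi = fit_ar1(z)
    level, diff = y[-1], z[-1]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff   # AR(1) forecast of the next difference
        level = level + diff    # undo the differencing
        forecasts.append(level)
    return forecasts
```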
Exponential smoothing (ETS):
Assigns exponentially decreasing weights to past observations, so recent values matter more than older ones. Holt's method extends it to handle trends; Holt-Winters further adds seasonality. ETS is fast, interpretable, and competitive on many short to medium horizon tasks.
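The recursions behind simple exponential smoothing and Holt's trend extension are short enough to write out directly. The sketch below assumes fixed smoothing parameters alpha and beta supplied by the caller and a naive initialization from the first observations; real ETS implementations usually estimate both the parameters and the initial states from the data.

```python
def simple_exp_smoothing(y, alpha):
    """Return the final smoothed level, which is the forecast for every horizon."""
    level = y[0]
    for obs in y[1:]:
        # New level: weighted blend of the latest observation and the old level.
        level = alpha * obs + (1 - alpha) * level
    return level


def holt_linear(y, alpha, beta, steps):
    """Holt's method: smooth a level and a trend, then extrapolate linearly."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]
```

Unrolling the first recursion shows where the name comes from: the observation k steps back receives weight alpha * (1 - alpha) ** k, which decays exponentially with k.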
Prophet:
Developed by Facebook (Meta) and open-sourced in 2017. Prophet models a series as the sum of a trend, seasonal components (represented by Fourier series), and holiday effects, and fits that additive model to the data. It is robust to missing data and handles several seasonal periods at once (for example weekly and yearly). Prophet was designed for business analysts without deep time series expertise and became widely adopted for retail and capacity planning.
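The Fourier-series idea can be sketched without Prophet itself: a seasonal component with a given period is a weighted sum of sine and cosine terms, and a higher order allows a more wiggly seasonal shape. The code below is not Prophet's API. The function names are hypothetical, and the coefficients, which Prophet would estimate during fitting, are passed in by hand.

```python
import math


def fourier_features(t, period, order):
    """Sine/cosine pairs for harmonics 1..order at time t."""
    features = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def seasonal_component(t, period, coeffs):
    """Seasonal effect at time t; coeffs holds 2 * order weights."""
    order = len(coeffs) // 2
    feats = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coeffs, feats))
```

Because every term repeats after one period, the resulting component is periodic by construction: with period=7 it takes the same value at day 2 and day 9.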
Machine learning approaches
Gradient boosting (XGBoost, LightGBM):
Tabular ML models can forecast once the series is turned into a feature table: lag features (y at t-1, t-7, t-365), rolling statistics (such as a 7-day average), and calendar features (day of week, month). Gradient-boosted tree models, LightGBM in particular, powered the winning entries of the M5 competition, a large-scale retail demand forecasting benchmark, demonstrating that well-engineered features plus gradient boosting can outperform specialized time series models on many real-world tasks.
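The feature construction described above might look like the following sketch, which turns a daily series into rows that any tabular learner could consume. The function name, the default lags, and the dictionary layout are choices made for this example. Note that the rolling mean uses only values strictly before the target day, so no information from the day being predicted leaks into its features.

```python
from datetime import date, timedelta


def build_features(dates, values, lags=(1, 7), window=7):
    """Return one feature dict per day that has enough history."""
    rows = []
    start = max(max(lags), window)
    for t in range(start, len(values)):
        row = {f"lag_{k}": values[t - k] for k in lags}
        past = values[t - window : t]  # excludes day t itself
        row[f"rolling_mean_{window}"] = sum(past) / window
        row["day_of_week"] = dates[t].weekday()  # Monday is 0
        row["month"] = dates[t].month
        row["target"] = values[t]
        rows.append(row)
    return rows


# Ten days of toy data starting on 1 January 2024.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
sales = [float(i) for i in range(10)]
table = build_features(days, sales)
```

Each row can then be fed to a gradient boosting library, with "target" as the label and the remaining keys as features.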
LSTMs (Long Short-Term Memory):
Recurrent neural networks that maintain a hidden state across time steps, allowing them to capture nonlinear and longer-range dependencies that a low-order ARIMA model cannot represent. LSTMs were dominant in deep learning forecasting roughly from 2016 to 2021. Amazon's DeepAR model, a probabilistic LSTM-based network trained across many related time series simultaneously, became a widely used production system for demand forecasting.
Transformer-based models:
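Since 2022, attention-based architectures (PatchTST, iTransformer) have challenged LSTMs on long-horizon benchmarks. The key innovation in PatchTST is treating fixed-size patches of a time series as tokens, similar to how vision transformers treat image patches, which shortens the input sequence and enables efficient long-context modeling. iTransformer takes a different route and embeds the whole history of each variable as a single token, so that attention operates across variables.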
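The patching step itself is simple, as the sketch below suggests. It only shows how a series is cut into patches; in an actual model each patch would then be mapped to an embedding vector by a learned projection, which is omitted here. The function name and parameters are illustrative.

```python
def make_patches(series, patch_len, stride):
    """Cut a series into (possibly overlapping) fixed-size patches."""
    last_start = len(series) - patch_len
    return [series[i : i + patch_len] for i in range(0, last_start + 1, stride)]


series = list(range(10))
overlapping = make_patches(series, patch_len=4, stride=2)
disjoint = make_patches(series, patch_len=5, stride=5)
```

With non-overlapping patches of length 5, the ten time steps become two tokens, which is why patching reduces the cost of attention over long histories.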
Time series foundation models
The most significant recent development is the emergence of foundation models pre-trained on massive time series corpora for zero-shot use.
TimesFM (Google Research, 2024):
A 200M-parameter decoder-only transformer pre-trained on 100 billion real-world time points, primarily from Google Trends and Wikipedia pageviews. TimesFM achieves zero-shot performance competitive with or exceeding supervised models explicitly trained on each target dataset. Google's blog describes the architecture: "similar to LLMs, we use stacked transformer layers as the main building blocks. In the context of time series forecasting, we treat a patch (a group of contiguous time-points) as a token." The work was published at ICML 2024, and the model is available on Hugging Face.
Chronos (Amazon, 2024):
A family of pre-trained models built on the T5 language model architecture, trained on a large collection of real-world time series augmented with synthetic data. Chronos scales each series and quantizes its values into a fixed vocabulary of discrete bins, which frames forecasting as a sequence-to-sequence language modeling task over tokens. The models achieve strong zero-shot performance across diverse domains.
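The idea of turning real values into discrete tokens can be illustrated with a toy tokenizer. The sketch below assumes mean-absolute scaling followed by uniform bins; the bin count, the bin range, and the function names are arbitrary choices for the example and are not taken from the Chronos code base.

```python
def tokenize(context, num_bins=10, low=-5.0, high=5.0):
    """Scale a series by its mean absolute value, then map values to bin ids."""
    scale = sum(abs(v) for v in context) / len(context) or 1.0
    width = (high - low) / num_bins
    tokens = []
    for v in context:
        idx = int((v / scale - low) // width)
        tokens.append(min(max(idx, 0), num_bins - 1))  # clip to valid ids
    return tokens, scale


def detokenize(tokens, scale, num_bins=10, low=-5.0, high=5.0):
    """Map bin ids back to bin centers and undo the scaling."""
    width = (high - low) / num_bins
    return [(low + (tok + 0.5) * width) * scale for tok in tokens]


tokens, scale = tokenize([2.0, 4.0, 6.0])
restored = detokenize(tokens, scale)
```

The round trip is lossy: every value is replaced by the center of its bin, so more bins give finer resolution at the cost of a larger vocabulary.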
Moirai (Salesforce Research, 2024):
A masked encoder-based transformer trained on LOTSA (Large-scale Open Time Series Archive), which contains over 27 billion observations across nine domains. Moirai addresses the challenges of cross-frequency learning, an arbitrary number of variates in multivariate series, and differing distributional properties across datasets. Published at ICML 2024. "Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models," per the arXiv abstract.
The pattern across all three mirrors the LLM paradigm: pre-train once on massive data, then apply the model zero-shot or fine-tune it on a specific downstream task with little task-specific data.