Visibility Forecasting
Predict future AI visibility scores using a Holt-Winters ensemble with bootstrap-calibrated prediction intervals, cross-validated diagnostics, and confidence quality ratings.
Overview
Visibility Forecasting projects your AI visibility score forward in time using a statistically rigorous time-series ensemble. The engine trains three competing models on up to 180 days of history, weights them by out-of-sample accuracy, and returns forecasts at 30, 60, or 90 days with honest, empirically calibrated confidence bands.
Every forecast ships with diagnostic metrics so you can trust (or distrust) the numbers: cross-validated error, coverage probability, residual autocorrelation tests, and an overall confidence quality rating of High, Medium, or Low.
How It Works
Data Collection and Cleaning
The system pulls up to 180 days of historical visibility scores, aggregated to daily averages. Before fitting, two preprocessing steps protect the models from bad data:
- IQR-based outlier detection + Winsorization — Extreme values are clamped to the 1.5×IQR fence rather than discarded. This preserves sample size while preventing a single bad day from distorting the trend.
- Gap-filling via linear interpolation — Missing days are filled so the models see an evenly-spaced series. The diagnostics report how many gaps were filled.
A minimum of 14 data points is required for trend-based forecasting. Seasonal models additionally require 21 days (3 full weeks) before they are considered.
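A minimal sketch of the two cleaning steps, assuming the series arrives as daily values with null marking a missing day. The function names and the exact quantile method are illustrative, not the shipped implementation:

```typescript
// Clamp extreme values to the 1.5×IQR fence instead of discarding them.
function winsorizeIQR(values: number[]): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const q = (p: number) => sorted[Math.floor(p * (sorted.length - 1))];
  const iqr = q(0.75) - q(0.25);
  const lo = q(0.25) - 1.5 * iqr; // lower fence
  const hi = q(0.75) + 1.5 * iqr; // upper fence
  return values.map((v) => Math.min(hi, Math.max(lo, v)));
}

// Fill missing days by linear interpolation between the known neighbours.
// Assumes the first and last days of the series are observed.
function fillGapsLinear(series: (number | null)[]): number[] {
  const out = series.slice();
  for (let i = 0; i < out.length; i++) {
    if (out[i] !== null) continue;
    const prev = i - 1;            // last observed day
    let next = i;
    while (out[next] === null) next++; // next observed day
    const a = out[prev] as number;
    const b = out[next] as number;
    for (let j = prev + 1; j < next; j++) {
      out[j] = a + ((b - a) * (j - prev)) / (next - prev);
    }
    i = next;
  }
  return out as number[];
}
```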
The Three Models
The engine fits three competing models on every run:
- Holt-Winters Additive — Level + trend + weekly seasonality (period = 7). Best when your visibility has a day-of-week pattern (e.g., weekday peaks).
- Holt Damped Trend — Level + damped trend, no seasonality. Best when your data has a clear direction that should eventually flatten out, not extrapolate forever.
- Seasonal Naïve — Repeats last week's pattern. A strong baseline that is hard to beat on noisy data.
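For a sense of how simple that baseline is, seasonal naïve can be sketched in a few lines, assuming a gap-filled daily series:

```typescript
// Seasonal naïve: the forecast for each future day is the value observed
// on the same weekday in the last full cycle of the series.
function seasonalNaive(series: number[], horizon: number, period = 7): number[] {
  const lastCycle = series.slice(-period);
  return Array.from({ length: horizon }, (_, h) => lastCycle[h % period]);
}
```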
For each model, the engine runs a fine-grained grid search across roughly 200 parameter combinations (alpha, beta, gamma, phi), optimizing AICc (the corrected Akaike Information Criterion) on the full series. AICc penalizes overfitting more aggressively than raw error does.
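As an illustration of the scoring step, AICc can be computed from the in-sample sum of squared errors. The Gaussian-likelihood form below is the standard textbook version, not necessarily the engine's exact variant:

```typescript
// AICc from the in-sample SSE: the 2k(k+1)/(n-k-1) term is the
// small-sample correction that punishes extra parameters harder
// on short series. Lower is better; the grid search keeps the
// (alpha, beta, gamma, phi) combination with the smallest AICc.
function aicc(sse: number, n: number, k: number): number {
  const aic = n * Math.log(sse / n) + 2 * k; // Gaussian log-likelihood up to a constant
  return aic + (2 * k * (k + 1)) / (n - k - 1); // finite-sample correction
}
```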
Ensemble Weighting
Each model's out-of-sample accuracy is measured using expanding-window time-series cross-validation:
- Minimum training window: 21 days
- CV horizon: 7 days ahead per fold
The final ensemble weights each model inversely by its CV RMSE, so the model that best predicts unseen data contributes most to the forecast. A single model can be selected if one dominates, or the ensemble can blend all three.
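A sketch of how that weighting might work, assuming each model exposes a train-then-forecast function; the fold stride of one horizon per step is an assumption:

```typescript
type Forecaster = (train: number[], horizon: number) => number[];

// Expanding-window CV: train on days 1..t, score the next 7 days,
// then grow the training window and repeat.
function cvRmse(model: Forecaster, series: number[], minTrain = 21, horizon = 7): number {
  let sse = 0;
  let count = 0;
  for (let t = minTrain; t + horizon <= series.length; t += horizon) {
    const preds = model(series.slice(0, t), horizon);
    for (let h = 0; h < horizon; h++) {
      sse += (preds[h] - series[t + h]) ** 2;
      count++;
    }
  }
  return Math.sqrt(sse / count);
}

// Inverse-RMSE weighting: the model with the lowest out-of-sample
// error contributes most; weights sum to 1.
function ensembleWeights(rmses: number[]): number[] {
  const inv = rmses.map((r) => 1 / r);
  const total = inv.reduce((a, b) => a + b, 0);
  return inv.map((w) => w / total);
}
```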
Bootstrap-Calibrated Prediction Intervals
Instead of assuming residuals are normally distributed (which they rarely are), the engine builds empirical prediction intervals via residual bootstrap:
- Compute residuals from the winning ensemble on the training data
- Draw 500 bootstrap samples with replacement
- Take the 2.5th and 97.5th percentiles as the 95% lower and upper bounds
This produces honest intervals that widen naturally where the model is uncertain and narrow where it is confident, without distributional assumptions.
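A minimal sketch of the percentile step for a single future day, assuming residuals from the training fit; compounding of uncertainty across multi-step horizons is omitted for brevity:

```typescript
// Residual bootstrap: resample training residuals with replacement,
// add each draw to the point forecast, and read the 95% band off the
// empirical 2.5th and 97.5th percentiles.
function bootstrapInterval(
  pointForecast: number,
  residuals: number[],
  samples = 500,
): { lower: number; upper: number } {
  const simulated: number[] = [];
  for (let i = 0; i < samples; i++) {
    const r = residuals[Math.floor(Math.random() * residuals.length)];
    simulated.push(pointForecast + r);
  }
  simulated.sort((a, b) => a - b);
  const at = (p: number) => simulated[Math.floor(p * (simulated.length - 1))];
  return { lower: at(0.025), upper: at(0.975) };
}
```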
Diagnostics Returned
Every forecast response includes a diagnostics object:
| Metric | Meaning |
|---|---|
| `cvRmse` | Cross-validated root mean squared error |
| `cvMae` | Cross-validated mean absolute error |
| `cvMape` | Cross-validated mean absolute percentage error (`null` if any actual is 0) |
| `coverageProbability` | % of CV actuals that fell inside the 95% prediction interval (target: ~95%) |
| `ljungBoxQ` + `ljungBoxPValue` | Ljung-Box test for residual autocorrelation (p > 0.05 = residuals look like white noise; the model has extracted the signal) |
| `residualStdDev` | Standard deviation of residuals |
| `outliersDetected` / `outliersWinsorized` | How many outliers the preprocessor clamped |
| `gapsFilled` | How many missing days were linearly interpolated |
| `confidenceQuality` | Overall rating: `high`, `medium`, or `low` |
The winning model name and its parameters (e.g., { alpha: 0.35, beta: 0.12, gamma: 0.20 }) are stored with the forecast so every prediction is traceable to a specific model configuration.
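To make `ljungBoxQ` concrete, the statistic can be sketched as below; the lag count m = 10 is an illustrative choice, not necessarily what the engine uses:

```typescript
// Ljung-Box Q on the ensemble residuals: Q = n(n+2) * sum over lags k
// of rho_k^2 / (n - k), where rho_k is the lag-k autocorrelation.
function ljungBoxQ(residuals: number[], m = 10): number {
  const n = residuals.length;
  const mean = residuals.reduce((a, b) => a + b, 0) / n;
  const c0 = residuals.reduce((s, r) => s + (r - mean) ** 2, 0) / n;
  let q = 0;
  for (let k = 1; k <= m; k++) {
    let ck = 0;
    for (let t = k; t < n; t++) {
      ck += (residuals[t] - mean) * (residuals[t - k] - mean);
    }
    const rho = ck / n / c0; // lag-k autocorrelation
    q += (rho * rho) / (n - k);
  }
  return n * (n + 2) * q;
}

// With m = 10 lags, Q above ~18.31 (the chi-squared 95th percentile at
// 10 degrees of freedom) implies p < 0.05: autocorrelation remains.
```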
Confidence Quality Rating
The engine combines the diagnostics into a single signal:
- High — Low CV error, coverage near 95%, residuals pass Ljung-Box. Trust the forecast for planning.
- Medium — One or more diagnostics are marginal. Use the forecast directionally; treat point estimates with caution.
- Low — High CV error, poor coverage, or residual autocorrelation. The model is struggling with your data. Don't bet the quarter on it.
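One way such a rating could be combined is sketched below. The specific thresholds are assumptions for illustration, not the engine's actual cutoffs:

```typescript
type Quality = 'high' | 'medium' | 'low';

// Each diagnostic becomes a pass/fail check; the rating counts the passes.
function rateConfidence(d: {
  cvRmse: number;
  coverageProbability: number; // % of CV actuals inside the 95% band
  ljungBoxPValue: number;
  residualStdDev: number;
}): Quality {
  const lowError = d.cvRmse < 2 * d.residualStdDev;             // assumed cutoff
  const calibrated = Math.abs(d.coverageProbability - 95) <= 5; // assumed tolerance
  const whiteNoise = d.ljungBoxPValue > 0.05;                   // per the Ljung-Box test
  const passes = [lowError, calibrated, whiteNoise].filter(Boolean).length;
  if (passes === 3) return 'high';
  if (passes === 2) return 'medium';
  return 'low';
}
```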
Forecast Horizons
| Horizon | Use Case |
|---|---|
| 30 days | Short-term planning. Narrowest intervals, highest confidence. |
| 60 days | Medium-term strategy. Intervals widen as uncertainty compounds. |
| 90 days | Long-term outlook. Useful for trend direction and relative comparison. |
Segment Filtering
Forecasts can be computed for any prompt segment:
- All prompts (default) — Aggregate forecast across your whole portfolio.
- Branded — How visibility is trending on queries that mention your brand.
- Non-Branded — Typically the most important segment: your organic discovery trajectory.
- Competitor — Head-to-head comparison query trajectory.
Pass the `?segment=` parameter to the API or use the segment toggle on the analytics page. Historical data is re-aggregated from raw snapshots for the selected segment, so the forecast is accurate for that slice.
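A hedged example of a segment-scoped request. Only the `?segment=` parameter is documented here, so the endpoint path, the horizon parameter, and the exact spelling of the segment value are assumptions:

```typescript
// Hypothetical endpoint path and query values, shown for shape only.
const res = await fetch('/api/visibility-forecast?segment=non-branded&horizon=90');
const { forecast, diagnostics } = await res.json();
console.log(diagnostics.confidenceQuality); // 'high' | 'medium' | 'low'
```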
How to Use
- Navigate to Analytics → Forecasting, or use the MCP tool `get_visibility_forecast`.
- Check the confidence quality rating first. If it's Low, ask why — is the history too short, too volatile, or full of gaps?
- Review the winning model in the response. If Seasonal Naïve dominates, your data doesn't have enough signal yet — keep capturing.
- Watch coverage probability. If it's drifting far from 95%, the intervals are miscalibrated and need more data.
- Compare last month's forecast to this month's actuals to build intuition for how accurate your specific project is.
Interpreting Results
- High-quality ensemble with narrow bands — Your visibility is stable and predictable. Optimization efforts are producing consistent results.
- High-quality ensemble with wide bands — Your visibility is genuinely volatile. The model is calibrated; the world is just noisy.
- Low-quality rating — Don't trust the point estimate. Look at trend direction only, and capture more data.
- Ljung-Box p < 0.05 — Residuals have autocorrelation, meaning there is signal the model hasn't captured. Often a sign your series has a change point (a platform update, a product launch) that the fitted models haven't adapted to.
Forecast Storage
Key forecast snapshots (at 30, 60, and 90-day marks) can be stored in the visibility_forecasts table. This lets you compare what you predicted against what happened — a crucial feedback loop for calibrating your own trust in the system.
The model version, winning model name, and full parameter set are all persisted, so every historical forecast is fully reproducible.
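As a sketch of what such a persisted record might carry, based on the fields described above; the names are illustrative, not the actual `visibility_forecasts` schema:

```typescript
// Hypothetical shape of one stored forecast snapshot.
interface VisibilityForecastRow {
  horizonDays: 30 | 60 | 90;
  predictedScore: number;
  lowerBound95: number;
  upperBound95: number;
  modelVersion: string;                // engine version for reproducibility
  winningModel: string;                // e.g. 'holt-winters-additive'
  modelParams: Record<string, number>; // { alpha, beta, gamma, phi }
  createdAt: string;                   // ISO timestamp of the forecast run
}
```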
Plan Requirements
Visibility Forecasting is available on Pro-SME and above.