Explainable AI in Demand Forecasting: How to Understand What the Model Sees
- Yvonne Badulescu
More and more businesses are using integrated AI and machine learning (ML) models within their ERP systems to forecast product demand, but many struggle to explain these forecasts to management and colleagues. Why? Because AI and ML models are complex and operate in high-dimensional spaces, which are harder to understand than traditional models based on simpler, linear relationships. Even though these models have demonstrated that they can improve forecast accuracy, they are difficult to interpret.
Black-Box Forecasting
The term "black box" is used when the logic behind a model’s predictions is hidden. It has been discussed in systems theory for decades and is now central in AI research, particularly for complex models where internal logic cannot be easily described. In this article, AI refers to deep learning models like neural networks (e.g. LSTM) and ensemble methods (e.g. XGBoost). It does not refer to generative AI tools like ChatGPT or Gemini.
Black box algorithms often show strong performance. But because we cannot see how they arrive at predictions, they are difficult to trust.
The black box problem comes from the fact that while we understand the inputs to these models, we do not see how they are processed. We know we are feeding in historical sales, prices, promotions, weather, and other external data, but we do not see how the model turns this data into its forecast. It is easy to trust the AI when forecasts are correct. But when it makes a mistake, confidence drops quickly. This is known as algorithm aversion (if you would like to read more about this phenomenon, please see my earlier article on the topic).
What Does It Mean to "Explain" a Forecast?
Imagine you are baking a cake and each of the ingredients represents a variable in your model. You measure and add each one carefully, whether it's flour, sugar, eggs, or butter, using defined units such as grams, milliliters, or cups. These units act as the coefficients in a model. Each ingredient has a clear role. Reducing the sugar will make the cake less sweet, increasing the baking powder will change the texture, and adding too much liquid will affect the structure. You understand exactly how each input affects the outcome, and there is no guesswork involved.
In the context of linear forecasting models such as exponential smoothing, ARIMA, or logistic regression, the situation is very similar to baking the cake. Each variable has an assigned coefficient that allows you to interpret precisely how changes in that variable (ingredient) will influence the forecast (cake). These models are transparent and quantifiable. The relationships are clear, and it is easy to communicate how the forecast was constructed and what each part contributed to the result.
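To make that transparency concrete, here is a minimal sketch of reading a linear model directly, the way you read a recipe. It uses scikit-learn, and the feature names (price, promo_flag, temperature) and the data are invented purely for illustration:

```python
# A minimal sketch: a linear model's coefficients can be read off directly.
# Feature names and data are invented for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
feature_names = ["price", "promo_flag", "temperature"]
X = rng.normal(size=(200, 3))
y = 50 - 3.0 * X[:, 0] + 8.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(size=200)

model = LinearRegression().fit(X, y)
for name, coef in zip(feature_names, model.coef_):
    # Each coefficient says how the forecast moves per unit change in that input,
    # holding the others fixed -- the measured "ingredient" in the cake analogy.
    print(f"{name}: {coef:+.2f} units of demand per unit change")
```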
The same is not true for AI models. Techniques such as neural networks, tree ensembles, and support vector machines are designed to model complex, non-linear relationships and interactions between variables. They do not treat inputs independently, and the effect of any one feature depends on how it interacts with others. The internal structure consists of multiple layers and transformations that are not directly interpretable. To continue the baking analogy, imagine collecting thousands of finished cakes, feeding them into a machine, and asking it to generate a new cake based on the patterns it detects. The machine may produce a result that looks good and tastes right, but you have no idea which ingredients were used, in what quantities, or why those specific choices were made. You are presented with the outcome, but the process behind it is hidden. This is the black box nature of AI models. They can produce highly accurate forecasts, but they do so without offering a transparent explanation of how those forecasts are derived. The logic remains obscured, making it difficult to justify a forecast to others or to understand the underlying cause when something unexpected occurs.
This is precisely where explainability becomes essential. When you cannot observe or trace how a model arrived at its forecast, it becomes harder to trust, refine, or defend that output in a business context.
Methods for Explaining AI Forecasts
Explainable AI methods provide tools that estimate the influence of different features, quantify their importance, and show how these contributions shift over time. These methods help turn opaque forecasts into interpretable insights, support better decision-making, and enable clear communication across teams.
The purpose is not only to explain the forecasts, but also to interpret the insights these explanations provide and to build trust in our AI systems over time.
Research on explainability in AI has grown in recent years. Here is a summary of the main categories used to interpret AI forecasts:
1. Feature Attribution
Used to determine how much each feature contributes to a specific prediction made by the AI/ML model. The state of the art includes the following methods:
SHAP (SHapley Additive exPlanations) [1]: Uses game theory to calculate how much each feature contributed to a prediction. This approach works for many types of models and can explain both individual forecasts and overall trends; however, it requires significant computing power (see the sketch after this list).
LIME (Local Interpretable Model-agnostic Explanations) [2]: Builds a simple model around one prediction to explain what influenced that forecast locally (for one data point or instance). It is quick and can be applied to any type of AI model; however, it explains only that specific point rather than the global forecast.
Feature importance [3]: Calculated in tree-based models by tracking how often and how effectively each feature is used to split the data, with scores based on the total reduction in impurity such as Gini or entropy across all trees. It is quick to compute and highlights which variables the model relies on most overall, but it does not provide insight into individual forecasts and can be biased toward features with many unique values.
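As an illustration of feature attribution in practice, the sketch below applies SHAP to a tree ensemble. It assumes the shap and scikit-learn packages are installed; the feature names and synthetic data are assumptions made for the example:

```python
# A minimal sketch of feature attribution with SHAP on a tree ensemble.
# Assumes shap and scikit-learn are installed; names and data are illustrative only.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
feature_names = ["past_sales", "price", "promo_spend", "temperature"]
X = rng.normal(size=(500, 4))
y = 100 + 5 * X[:, 0] - 4 * X[:, 1] + 3 * X[:, 2] * (X[:, 1] < 0) + rng.normal(size=500)

model = GradientBoostingRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)    # efficient Shapley values for tree models
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)

# Local explanation: how much each feature pushed one specific forecast up or down.
print(dict(zip(feature_names, shap_values[0].round(2))))

# Global view: mean absolute SHAP value per feature as an overall importance ranking.
print(dict(zip(feature_names, np.abs(shap_values).mean(axis=0).round(2))))
```

The same per-row values can also feed the monthly tracking discussed later in this article.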
2. Model Simplification
Model simplification involves using a simpler, interpretable model, such as a decision tree or linear model, to approximate the behavior of a more complex model, making the underlying logic easier to understand and communicate. Common approaches include:
Surrogate models [3]: Simple models such as decision trees are trained to replicate the behavior of a more complex model, although they may not capture all of its logic. Feature attribution methods can then be applied to the surrogate to identify the most important variables (see the sketch after this list).
Rule extraction [4]: This approach derives if-then rules from trained models to make their logic easier to follow, and it works best when the data or model is small. It is often applied to neural networks to produce simplified rule-based approximations of how the model makes decisions.
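To illustrate the surrogate idea, the sketch below trains a shallow decision tree to mimic a random forest's predictions and checks how faithfully it does so. It uses scikit-learn only; the feature names and synthetic data are invented for the example:

```python
# A minimal sketch of a global surrogate: a shallow decision tree trained to mimic
# a more complex model's predictions. Names and data are assumptions for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
feature_names = ["past_sales", "price", "promo_spend"]
X = rng.normal(size=(1000, 3))
y = 80 + 6 * X[:, 0] - 5 * X[:, 1] + 2 * X[:, 2] ** 2 + rng.normal(size=1000)

black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# The surrogate is fitted on the black box's *predictions*, not on the true target.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how closely the simple tree reproduces the complex model's behavior.
fidelity = r2_score(black_box.predict(X), surrogate.predict(X))
print(f"surrogate fidelity (R^2): {fidelity:.3f}")
print(export_text(surrogate, feature_names=feature_names))
```

If fidelity is low, the surrogate's rules should not be taken as a faithful explanation of the original model.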
3. Visualization Tools
Visualization tools help interpret complex models by showing how changes in input features influence predictions. These methods are especially useful for identifying trends, comparing feature effects across instances, and supporting communication with non-technical stakeholders.
Partial Dependence Plots (PDPs) [4]: Show how changing one feature affects model predictions on average, holding all other features constant (e.g., forecasted weekly sales versus product price). They are easy to read and helpful for identifying general trends, but they can be misleading when features are correlated, since the assumption of independence does not always hold in real data.
Individual Conditional Expectation (ICE) plots [4]: Show how a single feature influences predictions for individual examples, revealing variation that might be hidden in aggregated plots (such as PDPs). ICE plots are useful for highlighting subgroups where the model behaves differently. For example, an ICE plot might show that while most customers reduce purchases as prices increase, a smaller segment maintains stable demand.
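The following sketch produces a combined PDP and ICE plot with scikit-learn's PartialDependenceDisplay; the price feature and the synthetic data are illustrative assumptions:

```python
# A minimal sketch of PDP and ICE curves with scikit-learn; names and data are
# assumptions for illustration only.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(3)
feature_names = ["price", "promo_spend", "temperature"]
X = rng.normal(size=(500, 3))
y = 60 - 7 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=500)

model = GradientBoostingRegressor().fit(X, y)

# kind="both" overlays the average effect (the PDP) with one curve per instance
# (the ICE lines), which is where subgroups that react differently to price show up.
PartialDependenceDisplay.from_estimator(
    model, X, features=[0], feature_names=feature_names, kind="both"
)
plt.show()
```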
How Does This Apply in Practice?
Understanding the forecast yourself is only part of the challenge; presenting it to others, especially when the drivers behind it change each month, is where things become more difficult. Demand shifts with seasonality, promotions, social trends, and broader economic conditions, so the factors influencing the model's predictions also shift over time. That is the reality of real-world forecasting. These changes might be driven by:
Changing data distributions, as customer behavior, seasonality, promotions, or macroeconomic conditions shift.
Temporal sensitivity, where certain features (variables) matter more at particular times of the week or month.
Retraining the models on updated data, which may recalibrate the importance of each feature.
Interaction effects between features, which change their individual importance.
Consider a situation where you are forecasting demand for mid-priced home appliances.
In September, digital marketing spend might be the primary driver of sales.
By November, in-store discounts associated with pre-holiday promotions could take over.
In January, with delivery delays and depleted stock, the most influential factor becomes availability and lead time.
Instead of re-explaining the full logic of the model every month, it is more effective to track how feature importance changes over time, group related features into broader categories to simplify the explanation, and aggregate detailed inputs into higher-level units (for example, rolling postcode-level sales up to cities or regions). This makes the results easier to understand and helps you spot when shifts in the model’s focus reflect real changes in customer behavior or business conditions.
You can use dashboards that display SHAP values, feature importance rankings, or simple explanations to track what the model is prioritizing each month. This helps you spot patterns or unexpected changes and makes it easier to explain the forecast to colleagues or managers who are not involved in building the model. If the forecast suddenly increases or drops, feature attribution methods can show whether the cause came from internal factors like past sales or from external ones like weather or online activity.
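One lightweight way to implement that tracking, sketched below, is to aggregate mean absolute SHAP values by month and by feature group before putting them on a dashboard. The feature names, the group mapping, and the random "SHAP values" are all assumptions for illustration; in practice the array would come from a SHAP explainer run on each month's forecasts.

```python
# A minimal sketch of tracking what the model is prioritizing each month:
# mean absolute SHAP values aggregated by month and by feature group.
import numpy as np
import pandas as pd

feature_names = ["past_sales", "price", "promo_spend", "ad_clicks", "lead_time"]
groups = {"past_sales": "history", "price": "pricing", "promo_spend": "marketing",
          "ad_clicks": "marketing", "lead_time": "availability"}

rng = np.random.default_rng(4)
shap_values = rng.normal(size=(300, len(feature_names)))    # one row per forecast
dates = pd.date_range("2024-09-01", periods=300, freq="D")  # date of each forecast

df = pd.DataFrame(np.abs(shap_values), columns=feature_names)
df["month"] = dates.to_period("M")

monthly = df.groupby("month").mean()         # per-feature importance, month by month
grouped = monthly.T.groupby(groups).sum().T  # roll detailed features up into groups
print(grouped.round(2))
```

A table like this, or its charted equivalent, is often enough for a monthly forecast review: you can see at a glance when, say, marketing gives way to availability as the dominant driver.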
For further reading, please refer to these excellent academic references below:
[1] Lundberg, Scott M., and Su-In Lee. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., 2017. https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf.
[2] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “'Why Should I Trust You?': Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. New York: Association for Computing Machinery, 2016. https://doi.org/10.1145/2939672.2939778.
[3] Barredo Arrieta, Alejandro, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garcia, et al. “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI.” Information Fusion 58 (2020): 82–115. https://doi.org/10.1016/j.inffus.2019.12.012.
[4] Adadi, Amina, and Mohammed Berrada. “Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI).” IEEE Access 6 (2018): 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052.



