HOW DO YOU PLAN TO EVALUATE THE ACCURACY OF YOUR DEMAND FORECASTING MODEL?

To properly evaluate the accuracy of a demand forecasting model, it is important to use reliable, standard evaluation metrics, assess accuracy over multiple time horizons, compare the model’s forecasts to naive benchmarks, test the model on both training data and holdout validation data, and continuously refine the model based on accuracy results over time.

Some key evaluation metrics that should be calculated include mean absolute percentage error (MAPE), mean absolute deviation (MAD), and root mean squared error (RMSE). These metrics capture the average error and deviation between the model’s forecasts and actual observed demand values. MAPE in particular gives an easy-to-understand error percentage. Forecast accuracy should be calculated over multiple time horizons, such as weekly, monthly, and quarterly, to ensure the model can accurately predict demand across different forecast windows.
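
As a rough illustration, the sketch below shows how these three metrics might be computed in Python with NumPy; the demand figures are made up purely for demonstration.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error (assumes no zero actuals)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def mad(actual, forecast):
    """Mean absolute deviation: the average absolute forecast error."""
    return np.mean(np.abs(np.asarray(actual, float) - np.asarray(forecast, float)))

def rmse(actual, forecast):
    """Root mean squared error: penalizes large misses more heavily."""
    return np.sqrt(np.mean((np.asarray(actual, float) - np.asarray(forecast, float)) ** 2))

# Illustrative weekly demand vs. model forecasts (not real data)
actual = [120, 135, 128, 150, 142, 160]
forecast = [118, 130, 135, 145, 150, 155]
print(f"MAPE: {mape(actual, forecast):.1f}%")
print(f"MAD:  {mad(actual, forecast):.1f}")
print(f"RMSE: {rmse(actual, forecast):.1f}")
```

The same functions can be applied separately to weekly, monthly, and quarterly aggregations to cover the different forecast windows.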

It is also important to compare the model’s forecast accuracy to simple benchmark or naive models to establish whether the proposed model actually outperforms basic alternatives. Common benchmarks include naive models that assume demand will equal the previous period’s value, seasonal naive models that repeat historical seasonality, and drift models that extrapolate the average historical change forward. If the proposed model does not significantly outperform these basic approaches, it may not be sophisticated enough to truly improve demand forecasts.
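
A minimal sketch of two such benchmarks, assuming only a NumPy array of historical demand, might look like the following; the series and season length are illustrative.

```python
import numpy as np

def seasonal_naive_forecast(history, season_length, steps):
    """Repeat the last observed season forward (e.g. season_length=12 for monthly data)."""
    history = np.asarray(history, float)
    last_season = history[-season_length:]
    return np.array([last_season[i % season_length] for i in range(steps)])

def drift_forecast(history, steps):
    """Extrapolate the average historical change (the drift) from the last observation."""
    history = np.asarray(history, float)
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope * np.arange(1, steps + 1)

# The candidate model's error should be clearly lower than the error of these simple forecasts
history = [100, 110, 130, 95, 105, 115, 135, 100, 108, 118, 140, 102]
print(seasonal_naive_forecast(history, season_length=4, steps=4))
print(drift_forecast(history, steps=4))
```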

Model evaluation should incorporate forecasts made on both the data used to train the model and newly observed holdout test datasets not involved in the training process. Comparing performance on the initial training data versus later holdout periods helps indicate whether the model has overfit to past data patterns or can generalize to new time periods. Significant degradation in holdout accuracy may suggest the need for additional training data, different model specifications, or increased regularization.
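
One way to set up this comparison is to split the series chronologically and report error on each part; the sketch below uses a simple fitted linear trend as a stand-in for the actual forecasting model, on a synthetic series.

```python
import numpy as np

def mape(actual, forecast):
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def time_split(series, holdout_periods):
    """Split a time-ordered series: earliest data for training, latest for holdout validation."""
    series = np.asarray(series, float)
    return series[:-holdout_periods], series[-holdout_periods:]

# Synthetic demand series used only to make the example runnable
demand = 100 + 0.5 * np.arange(60) + np.random.default_rng(1).normal(0, 4, 60)
train, holdout = time_split(demand, holdout_periods=12)

# Stand-in model: a linear trend fitted to the training window only
t_train = np.arange(len(train))
slope, intercept = np.polyfit(t_train, train, 1)
fit_train = intercept + slope * t_train
fit_holdout = intercept + slope * (len(train) + np.arange(len(holdout)))

print(f"Training MAPE: {mape(train, fit_train):.1f}%")
print(f"Holdout MAPE:  {mape(holdout, fit_holdout):.1f}%")  # a large gap hints at overfitting
```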

Forecast accuracy tracking should be an ongoing process as new demand data becomes available over time. Regular re-evaluation allows refinement of the model based on accuracy results, helping to continually improve performance. Key areas that could be adjusted based on ongoing accuracy reviews include the variables in the model, algorithm tuning parameters, data preprocessing techniques, and overall model design.
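
A lightweight way to support this kind of ongoing tracking is an accuracy log that grows by one row each period; the hypothetical pandas sketch below computes a rolling MAPE that can flag degrading performance.

```python
import pandas as pd

# Hypothetical accuracy log: one row appended each month as new actuals arrive
log = pd.DataFrame({
    "period": pd.period_range("2023-01", periods=6, freq="M"),
    "actual": [120, 135, 128, 150, 142, 160],
    "forecast": [118, 130, 135, 145, 150, 155],
})
log["abs_pct_error"] = (log["actual"] - log["forecast"]).abs() / log["actual"] * 100

# Rolling 3-month MAPE: a sustained upward drift signals the model needs refitting or redesign
log["rolling_mape"] = log["abs_pct_error"].rolling(window=3).mean()
print(log[["period", "abs_pct_error", "rolling_mape"]])
```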

When conducting demand forecast evaluations, other useful analyses may include examining directional errors to determine whether the model tends to over- or under-forecast on average, tracking accuracy over time to identify degrading performance, calculating error descriptors like skew and kurtosis, and decomposing total error into systematic versus irregular components. Graphical analysis through forecast error plots and scatter plots of forecasts against actuals is also an insightful way to visually diagnose sources of inaccuracy.
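
The sketch below, assuming SciPy and matplotlib are available and using made-up errors, illustrates a few of these diagnostics: overall bias, error skewness and kurtosis, and the two suggested plots.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Illustrative actuals and forecasts over a test window
actual = np.array([120, 135, 128, 150, 142, 160, 155, 170], dtype=float)
forecast = np.array([118, 130, 135, 145, 150, 155, 160, 162], dtype=float)
errors = actual - forecast  # positive = under-forecast, negative = over-forecast

print(f"Mean error (bias): {errors.mean():.2f}")
print(f"Error skewness:    {stats.skew(errors):.2f}")
print(f"Error kurtosis:    {stats.kurtosis(errors):.2f}")

# Visual diagnostics: errors over time and forecasts plotted against actuals
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(errors, marker="o")
ax1.axhline(0, color="grey")
ax1.set_title("Forecast errors over time")
ax2.scatter(actual, forecast)
ax2.plot(actual, actual, color="grey")  # 45-degree reference line
ax2.set_title("Forecast vs. actual")
plt.show()
```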

Implementing a robust forecast accuracy monitoring process as described helps ensure the proposed demand model can reliably and systematically improve prediction quality over time. Only through detailed, ongoing model evaluations using multiple standard metrics, benchmark comparisons, and refinements informed by accuracy results can the true potential of a demand forecasting approach be determined. Proper evaluation also helps facilitate continuous improvements to support high-quality decision making dependent on these forecasts. With diligent accuracy tracking and refinement, data-driven demand modelling can empower organizations through more accurate demand visibility and insightful predictive analytics.

To adequately evaluate a demand forecasting model, reliable error metrics should be used to capture average error rates over multiple time horizons against both training and holdout test data. The model should consistently outperform naive benchmarks, and its accuracy should be tracked and improved through ongoing refinements informed by performance reviews. A thoughtful, methodical evaluation approach as outlined here is required to appropriately determine a model’s real-world forecasting capabilities and ensure continuous progress towards high prediction accuracy.

HOW WOULD THE STUDENTS EVALUATE THE ACCURACY OF THE DIFFERENT FORECASTING MODELS?

The students would need to obtain historical data on the variable they are trying to forecast. This could be things like past monthly or quarterly sales figures, stock prices, weather data, or other time series data. They would split the historical data into two parts – a training set and a testing set.

The training set would contain the earliest data and would be used to develop and train each of the forecasting models. Common models students may consider include simple exponential smoothing, Holt’s linear trend method, Brown’s exponential smoothing approach, ARIMA (autoregressive integrated moving average) models, and regression models with lagged predictor variables. For each model, the students would select the optimal parameters, such as the smoothing parameter (alpha) in simple exponential smoothing or the p, d, and q orders in ARIMA.
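
As one possibility, the students might fit a few of these candidates with the statsmodels library roughly as sketched below; the training series and the ARIMA order are placeholders, and in practice the parameters would be selected from the data (for example via AIC or ACF/PACF analysis).

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt
from statsmodels.tsa.arima.model import ARIMA

# Placeholder monthly training series; real historical data would be loaded instead
train = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
                  115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140], dtype=float)

# Simple exponential smoothing: fit() estimates the smoothing parameter (alpha) from the data
ses_fit = SimpleExpSmoothing(train).fit()

# Holt's linear trend method adds a trend component on top of the level
holt_fit = Holt(train).fit()

# ARIMA(p, d, q): order (1, 1, 1) used here purely for illustration
arima_fit = ARIMA(train, order=(1, 1, 1)).fit()

# Forecast the next 6 periods from each fitted model (only training data is used)
print(ses_fit.forecast(6))
print(holt_fit.forecast(6))
print(arima_fit.forecast(6))
```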

Once the models have been developed on the training set, the students would then forecast future periods using each model but only using the information available up to the end of the training set. These forecasts would be compared to the actual data in the testing set to evaluate accuracy. Some common metrics that could be used include:

Mean Absolute Percentage Error (MAPE) – This calculates the average of the percentage errors between each forecast and the actual value. It provides an easy-to-understand measure of accuracy, with a lower score indicating better forecasts.

Mean Absolute Deviation (MAD) – Similar to MAPE but without converting to percentages; it is simply the average of the absolute errors.

Mean Squared Error (MSE) – Errors are squared before averaging so larger errors are weighted more heavily than small errors. This focuses evaluation on avoiding large forecast misses even if some smaller errors occur. MSE needs to be interpreted carefully as the scale is not as intuitive as MAPE or MAD.

Mean Absolute Scaled Error (MASE) – Accounts for the difficulty of the time series by comparing forecast errors to a naive “random walk” forecast. A MASE below 1 indicates the model is better than the naive forecast.
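
Since MASE is the least familiar of these metrics, a small sketch may help; the function below scales the test-set absolute error by the in-sample one-step naive error, using illustrative numbers.

```python
import numpy as np

def mase(train, actual, forecast, season_length=1):
    """Mean absolute scaled error: test-set MAE divided by the in-sample naive forecast MAE."""
    train, actual, forecast = (np.asarray(x, dtype=float) for x in (train, actual, forecast))
    naive_mae = np.mean(np.abs(train[season_length:] - train[:-season_length]))
    return np.mean(np.abs(actual - forecast)) / naive_mae

# A value below 1 means the model beats the naive (random walk) forecast on average
train = [100, 102, 101, 105, 107, 110, 108, 112]
actual = [115, 113, 118]
forecast = [114, 116, 117]
print(f"MASE: {mase(train, actual, forecast):.2f}")
```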

The students would calculate accuracy metrics like MAPE, MAD, MSE, and MASE for each model over the test period forecasts. They may also produce graphs to visually compare the actual values to each model’s forecasts to assess accuracy over time. Performance could also be evaluated at different forecast horizons like 1-period ahead, 3-period ahead, 6-period ahead forecasts to see if accuracy degrades smoothly or if some models hold up better farther into the future.
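
One simple way to organize such a horizon-level comparison is to tag every test-period error with its forecast horizon and then average within horizons, as in this hypothetical example:

```python
import pandas as pd

# Hypothetical absolute percentage errors collected over the test period for two models
errors = pd.DataFrame({
    "horizon": [1, 1, 1, 3, 3, 3, 6, 6, 6],
    "model_a_ape": [2.1, 3.0, 2.4, 4.5, 5.1, 4.8, 7.9, 8.4, 9.1],
    "model_b_ape": [2.8, 2.9, 3.1, 4.0, 4.2, 4.6, 6.5, 6.9, 7.2],
})

# MAPE by forecast horizon: shows whether accuracy degrades gracefully as the horizon grows
print(errors.groupby("horizon").mean())
```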

Additional analysis may include conducting Diebold-Mariano tests to statistically compare model accuracy and determine if differences in the error metrics between pairs of models are statistically significant or could be due to chance. They could also perform residual diagnostics on the forecast errors to check if any patterns remain that could be exploited to potentially develop an even more accurate model.
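
A basic version of the Diebold-Mariano test can be coded directly; the sketch below uses squared-error loss and ignores the autocorrelation corrections needed for multi-step horizons, so it should be read as an approximation rather than a full implementation.

```python
import numpy as np
from scipy import stats

def diebold_mariano(errors_a, errors_b):
    """Simplified DM test on squared-error loss for one-step-ahead forecast errors."""
    d = np.asarray(errors_a, dtype=float) ** 2 - np.asarray(errors_b, dtype=float) ** 2
    dm_stat = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))  # two-sided, normal approximation
    return dm_stat, p_value

# Forecast errors from two competing models over the same test period (illustrative)
errors_a = [2.0, -1.5, 3.1, 0.4, -2.2, 1.8, 2.5, -0.9]
errors_b = [1.1, -0.8, 1.9, 0.6, -1.4, 1.2, 1.6, -0.5]
stat, p = diebold_mariano(errors_a, errors_b)
print(f"DM statistic: {stat:.2f}, p-value: {p:.3f}")
```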

After comprehensively evaluating accuracy over the test set using multiple error metrics and statistical comparisons, the students would identify which forecasting model or models provided the most accurate and reliable forecasts based on the historical data available. No single metric alone would determine the best model, but rather the preponderance of evidence across the board in terms of MAPE, MAD, MSE, MASE, visual forecasts, statistical tests, and residual analysis.

The students would report their analysis, including details on developing each model type, describing the accuracy metrics calculated, presenting the results visually through tables and graphs, discussing their statistical findings, and drawing a conclusion on the most accurate model indicated by this thorough ex-post evaluation process. This would provide them with significant insight into forecasting, model selection, and evaluation that they could apply in practice when working with real time-series data challenges.

While accuracy alone cannot guarantee a model’s future performance, this process allows the students to rigorously benchmark the performance of alternative techniques on historical data. It not only identifies the empirical ex-post leader, but also highlights how much more accurate or less accurate other methods were so they can better understand the practical value and predictive limitations of different approaches. This in-depth workflow conveys the types of analysis real-world data scientists and business analysts would carry out to select the optimal forecasting technique.