The students would need to obtain historical data on the variable they are trying to forecast, such as past monthly or quarterly sales figures, stock prices, weather observations, or another time series. They would split the historical data into two parts: a training set and a testing set.
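As a minimal sketch in Python, a chronological train/test split might look like the following; the file name, column names, and 12-period holdout are assumptions for illustration, not requirements.

```python
import pandas as pd

# Hypothetical example: a monthly sales series loaded from a CSV file
# (the file name and column names are placeholders).
sales = pd.read_csv("monthly_sales.csv", parse_dates=["month"], index_col="month")["sales"]

# Hold out the last 12 observations as the test set; everything earlier is
# used to fit the models. Time series are split chronologically, never
# randomly, so the test set always lies after the training set.
h = 12
train, test = sales.iloc[:-h], sales.iloc[-h:]
```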
The training set would contain the earliest data and would be used to develop and train each of the forecasting models. Common models students may consider include simple exponential smoothing, Holt's linear trend method, Brown's double exponential smoothing, ARIMA (autoregressive integrated moving average) models, and regression models with lagged predictor variables. For each model, the students would select appropriate parameter values, such as the smoothing parameter (alpha) in simple exponential smoothing or the (p, d, q) orders in an ARIMA model.
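A hedged sketch of how a few of these models might be fit with the statsmodels library, assuming the `train`/`test` split above; the ARIMA order (1, 1, 1) is a placeholder that would normally be chosen from ACF/PACF plots or an information criterion such as AIC.

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt
from statsmodels.tsa.arima.model import ARIMA

# Fit each candidate model on the training data only. fit() estimates the
# smoothing parameters (e.g. alpha) automatically unless values are supplied.
ses_fit = SimpleExpSmoothing(train).fit()          # simple exponential smoothing
holt_fit = Holt(train).fit()                        # Holt's linear trend method
arima_fit = ARIMA(train, order=(1, 1, 1)).fit()     # ARIMA(p, d, q); order is an assumption

# Forecast the full length of the test set from the end of the training data.
forecasts = {
    "SES": ses_fit.forecast(h),
    "Holt": holt_fit.forecast(h),
    "ARIMA": arima_fit.forecast(h),
}
```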
Once the models have been developed on the training set, the students would generate forecasts for future periods with each model, using only the information available up to the end of the training set. These forecasts would be compared to the actual values in the testing set to evaluate accuracy. Some common metrics that could be used include:
Mean Absolute Percentage Error (MAPE) – This calculates the average of the absolute percentage errors between each forecast and the actual value. It provides an easy-to-understand, scale-free measure of accuracy, with a lower score indicating better forecasts.
Mean Absolute Deviation (MAD) – Similar to MAPE but without the percentage scaling; it is simply the average of the absolute errors, expressed in the same units as the data.
Mean Squared Error (MSE) – Errors are squared before averaging, so large errors are weighted more heavily than small ones. This focuses the evaluation on avoiding large forecast misses even if some smaller errors occur. MSE needs to be interpreted carefully because it is expressed in squared units, so its scale is not as intuitive as MAPE or MAD.
Mean Absolute Scaled Error (MASE) – Accounts for the difficulty of the time series by comparing the forecast errors to those of a naive “random walk” forecast. A MASE below 1 indicates the model is better than the naive forecast.
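Assuming the forecasts produced above, these four metrics could be computed with plain NumPy along the following lines; the MASE shown here scales by the in-sample mean absolute error of a one-step naive forecast, which is one common convention.

```python
import numpy as np

def accuracy_metrics(actual, forecast, train):
    """Compute MAPE, MAD, MSE, and MASE for one model's test-set forecasts."""
    actual, forecast, train = map(np.asarray, (actual, forecast, train))
    errors = actual - forecast

    mape = np.mean(np.abs(errors / actual)) * 100   # percentage error (actuals must be non-zero)
    mad = np.mean(np.abs(errors))                   # same units as the data
    mse = np.mean(errors ** 2)                      # penalises large misses more heavily
    # Scale by the in-sample MAE of a one-step naive ("random walk") forecast.
    naive_mae = np.mean(np.abs(np.diff(train)))
    mase = mad / naive_mae

    return {"MAPE": mape, "MAD": mad, "MSE": mse, "MASE": mase}

results = {name: accuracy_metrics(test, fc, train) for name, fc in forecasts.items()}
```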
The students would calculate accuracy metrics like MAPE, MAD, MSE, and MASE for each model over the test-period forecasts. They may also produce graphs comparing the actual values to each model's forecasts to assess accuracy visually over time. Performance could also be evaluated at different forecast horizons, such as 1-, 3-, and 6-period-ahead forecasts, to see whether accuracy degrades smoothly or whether some models hold up better farther into the future.
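One way to examine accuracy by horizon is a rolling-origin evaluation: refit a model at successive cut-off points and average the absolute k-step-ahead errors. The sketch below does this for a single ARIMA specification; the number of origins and the chosen horizons are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def rolling_origin_errors(series, order, horizons=(1, 3, 6), n_origins=24):
    """Mean absolute k-step-ahead error from successive forecast origins."""
    records = []
    for i in range(n_origins):
        # The forecast origin moves one period forward on each pass.
        cutoff = len(series) - max(horizons) - n_origins + i
        fit = ARIMA(series.iloc[:cutoff], order=order).fit()
        fc = fit.forecast(max(horizons))
        for k in horizons:
            actual = series.iloc[cutoff + k - 1]          # value k steps past the origin
            records.append({"horizon": k, "abs_error": abs(actual - fc.iloc[k - 1])})
    return pd.DataFrame(records).groupby("horizon")["abs_error"].mean()

print(rolling_origin_errors(sales, order=(1, 1, 1)))
```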
Additional analysis may include Diebold-Mariano tests to compare model accuracy statistically and determine whether differences in the error metrics between pairs of models are statistically significant or could be due to chance. The students could also run residual diagnostics on the forecast errors to check whether any systematic patterns remain that could be exploited to build an even more accurate model.
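A simplified Diebold-Mariano comparison and a Ljung-Box residual check might look roughly like this; the DM statistic below uses squared-error loss and ignores autocorrelation in the loss differential, which is only a reasonable shortcut for one-step-ahead errors (longer horizons call for a Newey-West style variance correction).

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

def diebold_mariano(e1, e2):
    """Basic two-sided Diebold-Mariano test with squared-error loss."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2       # loss differential per period
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))      # approx. N(0, 1) under equal accuracy
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p_value

# Test-set forecast errors for two of the models fitted earlier.
e_ses = test.to_numpy() - forecasts["SES"].to_numpy()
e_arima = test.to_numpy() - forecasts["ARIMA"].to_numpy()
print(diebold_mariano(e_ses, e_arima))

# Ljung-Box test on one model's forecast errors: significant autocorrelation
# would suggest structure the model failed to capture.
print(acorr_ljungbox(e_arima, lags=[6]))
```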
After comprehensively evaluating accuracy over the test set using multiple error metrics and statistical comparisons, the students would identify which forecasting model or models provided the most accurate and reliable forecasts given the historical data available. No single metric alone would determine the best model; rather, the decision would rest on the preponderance of evidence across MAPE, MAD, MSE, MASE, the visual forecast comparisons, the statistical tests, and the residual analysis.
The students would report their analysis, detailing how each model type was developed, describing the accuracy metrics calculated, presenting the results in tables and graphs, discussing their statistical findings, and drawing a conclusion on the most accurate model indicated by this thorough ex-post evaluation. This would give them significant insight into forecasting, model selection, and evaluation that they could apply in practice when working with real time-series data challenges.
While accuracy alone cannot guarantee a model's future performance, this process allows the students to rigorously benchmark alternative techniques on historical data. It not only identifies the empirical ex-post leader but also shows how much more or less accurate the other methods were, so the students can better understand the practical value and predictive limitations of each approach. This in-depth workflow mirrors the types of analysis real-world data scientists and business analysts carry out when selecting a forecasting technique.