For the next part of the project, I analyzed the full data set of hate crimes, without differentiating by bias. My purpose here was to evaluate Exponential Smoothing (ETS) models, specifically Holt and Holt-Winters, as possible prediction models for the hate crimes. ETS models are a family of models that use a level (smoothing), a linear trend, and seasonality to predict future data. They are often used with financial data. The Simple Exponential Smoothing model includes only the level (smoothing). The Holt model includes level (smoothing) and trend components. The Holt-Winters model uses all three: level, trend, and seasonality. I had noticed in my previous analysis that there tended to be seasonality, with hate crimes peaking in the September/October time frame and then dropping off through the winter months. This was confirmed by the plot by month and the seasonal decomposition. Both suggest the presence of monthly seasonality.

If you look above, you can see that the seasonality peaks in the September/October timeframe, with the lowest numbers in December.

Since Holt-Winters includes all three components, I checked this model. I also tried the Holt model out of curiosity and for practice. I do want to acknowledge that I have been working through a class by “The Lazy Programmer” on Udemy: Time Series Analysis, Forecasting and Machine Learning. If you are interested in time series, it is a fabulous class.
The next issue was heteroskedasticity, where the variance of the data is not constant: certain time periods have a different variance than others. Look below at the 6-month rolling standard deviation, which shows the variability in count units rather than counts squared.
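Here is a minimal sketch of that rolling calculation with pandas, using synthetic counts whose noise grows over time (my actual data is not shown):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
# Hypothetical counts whose noise grows over time (heteroskedasticity).
counts = pd.Series(200 + rng.normal(0, np.linspace(5, 40, len(idx))), index=idx)

# 6-month rolling standard deviation: variability in count units,
# not counts squared (which a rolling variance would give).
rolling_sd = counts.rolling(window=6).std()
print(rolling_sd.tail())
```

On heteroskedastic data like this, the rolling standard deviation drifts upward over time instead of fluctuating around a constant level.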

To offset this, I also tried transforming the data with the log function. I created two Holt-Winters models. The first used the log-transformed counts with use_boxcox=False, since I was transforming the data myself. The second was passed the actual counts with use_boxcox=0; a Box-Cox transform with a lambda of 0 is equivalent to the log transform, which is why these two models yielded similar results. I will say that setting use_boxcox=False while passing the untransformed counts showed wildly different results. Passing the actual counts to the Holt-Winters model with use_boxcox=0 yielded the greatest accuracy and explained the greatest percentage of variation when compared to either the Holt model or the naive forecast.
I trained the model on the data up to and including 2021 and held out the 2022 data as the test set, which the model had never seen. I used the forecast method for out-of-sample predictions.

See the monthly totals for each year below. There is a distinct seasonal pattern. The Holt model (orange line) creates a prediction based on the smoothing level and linear trend, which is not much better than the naive prediction, the last value from the training data. With the Holt-Winters model, we see that the model is able to predict the seasonal pattern.
You can see above that the mean absolute percentage error for the naive prediction is 18.5%, as compared to 4.4% for the Holt-Winters model.
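Those percentages come from a MAPE calculation along these lines (the arrays below are made-up numbers for illustration, not my actual results):

```python
import numpy as np

# Hypothetical held-out actuals and predictions.
actual = np.array([210., 195., 230., 250., 240., 220.])
naive = np.full_like(actual, 180.)  # last training value, repeated
hw_pred = np.array([205., 199., 224., 246., 236., 226.])

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100 * np.mean(np.abs(y_true - y_pred) / y_true)

print(round(mape(actual, naive), 1))    # → 19.1
print(round(mape(actual, hw_pred), 1))  # → 2.2
```

Because the errors are divided by the actual counts, MAPE is comparable across models and easy to interpret as a percentage.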

If you look below, you can see that the r^2 is 0.768, meaning that this model explains approximately 77% of the variation. The naive r^2 is a negative number, meaning it is worse than simply using the average.
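The r^2 here is computed against the mean of the test data, which is why a model worse than the mean goes negative. A small illustration with hypothetical numbers:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """1 - SSE/SST: fraction of variance explained relative to the mean."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - sse / sst

# Made-up actuals and predictions for illustration.
actual = np.array([210., 195., 230., 250., 240., 220.])
good = np.array([205., 199., 224., 246., 236., 226.])
naive = np.full_like(actual, 180.)

print(round(r_squared(actual, good), 3))   # → 0.928
print(round(r_squared(actual, naive), 3))  # negative: worse than the mean
```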

See below the accuracy and r^2 values for the model with the log transformation. The r^2 is 0.70 as opposed to 0.77, and the RMSE is 69.2 as compared to 60.6. Thus, simply passing the training['count'] data with an additive trend and seasonality, use_boxcox=0, and the 'legacy-heuristic' initialization method is the winner.
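RMSE is in the same units as the counts, which makes this comparison easy to read. A quick sketch with hypothetical predictions from two competing models:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the counts."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Made-up actuals and predictions for illustration.
actual = np.array([210., 195., 230., 250., 240., 220.])
model_a = np.array([205., 199., 224., 246., 236., 226.])  # e.g. use_boxcox=0
model_b = np.array([200., 205., 218., 242., 230., 232.])  # e.g. manual log

print(round(rmse(actual, model_a), 1))  # → 4.9
print(round(rmse(actual, model_b), 1))  # → 10.4
```

The lower RMSE (in counts) and the higher r^2 point to the same winner, which is what happened with my two Holt-Winters variants.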
