Hybrid Model in R

Anzar Draboo
4 min read · Jan 16, 2020

This is a high-level overview of the forecastHybrid package in R. I assume that readers have prior knowledge of the fundamental statistical techniques. Some of the content is taken from the R documentation, and I have tried to make it a bit easier to understand.

The package provides functions to build composite models from individual models in the ‘forecast’ package. The six component models are auto.arima, ets, thetam, nnetar, stlm, and tbats. The main workhorse function of this package is hybridModel(). The individual component models are stored inside the hybridModel object and can be viewed separately in their respective slots, and all the regular methods from the ‘forecast’ package can be applied to these individual component models.

Characteristics of the input series can cause problems for certain types of models and parameters. For example, stlm models require that the input series be seasonal; furthermore, the data must include at least two full seasons for the decomposition to succeed. If this is not the case, hybridModel() will drop the stlm model so that an error does not occur. Similarly, the ets model does not handle series with a seasonal period longer than 24 and will ignore the seasonality; in this case, hybridModel() will also drop the ets model from the ensemble.
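As a minimal sketch of the workflow (assuming the forecastHybrid package is installed, and using the built-in AirPassengers series for illustration):

library(forecastHybrid)

# Fit an ensemble; "aefnst" (the default) selects all six components:
# a = auto.arima, e = ets, f = thetam, n = nnetar, s = stlm, t = tbats
fit <- hybridModel(AirPassengers, models = "aefnst")

# Component models live in their own slots and accept 'forecast' methods
summary(fit$auto.arima)
accuracy(fit$ets)

# Forecast the ensemble as a whole
fc <- forecast(fit, h = 24)
plot(fc)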

The six individual component models are explained below.

AutoARIMA (auto.arima)

Although ARIMA is a very powerful model for forecasting time series data, the data preparation and parameter tuning end up being really time consuming. Before fitting an ARIMA model manually, we need to make the series stationary and determine the AR order p and MA order q, typically from the ACF and PACF plots. auto.arima makes this task simple for us, as it eliminates the steps of estimating the orders of the autoregressive and moving-average components. It determines the best combination of parameters using information criteria such as the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), which are estimators for comparing models; the lower these values, the better the model.
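A short example (note that auto.arima() minimizes the AICc, a small-sample correction of the AIC, by default; the ic argument can switch this to AIC or BIC):

library(forecast)

# Search over the (p, d, q) and seasonal orders automatically
fit <- auto.arima(AirPassengers, ic = "aicc")
summary(fit)

# Forecast two years ahead
plot(forecast(fit, h = 24))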

Exponential smoothing (ets)

Exponential smoothing is a time series forecasting method for univariate data. Time series methods like the Box-Jenkins ARIMA family develop a model where the prediction is a weighted linear sum of recent past observations, or lags. Exponential smoothing methods are similar in that a prediction is a weighted sum of past observations, but the model explicitly uses exponentially decreasing weights for past observations: specifically, past observations are weighted with a geometrically decreasing ratio.
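In the forecast package this corresponds to ets(), which selects the error, trend, and seasonal components automatically:

library(forecast)

# model = "ZZZ" (the default) lets ets() choose each component itself
fit <- ets(AirPassengers, model = "ZZZ")
summary(fit)
plot(forecast(fit, h = 12))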

The Theta Model (thetam)

The Theta model is a time series forecasting model derived from the idea that “an extrapolative method is practically incapable of capturing efficiently all the available information hidden in a time series.” On the one hand, there are models that are too simple to catch all the available information. On the other hand, there are methods with more parameters employed to cope with more demanding underlying patterns; unfortunately, while optimizing all these parameters, these complex methods usually end up over-fitting the data. The Theta approach instead breaks the data down into several simpler series, each of which captures part of the information included in the original series. In essence, a decomposition approach is employed, and as a result simpler models can adapt to these simpler series. For example, instead of trying to adapt Holt exponential smoothing to the original data, we could create two series, one that captures the short-term information and one that captures the medium- or long-term trend, and then fit simpler methods such as the naïve method or simple exponential smoothing to each.
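A sketch using thetam() from forecastHybrid (forecast::thetaf() is the equivalent fit-and-forecast function; WWWusage is a built-in example series):

library(forecastHybrid)

# thetam() returns a fitted model object that can be stored in the ensemble,
# unlike thetaf(), which fits and forecasts in a single call
fit <- thetam(WWWusage)
plot(forecast(fit, h = 20))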

Neural Network Time Series Forecasts (nnetar)

The nnetar function in the forecast package for R fits a neural network model to a time series, with lagged values of the time series as inputs (and possibly some other exogenous inputs). It is therefore a nonlinear autoregressive model, for which it is not possible to analytically derive prediction intervals. The network is trained for one-step forecasting; multi-step forecasts are computed recursively. For non-seasonal data, the fitted model is denoted as an NNAR(p,k) model, where k is the number of hidden nodes. This is analogous to an AR(p) model but with nonlinear functions. For seasonal data, the fitted model is called an NNAR(p,P,k)[m] model, which is analogous to an ARIMA(p,0,0)(P,0,0)[m] model but with nonlinear functions.
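A short example on the built-in lynx series (since the intervals cannot be derived analytically, forecast() simulates them when PI = TRUE):

library(forecast)

set.seed(123)        # nnetar averages several randomly initialized networks
fit <- nnetar(lynx)
fit                  # printing shows the chosen NNAR(p, k) specification

# Prediction intervals are simulated rather than derived analytically
plot(forecast(fit, h = 20, PI = TRUE))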

stlm

stlm takes a time series y, applies an STL decomposition, and models the seasonally adjusted data using the model passed as modelfunction or specified via method. It returns an object that includes the original STL decomposition and a time series model fitted to the seasonally adjusted data. Looking at the default arguments, modelfunction = NULL and method = ”ets”, so by default it fits an ETS model to the seasonally adjusted data. This object can be passed to forecast.stlm for forecasting, which forecasts the seasonally adjusted data and then re-seasonalizes the results by adding back the last year of the estimated seasonal component.
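A minimal example (AirPassengers is monthly with well over two full seasons, so the STL decomposition succeeds; the lambda = 0 log transform is an assumption here to handle its multiplicative seasonality, since STL is additive):

library(forecast)

# Decompose with STL, then model the seasonally adjusted series with ETS
fit <- stlm(AirPassengers, method = "ets", lambda = 0)

# forecast() dispatches to forecast.stlm(), which forecasts the adjusted
# series and adds back the last year of the estimated seasonal component
plot(forecast(fit, h = 24))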

TBATS

The TBATS model combines Trigonometric seasonality, a Box-Cox transformation, ARMA errors, Trend, and Seasonal components. The Box-Cox transformation deals with non-linear data, and the ARMA model for the residuals can de-correlate the time series. TBATS can improve prediction performance compared to a simple state space model. The trigonometric representation of the seasonal terms not only dramatically reduces the number of model parameters when the seasonal frequencies are high, but also gives the model more flexibility to deal with complex seasonality.
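A short example on the built-in USAccDeaths series (tbats() can also handle multiple seasonal periods if the input is built with msts()):

library(forecast)

# Fit trigonometric seasonality, Box-Cox, ARMA errors, trend, and season
fit <- tbats(USAccDeaths)
fit                  # the printed model name summarizes the chosen components
plot(forecast(fit, h = 24))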
