[Figure: Real GDP Growth from 1947 to present (source: FRED, https://fred.stlouisfed.org)]
How would a specialized machine learning algorithm perform at predicting this quarter's real GDP growth, given the last 20 quarters' real GDP growth/decrease percentages as inputs?
Actually, there are already techniques for modeling and predicting a time series based on its own past values (so-called "autoregressive" models), like the ARIMA model discussed in the previous post. ARIMA uses a linear combination of past values of the series plus a linear combination of the model's past errors (i.e. fitted estimates minus actuals) to predict future values.
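For reference, the core ARMA(p, q) relationship underlying ARIMA (setting aside differencing) can be written as:

$$ X_t = c + \varepsilon_t + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} $$

where the $\phi_i$ weight the past values of the series, the $\theta_j$ weight the past errors $\varepsilon_{t-j}$, and $\varepsilon_t$ is the current-period error.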
But I wanted to try a machine learning approach - specifically neural networks, because of their flexibility and adaptability in a supervised learning setting.
What is a neural network? (High-level overview)
It's a machine learning algorithm that's inspired by the workings of the biological brain. You have a number of inputs, which you feed to the learning algorithm. You give the algorithm a set of training data, i.e. numerous observations of the inputs alongside the dependent variable output, or "Y" value, that you wish to predict.
The inputs feed into hidden "neurons". You can choose as many or as few neurons as you wish. You can also change the number of hidden neuron "layers". Each neuron is "activated" or not activated according to a logistic growth function, also called a sigmoid curve, which takes on values between 0 and 1 depending on the input fed to that neuron. Essentially, the neural network you build is a mesh of either activated or unactivated neurons that together have the potential to form a complex logical calculus that ultimately leads to an output variable prediction. That is, you feed the neural network A, B, and C, for example. This activates neurons D and F, but not E or G. Neurons D and F then activate J and K, which in turn influence the prediction that the neural network outputs.
The algorithm "learns" by adapting the weights assigned to each of the connections between neurons. These weighted connections determine, for example, to what extent input A plays a role in activating neuron D. Another weight determines how neuron D, if activated, will influence the final output. So the "learning" process happens by the algorithm optimizing the weights on all neural connections in such a way that the error (or "cost", to use the precise machine learning term) is minimized.
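As a toy illustration (not the model used later in this post), a single neuron's activation could be computed in R like this - the inputs and weights below are made up for the example:

```r
# Logistic (sigmoid) activation: maps any real number to a value in (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical inputs A, B, C and made-up weights on their connections to neuron D
inputs  <- c(A = 0.8, B = -1.2, C = 0.5)
weights <- c(0.4, 0.3, -0.9)  # in a real network, these are learned from the training data
bias    <- 0.1

# Neuron D's activation: weighted sum of inputs passed through the sigmoid
activation_D <- sigmoid(sum(inputs * weights) + bias)
activation_D  # near 1 = strongly activated, near 0 = not activated
```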
Here is a pictorial representation of an actual neural network implemented in R. It has one input layer with 5 inputs, two hidden layers with 3 neurons each, and an output layer with one output variable.
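A minimal R sketch that builds and plots a network of the same shape - assuming the neuralnet package, with placeholder data - would look something like this:

```r
library(neuralnet)

# Placeholder data: 100 observations of 5 inputs and one continuous output
set.seed(42)
df <- as.data.frame(matrix(rnorm(600), ncol = 6))
names(df) <- c("y", paste0("x", 1:5))

# Two hidden layers of 3 neurons each; linear output since this is regression
nn <- neuralnet(y ~ x1 + x2 + x3 + x4 + x5, data = df,
                hidden = c(3, 3), linear.output = TRUE)

plot(nn)  # draws the layers, neurons, and weighted connections
```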
Model Setup:
Now that all that introductory business is out of the way, here's how I actually set up the model for the experiment.
I first gathered the necessary time series data from FRED, i.e. the Federal Reserve Bank of St. Louis website, for Real GDP Growth by Quarter from 1947 Q2 to 2017 Q2. I set up a data frame in R containing one row for each quarter from 1952 Q2 through 2017 Q2 (starting in 1952 Q2, because this is the first quarter with 20 full preceding quarters of data to be used as independent variables). For each quarter's row, I included the lagged GDP growth values for the 20 preceding quarters - for example, the row for Q2 2017 includes the following variables (a code sketch for building this lagged data frame follows the list):
V1 = Q1 2017 growth/decrease
V2 = Q4 2016 growth/decrease
V3 = Q3 2016 growth/decrease
...
V20 = Q2 2012 growth/decrease.
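One compact way to build such a lagged data frame in base R is with embed(); this sketch assumes gdp_growth is a numeric vector of quarterly growth values in chronological order starting at 1947 Q2:

```r
# The post works with scaled/normalized growth values, e.g. via scale()
gdp_growth <- as.numeric(scale(gdp_growth))

# embed() returns a matrix whose first column is x[t] and whose remaining
# columns are the lags x[t-1], x[t-2], ..., x[t-20]
lagged <- as.data.frame(embed(gdp_growth, 21))
names(lagged) <- c("y", paste0("V", 1:20))  # y = current quarter; V1..V20 = lags 1..20
```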
So the neural network would include 20 inputs for each training example - one value for each of the preceding 20 quarters' GDP growth or decrease. The dependent variable, of course, is the actual growth or decrease for that particular quarter.
Choosing Neural Network Structure by Cross-Validation
How many layers, and how many neurons in each layer, would let the algorithm do the best possible job of learning the pattern in the data and predicting the GDP growth or decrease in a given quarter? To be sure, more neurons and layers mean greater flexibility in fitting the training data, while fewer neurons mean less flexibility and more bias. But too much flexibility means high variance - in other words, overfitting to the training data.
So I split the 261 quarters of data into a training set of 156 quarters (1952 - 1991), a cross-validation set of 54 quarters (1991 - 2004), and a test set of 51 quarters (2005 - 2017). The cross-validation set would be used exclusively for choosing the best neural network architecture among a few options.
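Since this is a time series, the split is chronological rather than random. Continuing with the hypothetical lagged data frame from above (and the neuralnet package loaded earlier), the split and the cross-validation scoring of one candidate architecture might look like this:

```r
train <- lagged[1:156, ]    # 1952 Q2 onward, 156 quarters
cv    <- lagged[157:210, ]  # next 54 quarters (through 2004)
test  <- lagged[211:261, ]  # final 51 quarters (2005 - 2017)

# Train one candidate architecture (here, two hidden layers of 5 neurons each)
fmla <- as.formula(paste("y ~", paste(paste0("V", 1:20), collapse = " + ")))
nn   <- neuralnet(fmla, data = train, hidden = c(5, 5), linear.output = TRUE)

# Cross-validation MSE for this architecture
cv_pred <- compute(nn, cv[, paste0("V", 1:20)])$net.result
mean((cv$y - cv_pred)^2)
```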
Here are the models I tested on the CV data set, after training them on the data set from 1952 - 1991, and their performances (measured in terms of MSE, or mean-squared error).
From the above, Model D, with two hidden layers of 5 neurons each, performs the best on the independent cross-validation data set, so that is the architecture I chose.
The performance measured above actually reflects each model's quarter-by-quarter predictive value, because each quarter's prediction uses the prior 20 quarters' actuals. At no point does the model incorporate its own predicted values as autoregressive terms for predicting the current quarter's value. So the measure of accuracy above tells us how we can expect the model to perform if, for a given quarter, we have the last 20 actuals for GDP change and want to estimate just this quarter's GDP change.
Performance of Selected Model on Test Data Set (2005 - 2017)
The MSE of the quarter-by-quarter predictions produced by Model D for the 51 quarters in the test data set is 0.4478 - quite a bit higher than the error seen on the cross-validation set. Most of this error takes the form of exaggerated predictions of positive growth (see graph below).
Below is a graph of the 51 predicted quarters (blue) versus actuals (black). Note that these are scaled and normalized values for real GDP growth, not actual % growth.
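The test MSE and a graph along these lines can be produced as follows, continuing with the hypothetical objects from the earlier sketches (nn here being Model D, i.e. hidden = c(5, 5)):

```r
# One-step-ahead predictions for the 51 held-out quarters
test_pred <- compute(nn, test[, paste0("V", 1:20)])$net.result
mean((test$y - test_pred)^2)  # test-set MSE

# Predicted (blue) versus actual (black), in scaled units
plot(test$y, type = "l", col = "black",
     xlab = "Quarter (2005 - 2017)", ylab = "Scaled real GDP growth")
lines(test_pred, col = "blue")
```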
Conclusion
In general, the shapes of the two time series over the test period do look similar. But there is a bothersome trend: often, the model does not predict a large upward or downward spike in GDP change until a quarter or two after it actually occurs. This is understandable: once a sharp drop or jump in GDP is one or two quarters in the past, it is included among the autoregressive terms fed to the neural network, and therefore it shows up in subsequent predictions.
In general, this is a lagged time series prediction; more often than not, large swings will not be predicted until it's too late. And naturally, that's a limitation of this sort of model. If you ask an expert what next quarter's GDP growth or decrease will be, based on what happened in the last 20 quarters, you might get an answer that could be considered a reasonable maximum likelihood estimate, or "expected value". If we have just entered a period of contraction (recession), the expert might say, "Oh, I expect next quarter's results to be rather disappointing." That certainly didn't require a genius. And that's essentially what the model is doing for us.
ARIMA time series models include a moving average term, i.e. one that is autoregressive on past errors. That could possibly help this model make better predictions for the current quarter. Otherwise, we might try using more inputs - not just past quarters' actual values, but other economic indicators that could give the machine learning algorithm more insight into what factors might cause the "perfect storm" that leads to a financial crisis or an economic boom.
The neural network figures out patterns in the data that we can't piece together ourselves. But the algorithm is only as smart or as dumb as the quality of the information we feed it.


