METEO 469
From Meteorology to Mitigation: Understanding Global Warming

Multivariate Regression Demonstration

Sample illustration of linear regression tool.
Figure 3.23: Regression tool.
Credit: Michael Mann

We will now investigate the multivariate generalization of ordinary linear regression, using a data set of Northern Hemisphere land temperatures over the past century. We will attempt to statistically model the observed data in terms of three predictors: (1) estimates from a simple climate model known as an Energy Balance Model (discussed in our next lesson), driven by estimated historical anthropogenic (greenhouse gas and aerosol) and natural (volcanic and solar) radiative forcing histories; (2) the DJF (December-February) Niño3.4 index, measuring the influence of the El Niño phenomenon; and (3) the DJFM (December-March) NAO index. The latter two are internal climate phenomena discussed in the previous subsection.

The demonstration is in 4 parts below:

Part 1

PRESENTER: OK, we're going to look at an example of multivariate regression. And I'm going to read in a dataset here. This is a dataset that contains the average temperature of the northern hemisphere land regions over the past century. And so let's start out by plotting out that data. So we can see what it looks like. And there it is.

And we're going to look at three potential quantities that may explain some of the variations that we see in this temperature series. So the temperature series is our dependent variable. And we're going to now look at three different independent variables. One of these independent variables is a simulation that I've done using a climate model called an energy balance model that we'll be talking about in the next lesson.

And I've subjected this theoretical climate model to both human impacts (estimated radiative forcing by greenhouse gases and anthropogenic sulfate aerosol emissions) and radiative forcing by natural causes, including explosive volcanic eruptions and estimated changes in solar output over time.

So let's take a look at what that simulation looks like. And what I'll do is I'll select the alternative Axis B so that both series are aligned on the same scale, with the energy balance model values on the right-hand side and the temperature anomaly from the instrumental observations on the left-hand side. I'll change the scale just a little bit here.

So you can see that, in fact, the energy balance model simulation, the red curve, does capture a lot of the variation in the blue curve, the instrumental surface temperature data. But there's still quite a bit of variation that's left unexplained. And we will now look at two other factors that are internal to the climate system, rather than external in nature, that might explain some of that residual variability.

Part 2

PRESENTER: OK, before we go on, let's actually do the formal regression using the energy balance model simulation as the single predictor of northern hemisphere land temperatures. We'll run the regression. We see the r squared value is 0.716. That means we explain a fairly impressive just under 72% of the variation in northern hemisphere land temperatures using just the result of that model simulation. If you look at the value of rho, the lag-1 autocorrelation coefficient, 0.057, that tells us that autocorrelation of the residuals doesn't appear to be a problem.
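The single-predictor step can be sketched in Python as follows. This is a minimal illustration, not the course's Regression Tool: the arrays `ebm` and `temp` are synthetic stand-ins (not the course data), the function name `fit_single_predictor` is hypothetical, and the lag-1 autocorrelation is computed here as the correlation between each residual and the next one, which may differ in detail from what the tool does.

```python
import numpy as np

def fit_single_predictor(x, y):
    """Least-squares fit y ~ a + b*x. Returns (a, b, r2, rho)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)           # slope first, then intercept
    resid = y - (a + b * x)              # what the fit leaves unexplained
    # Fraction of variance explained by the fit
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    # Lag-1 autocorrelation of the residuals (assumed definition)
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    return a, b, r2, rho

# Synthetic stand-in series (assumption: NOT the course data set)
rng = np.random.default_rng(0)
ebm = np.linspace(-0.3, 0.6, 100)                        # pretend model output
temp = 0.1 + 1.2 * ebm + rng.normal(0.0, 0.15, size=100)  # pretend observations

a, b, r2, rho = fit_single_predictor(ebm, temp)
print(f"intercept={a:.3f} slope={b:.3f} R^2={r2:.3f} rho={rho:.3f}")
```

A value of rho close to zero, as in the transcript, suggests the residuals behave roughly like uncorrelated noise from one year to the next.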

So let's go back to the plot. And now, we're going to plot the regression result model output. We'll convert that to a line plot. And you can see that it does provide, as we saw before, a fairly good fit to the data. It explains just under 72% of the variation in the data. And if we look at the residuals from that regression-- Model Residuals-- they look pretty random. There doesn't seem to be a whole lot of structure.

Although, there is quite a bit of interannual variability. And perhaps we can explain some of that interannual variability through two other predictors, the El Nino phenomenon, and the North Atlantic Oscillation phenomenon, two internal climate modes that influence northern hemisphere land temperatures. So let's take a look at that next.

Part 3

PRESENTER: OK, so let's look at the other factors that might potentially explain some of the variation in this temperature series. First of all, we'll take a look at the Nino 3.4 index. That, as we know, is a measure of El Nino variability. I'll choose Axis B here as an option to put these on roughly the same scale.

And so as we can see, there is a positive correlation between El Nino and northern hemisphere average temperatures. One of the more obvious examples is the 1997-'98 El Nino, one of the largest El Nino events on record. That was also associated with an unusually warm year. But we see other evidence of this relationship, with El Ninos leading to warmer land air temperatures in a given year and the opposite, La Nina, leading to relatively cold temperatures.

So we see the El Nino index, this red curve, dips down to this extreme negative value. That was an unusually cold year, somewhere around 1920 or so. So we might expect that there's a positive relationship between these two series, in that we can explain some of the variation in the northern hemisphere temperature series with El Nino.

Let's look finally now at the NAO index. And again, we can see a positive relationship. Unusually warm years, in some cases, appear to be associated with the positive phase of the NAO index. So we might expect that El Nino and the NAO can explain some of the remaining variation that our energy balance model simulation didn't explain. And our next step will be to try all three factors at once.
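The visual impression of a positive relationship can be put into a number with a correlation coefficient. The sketch below is only illustrative: `nino34` and `temp_resid` are synthetic stand-ins (not the course data), and `pearson_r` is a hypothetical helper name.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic stand-ins (assumption: NOT the course data set)
rng = np.random.default_rng(1)
nino34 = rng.normal(0.0, 1.0, size=100)                        # pretend index
temp_resid = 0.08 * nino34 + rng.normal(0.0, 0.1, size=100)    # pretend temperatures

r = pearson_r(nino34, temp_resid)
print(f"correlation = {r:.2f}")
```

The same helper could be applied to an NAO series to compare the two indices.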

Part 4

PRESENTER: OK, so now let's try our multivariate regression using all three predictors-- the energy balance model simulation, El Nino, and the NAO. So we go to Regression Model. And with the right click of the mouse here, we can select all three quantities at once. So I've got the EBM simulation, El Nino, and the NAO. And we're trying to predict temperature.

We run the regression. And we can see that we now explain nearly 80% of the variation in the temperature series. We went from just under 72% to now essentially 80% of the variation using those three predictors. And that's about as good as you can expect to do in a simple multivariate regression of this sort to explain 4/5 of the total variation in the data. We can see that the autocorrelation coefficient is small. It's not going to be statistically significant. We don't have to worry about autocorrelation of the residuals, which is nice.
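The comparison between the one-predictor and three-predictor fits can be sketched as below. Everything here is an assumption for illustration: the series are synthetic, `ols_fit` is a hypothetical helper, and the 1.96/sqrt(N) threshold is a common rough rule of thumb for judging a lag-1 autocorrelation against white noise, not necessarily the test used by the course tool.

```python
import numpy as np

def ols_fit(predictors, y):
    """Least-squares fit of y on a constant plus the given predictor series.

    Returns (coef, r2, rho); coef[0] is the constant term.
    """
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(cols)                       # design matrix
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # solve for coefficients
    resid = y - X @ coef
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]  # lag-1 autocorrelation
    return coef, r2, rho

# Synthetic stand-ins (assumption: NOT the course data set)
rng = np.random.default_rng(42)
n = 100
ebm = np.linspace(-0.3, 0.6, n)
nino34 = rng.normal(0.0, 1.0, n)
nao = rng.normal(0.0, 1.0, n)
temp = 0.05 + 1.1 * ebm + 0.06 * nino34 + 0.05 * nao + rng.normal(0.0, 0.12, n)

_, r2_one, _ = ols_fit([ebm], temp)
coef, r2_all, rho = ols_fit([ebm, nino34, nao], temp)

bound = 1.96 / np.sqrt(n)   # rough 95% white-noise bound (rule of thumb)
print(f"R^2 with EBM only: {r2_one:.3f}; with all three: {r2_all:.3f}")
print(f"rho = {rho:.3f}, rough significance bound = {bound:.3f}")
```

In an ordinary least-squares fit, adding predictors to a model that already contains a constant can never lower R squared, so the size of the gain matters more than the fact that there is one.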

So let's now go back to the plot settings. And we're going to plot our model simulation result, our multivariate regression result, that is, which includes the energy balance simulation, El Nino, and NAO, those two internal factors. Scroll down to Model Output. And there you can see it.

The red curve is our statistical model based on the three predictors that we've used. The blue curve is the actual temperature series. And we've explained a fairly impressive amount of variation in the data. We can see the effect of volcanic eruptions and some of the short-term coolings that are seen in the record. And then a lot of the other inter-annual fluctuations are at least partly explained by the NAO and El Nino.

If we like, we can recover the regression coefficients in our multivariate regression: the constant term, the term multiplying the energy balance model simulation, the term multiplying the El Nino series, and the term multiplying the NAO series. And that sum of terms is our statistical model. And it does quite well in this particular case.
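Written out, the sum of terms described here has the following form. The symbols are assumptions chosen for illustration: the transcript gives neither notation nor coefficient values, so a_0 through a_3 simply stand for the constant term and the three fitted coefficients, T(t) for the observed temperature in year t, and epsilon(t) for the residual.

```latex
\hat{T}(t) = a_0 + a_1\,\mathrm{EBM}(t) + a_2\,\mathrm{Nino3.4}(t) + a_3\,\mathrm{NAO}(t),
\qquad
\varepsilon(t) = T(t) - \hat{T}(t)
```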

Finally, we can take a look at the residuals, what's left over that wasn't explained by our multivariate regression. And that's what's shown with the green curve. There's some variability, of course, that's left over, that isn't explained by the factors we've considered. But there isn't a whole lot of structure in that time series, suggesting that the results of this multivariate regression are probably meaningful and are telling us something about the underlying factors that explain long-term variations, year-to-year variations, and decadal variations in northern hemisphere land temperatures over time.

You can play around with the data sets used in this example yourself using the Linear Regression Tool.