Glacial cycles: exogenous orbital changes vs. endogenous climate dynamics

We use a statistical model, the cointegrated vector autoregressive model, to assess the degree to which variations in Earth’s orbit and endogenous climate dynamics can be used to simulate glacial cycles during the late Quaternary (390 kyr–present). To do so, we estimate models of varying complexity and compare the accuracy of their 5 in-sample simulations. Results indicate that strong statistical associations between endogenous climate variables are not enough for statistical models to reproduce glacial cycles. Rather, changes in solar insolation associated with changes in Earth’s orbit are needed to simulate glacial cycles accurately. Also, results suggest that non-linear dynamics, threshold e ﬀ ects, and/or free oscillations may not play an overriding role in 10 glacial cycles.


Introduction
According to Paillard (2001), Adhemar (1842) is among the first to link changes in climate to variations in Earth's orbit, specifically precessional changes in the equinox. Croll (1875) formalizes this notion by postulating that seasonal changes in solar insolation, which are associated with precession, cause ice sheets to grow and shrink. This explanation is expanded to include changes in eccentricity, precession, and obliquity by Milankovitch (1941). His model hypothesizes that glacial cycles coincide with the 23 and 41 kyr periods of precession and obliquity. This hypothesis is confirmed by Hays et al. (1976) who examine the periodicity of ice sheets. Since then, the main features analysts often use statistical techniques (either ordinary least squares or spectral techniques) to link a proxy for the mechanism of interest with either temperature and/or ice volume. For example, research indicates that there is a close and seemingly stable correlation between CO 2 and temperature (Paillard, 2001;Siegenthaler et al., 2005;Martinez-Garcia et al., 2009;Luthe et al., 2008), CO 2 and ice volume (Martinez-Garcia Denton (2008) argue that insolation during the Southern Hemisphere summer is the more important driver of temperature in the Antarctic.
In this paper, we use a recently developed statistical technique, the so called cointegrated vector autoregression approach, to assess the degree to which variations in Earth's orbit and endogenous climate dynamics can be used to simulate glacial cycles 10 during the late Quaternary period (391 kyr-present). To do so, we estimate models of varying complexity and compare the accuracy of their in-sample simulations. Results indicate that strong statistical associations between endogenous climate variables that are reported in previous studies are not enough for statistical models to reproduce glacial cycles. Rather, changes in solar insolation associated with changes in Earth's 15 orbit are needed to simulate climate cycles. Also, the in-sample accuracy of simulating glacial cycles using our (essentially linear) statistical models suggests that non-linear dynamics, threshold effects, and/or free oscillations may not play an overriding role in glacial cycles.
These results and the methods used to obtain them are described in four sections. 20 Section 2 describes the data and the statistical techniques used to estimate and compare the statistical models. Results are described in Sect. 3. Section 4 discusses how these results can be used to test various hypotheses about the drivers of glacial cycles. Section 5 concludes by describing how future efforts will use this statistical approach to estimate temperature sensitivity, test hypotheses about climate dynamics, and test Program across the globe. Sea surface temperature is reconstructed using alkenones from site PS2489-2/ODP1090 in the subantarctic Atlantic. Data for sea level are reconstructed using oxygen isotope records from Red Sea sediments. Sources for these data, the number of observations, units of measure, and their original time scale are given in Table 1. 15 Indirect linkages among the six climate variables are proxied by four variables; iron (Fe), sea salt sodium (Na), non sea-salt calcium (Ca), and non sea-salt sulfate (SO 2− 4 ). Fe is derived almost entirely from terrestrial sources and is used as a proxy for a socalled iron fertilization effect that may enhance biotic uptake of CO 2 (Martin, 1990). Sulfate (SO 2− 4 ) originates mainly from marine biogenic emissions of dimethylsulphide 20 (after removing sea-salt sources using the Na data), and so is a proxy for marine biogenic emissions (Wolff et al., 2008). Sea salt sodium is derived from the sea-ice surface and proxies winter sea-ice extent (Wolff et al., 2006). Non sea-salt calcium has a terrestrial origin (mainly Patagonia) and may represent changes in temperature, moisture, vegetation, wind strength, glacial coverage, or changes in sea level in and 25 around Patagonia (Basile et al., 1997).
To make these data amenable to a statistical analysis, we convert them to a common time scale (EDC3) using conversions from Parrenin et al. (2007) Raymo (2003). These unevenly spaced observations are interpolated (linearly) to generate a data set in which each time series has a time step of 1 kyr. Solar insolation is exogenous to the statistical models and this driver is represented using several time series. Changes in Earth's orbit generate changes in solar insolation that are represented by times series for precession, obliquity, and eccentricity. These 5 changes generate spatial and temporal variations that are represented by time series for the radiation received at a specific latitude (e.g. 65 • N) on a given day (e.g. 21 June-Summer). We also represent the effect of summer-time insolation with cumulative insolation on days during which insolation exceeds a pre-defined threshold (e.g. Huybers and Denton, 2008). For example, Insol275 is a time series for cumulative annual inso-10 lation for days on which daily insolation exceeds 275 W/m 2 at a specific latitude.

The statistical model
All statistical model results are based on the cointegrated vector autoregressive (CVAR) model (Johansen, 1988;Juselius, 2006), which was developed to adequately analyze non-stationary time-series in a multivariate setting mostly for economic appli- 15 cations. As most climate data can be well approximated by (unit root) nonstationary processes (e.g. Stern and Kaufmann, 2000;Kaufmann et al., 2010), the usefulness of the CVAR model to address important questions in climate science seems potentially large. If a set of climate variables share a stochastic trend, a unique linear combination of the variables (in levels) will be stationary even though the variables themselves 20 are nonstationary. This stationary combination of nonstationary variables is said to cointegrate. By combining differenced and cointegrated data, the CVAR model specifies timeseries data as short-run variations around longer run equilibria. Longer run forces are divided into the forces that move the equilibria (the pushing forces) and forces that 25 correct deviations from equilibrium (adjusting forces).
To illustrate the intuition for the types of hypotheses that can be tested using a cointegrated VAR model, we specify Model 1. The variables Temp t , CO 2t , and Ice t are 590 modelled endogenously conditional on one exogenous variable for solar insolation, Insol0 at 65 • N. This specification is consistent with some of the simpler models discussed in the literature (e.g. Imbrie et al., 1993).
This model can be used to test the hypothesis that changes in temperature from period t − 1 to t can be explained by: 1. the change in solar insolation with impact γ 01 .
2. lagged adjustment effects as measured by the previous change in temperature, CO 2 , Ice, and solar insolation, with impacts γ 1j for the j -th lagged change in the first equation; 3. lagged adjustment effects as measured by the previous equilibrium error (i.e. the 15 cointegration relation) between temperature and CO 2 corrected for solar insolation, (Temp−b 1 CO 2 −b 2 Insol−b 01 ) t−1 , with impact α 11 , and between temperature and ice corrected for solar insolation, (Temp−b 3 Ice−b 4 Insol−b 02 ) t−1 , with impact α 12 . When {α 11 ,α 12 } < 0, temperature will equilibrium correct in the following sense: if, for example, last period's temperature was higher than its postulated Introduction 3. the use of R 2 as a measure of a model's explanatory power is generally misleading if the endogenous variable is nonstationary. By formulating the endogenous variables in changes rather than levels, the CVAR avoids this problem.
4. a correlation coefficient (defined in terms of deviations from a constant mean) is only appropriate for stationary variables. For nonstationary variables a correlation coefficient generally is meaningless and can be completely misleading (Granger and Newbold, 1974). By replacing the concept of correlation with cointegration, the CVAR model can evaluate whether variables are strongly associated or not 5 without the peril of spurious correlations.

Experimental design
The fully parameterized CVAR model is used to generate in-sample simulations for each endogenous variable. The simulations are initialized with observed values of the endogenous variables prior to the start date (391 kyr). To identify the factor(s) that 10 determines the ability of the CVAR model to simulate glacial cycles, we estimate four basic models that vary according to their set of exogenous and endogenous variables ( Table 2). Based on this design, we test several hypotheses regarding the role of exogenous changes in Earth's orbit and endogenous climate dynamics in glacial cycles. The four basic models are motivated by three basic hypotheses, which are given below: 15 Hypothesis 1. Glacial cycles are generated by changes in orbital cycles.
To test this hypothesis we specify Model 1 and Model 2 that differ with respect to the inclusion of different sets of variables for solar insolation. Both models include three endogenous variables; temperature, carbon dioxide, and ice. These variables are strongly correlated and represent important aspects of the glacial cycle. In Model 1, 20 these three endogenous variables are "pushed" by a single variable for solar insolation as measured by cumulative annual solar insolation (Insol0) at 65 • N. This variable is chosen because it is commonly used as a driver of glacial cycles (Kukla et al., 1981;Imbrie et al., 1993). Model 2 extends Model 1 by including the following solar variables: To evaluate the role of endogenous climate dynamics and examine linkages among the endogenous and exogenous variables, Model 3 and Model 4 expand the number of endogenous variables relative to Models 1 and 2 by including another seven variables; methane, sea surface temperature, sea level, iron, sea salt sodium, non sea-salt calcium, and non sea-salt sulphate. The design is similar to Model 1 and 2: Model 3 includes Insol0 as the only exogenous driver, whereas Model 4 includes the ten variables for solar insolation at 65 • N. This design makes it possible to compare forecasts 15 generated by models that vary according to the number and complexity of endogenous linkages. We test these hypotheses (and others) by comparing the accuracy of in-sample simulations generated by competing models. The ability of various models to simulate endogenous variables is assessed using the following loss function:

Hypothesis 3. Solar insolation at 65 • N is the best proxy for spatial and temporal specific effects of orbital changes in solar insolation.
where x t is the observed value of endogenous variable x for period t,x s,t , s = i ,j, is 5 the in-sample simulation for endogenous variable x simulated by model s for period t. Values of d t are used to calculate the S 2a test statistic (Lehman, 1975) as follows: , and 0 otherwise and N is the number of observations. The S 2a statistic tests the null hypothesis that models i and j are able to simulate 10 endogenous variable x with equal accuracy. Under this null hypothesis, the S 2a test statistic is asymptotically standard normal. A positive value for the S 2a statistic that exceeds the p = .05 threshold (1.96) indicates that the in-sample simulation for endogenous variable x generated by model j is closer to the observed value than the in-sample simulation for endogenous variable x generated by model i more often than 15 expected by random chance and, hence that model j generates a more accurate insample simulation of variable x than model i .

Results
For all models tested, the trace statistic strongly rejects the null hypothesis that there are no (zero) cointegrating relationships. Furthermore, for all models, the element 20 of the alpha matrix for the last equation that is associated with the last cointegrating relationship is negative. This indicates that the last cointegrating relationship loads into the last equation. Based on this result, the π matrix is assigned full rank (the number of cointegrating relationships is set equal to the number of endogenous variables). The R 2 for individual equations of the CVAR models reported in Table 3 suggest a wide disparity in the ability of Models 1-4 to account for the historical movements in the three endogenous variables. That the R 2 for the individual equations increases when the number of endogenous and exogenous variables increases from Model 1 to 2, and from Models 3 to 4, is not surprising. However, that the cause for the increased 5 explanatory power varies among the three endogenous variables may be more interesting. For temperature and CO 2 , expanding the number of endogenous variables from Model 1 to Model 3 generates a larger increase in explanatory power than expanding the number of exogenous solar variables from Model 1 to Model 2. Conversely, expanding the number of exogenous variables for solar insolation in Model 2 relative to 10 Model 1 generates a larger increase in the explanatory power of the equation for ice than the addition of endogenous variables in Model 3 relative to Model 1. The S 2a statistic (Table 4) rejects (p < 0.05) the null hypothesis for all comparisons except one (the ability of Model 1 and Model 3 to simulate ice volume cannot be distinguished statistically), which indicates that there is always one model that generates a 15 more accurate in-sample simulation than another. The in-sample accuracy of the comparisons generally is consistent with the explanatory power indicated by the R 2 for the individual equations. For example, the accuracy of the in-sample simulation generated by Model 4 is clearly superior to those generated by Models 1-3 (Table 4). Despite the general consistency between the R 2 of the statistical fit and the accuracy 20 of the in-sample simulation, there are an important exceptions. Model 2 generates a more accurate in-sample simulation than Model 3 for all three endogenous variables despite the fact that the R 2 of the temperature and CO 2 equations in Model 3 is greater than the R 2 for the same equations in Model 2. The latitudinal set of exogenous variables for solar insolation that generates the most 25 accurate simulation varies among the ten endogenous variables (Table 5). The most accurate simulation for each of the endogenous variables is chosen from among the twelve simulations based on the number of "wins" relative to "losses". Wins are defined as a comparison in which the S 2a statistic identifies the simulation as more accurate 596 measure solar insolation in the Southern Hemisphere. Models that measure solar insolation in the Northern Hemisphere generate the most accurate in-sample simulation for the variables, CH 4 , Fe, Ca, and Level. For the remaining two variables, SO 2− 4 and SST, the latitude at which solar insolation is measured seems to have no measurable effect on the model's accuracy -the "best model" is more accurate than only two other 10 simulations and is inferior to no other simulations.

Discussion
Cmpared to Model 3, the ability of Model 2 to generate an accurate in-sample simulation (herein termed skill) may be understood by comparing two measures of its performance (R 2 for individual equations and the accuracy of the in-sample simulation as indicated by the S 2a statistic). At first glance these two measures seem contradictory. As indicated by Figs. 1 and 2, Model 2 is able to generate a more accurate in-sample simulation than Model 3, whereas the statistical fit, as measured by the R 2 for the temperature and Ice equations in Model 3 is greater than those of Model 2 ( Rather, changes in solar insolation are amplified by changes in oceanic and/or atmospheric circulation, biotic activity, and/or sea ice. Model 3 proxies these changes with an expended set of endogenous variables; CH 4 , Fe, Na, Ca, SO 2− 4 , Level, and SST. These endogenous variables embody the relevant changes in solar insolation, therefore including the expanded suite of endogenous variables allows the estimation 5 of Model 3 to "fit" a larger fraction of the variation in the historical values of temperature and CO 2 than Model 2. But the historical information that is embodied in the expanded suite of endogenous variables is "lost" when these variables are simulated endogenously by Model 3 because Model 3 does not include the relevant set of variables for solar insolation. Without the requisite suite of variables for solar insolation, Model 3 cannot use the enhanced connections among endogenous variables to generate accurate in-sample simulations for temperature and CO 2 . Conversely, Model 2 is able to simulate Temp, CO 2 , and Ice with a higher degree of accuracy because it contains linkages between these three variables and a fuller set of variables for solar insolation.
To conclude, the inability of Model 1 and Model 3 to simulate the glacial cycles implies 15 that these models are missing exogenous variables that drive glacial cycles and/or endogenous variables that translate changes in solar insolation to changes in climate. The improvement of R 2 in Model 4 compared with Model 2 suggests that it is important to allow the endogenous climate mechanisms to be as rich as possible. This conclusion allows us to use various versions of Models 1-4 to test hypotheses about glacial cycles 20 that are raised in the literature.

Strong statistical associations among endogenous variables temperature, CO 2 , and ice are sufficient to explain glacial cycles
As suggested by the R 2 for individual equations in the CVAR and the S 2a statistics, models differ in their ability to simulate glacial cycles. Model 1 and Model 3 are not 25 able to reproduce the large fluctuations in temperature, CO 2 , or ice in a meaningful fashion (Fig. 1a-c). For example, the in-sample simulation generated by Model 1 is able to account for only 21 percent of the observed variation in temperature. 598 This inability may seem surprising given that previous analyses indicate that strong statistical associations exist between temperature, CO 2 , and ice. But, because R 2 measures of model's fit of an endogenous variable given the observed values of the other variables, it need not be a good measure of a model's ability to simulate glacial cycles. To illustrate this, we use Model 1a to simulate temperature and ice endogenously 5 assuming that CO 2 (along with annual solar insolation at 65 • N) is given exogenously, i.e. endogenously simulated values of CO 2 are replaced with observed values for CO 2 , which allows temperature and Ice to adjust towards equilibrium values implied by the observed values of CO 2 . This change in model specification increases the ability of Model 1a to simulate val-10 ues for temperature (Fig. 1a). Measured by the R 2 of the in-sample forecast, Model 1a can account for 64 percent of the variation in temperature. This, however, is still less than the R 2 between observed values of CO 2 and temperature, which is 0.77. And it is less than the value generated by Model 4, in which CO 2 is simulated endogenously. The inability Model 1 to reproduce the climate cycles may be sensitive to the choice 15 of variable to represent solar insolation (here Insol0). For example, other analysts (e.g. Kukla et al., 1981) argue that mid-June insolation at 65 • N is an important driver of glacial cycles. But specifying insolation on the first day of summer, 21 June (Model 1b) instead of total annual insolation at that latitude has little effect on the ability of Model 1b to reproduce glacial cycles (Fig. 1a-c). For example, Model 1b is able to account for 20 31 percent of the variation in temperature. Thus, in spite of strong statistical associations among Temp, CO 2 , and Ice, endogenous in-sample simulations suggest that, without a sufficient set of solar insolation pushing variables, type 1 models fail to generate accurate simulations of glacial cycles. 25 One possible explanation to the failure of Model 1 to simulate glacial cycles is that the three endogenous variables (Temp, CO 2 , and Ice) fail to capture important feed-backs endogenous to the climate system. Indeed, some analysts argue that the large glacial 599 Introduction

Tables Figures
Back Close

Full Screen / Esc
Printer-friendly Version Interactive Discussion terminations associated with the 100 kyr cycle are driven by free oscillations endogenous to the climate system. For example, Saltzman and Verbitsky (1994) develop a model that suggests an internal instability, which is generated by feedbacks among temperature, CO 2 and ice volume, generates an internal oscillation with a 100 kyr period.

5
We test the hypothesis that glacial cycles are driven by feedbacks endogenous to the climate system with Model 3, which expands Model 1 by including seven more endogenous variables (CH 4 , Fe, Na, SO 2− 4 , Ca, Level, and SST). The expanded set of endogenous variables does not change the simulation for Temp, CO 2 , or Ice in a meaningful fashion. Model 3 generates a more accurate in-sample 10 simulation than Model 1 for temperature and CO 2 (Table 4), but only because the loss function used to calculate S 2a statistic translates accuracy to a binary variable based on the answer to the following question -is the value generated by one simulation closer (to the observed value) than the value generated by the other simulation. But the ability of Model 1 and Model 3 to reproduce the record for temperature during the 15 sample period, as measured by the R 2 of the in-sample simulation does not increase in a statistically meaningful fashion: Model 1 (0.21) and Model 3 (0.23). This result suggests that feedbacks and/or free oscillations among the ten endogenous variables in Model 3 cannot generate glacial cycles accurately. 20 Consistent with the Milankovitch hypothesis, there is a large increase in the skill of Model 2 relative to Model 1 compared to the small increase in the skill of Model 3 relative to Model 1. The improvement of Model 2 is caused by adding more variables for solar insolation, as opposed to simply adding more variables. While adding variables should not diminish a statistical model's ability to simulate in-sample (additional 25 variables that do not have a statistically measurable effect will be "zeroed out" by the estimation procedure), simply adding more variables does not improve a statistical model's skill as indicated by the performance of Model 3 relative to Model 1. Model 2 expands Model 1 with nine forcing variables that measure variations in solar insolation due to seasonal and orbital changes. With this change, Model 2 simulates a much larger fraction of the increases and decreases in Temp, CO 2 , and Ice (see Fig. 2a-c). For example, it is able to simulate terminations associated with marine isotopic stages 9, 7, and 5.

Which component of solar insolation drives glacial cycles?
The conclusion that glacial cycles are driven by changes in solar insolation begs the question, which variables for solar insolation contribute most to the in-sample accuracy of Model 2 relative to Model 1. We assess this question by estimating models (Models 2a-c) that contain variables for solar insolation from one of three groups: (1) orbital changes (precession, obliquity, eccentricity); (2) seasonal variations (SunSpr, SunSum, SunAut and SunWin) and; (3) variations in cumulative summer insolation (Insol 275 and Insol550). As indicated in Fig. 3a-c, including orbital variations (Model 2a) generates the greatest increase in accuracy relative to Model 1, followed by seasonal variations (Model 2b), and lastly by cumulative summer insolation (Model 2c). The in-15 sample forecast generated by Model 2c is visually similar to that simulated by Model 1 and the accuracy of these two models cannot be distinguished statistically. Consistent with this result, removing Insol275 and Insol550 from Model 2 (Model 2(d)) generates little change relative to Model 2, which indicates that cumulative summer insolation (at these two thresholds) has little explanatory power relative to the other variables for so-20 lar insolation. Together, these results imply that Insol275 and Insol550 make only a small contribution to the ability of Model 2 to simulate glacial cycles.

Obliquity paces late Pleistocene terminations
Continuing with the logic from the previous section, it has been argued that some components of orbital changes are more important to glacial cycles than others. To test Introduction

Tables Figures
Back Close

Printer-friendly Version
Interactive Discussion or eccentricity, Huybers and Wunsch (2005) analyze the leading empirical orthogonal function of ten well-resolved marine δ 18 O records. Results reject the null hypothesis that glacial terminations are independent of obliquity at the five percent significance level. Conversely, this hypothesis cannot be rejected for eccentricity and precession. We evaluate these results by estimating versions of Model 2 that include either: 5 obliquity (Model 2(e)), eccentricity (Model 2(f)), or precession (Model 2(g)). Of the three models, only Model 2(e) simulates changes in temperature, CO 2 , and ice volume large enough to be interpreted as terminations (Fig. 3a-c). But obliquity is not a sufficient explanatory variable for glacial terminations. As indicated in Fig. 3a-c, Model 2(e) simulates several false terminations, which we define as 10 large reductions in ice cover (and increases in temperature and atmospheric CO 2 ) that are not present in the observational record. For example, Model 2(e) simulates a large reduction in ice volume about 90 kyr (Fig. 3c). There is no noticeable reduction in the observational record. Huybers and Wunsch (2005) recognize this difficulty, suggesting that the climate 15 state skips one or two obliquity beats before deglaciating. By way of explanation, they suggest that high latitude insolation and thickness of the ice sheet determine whether a glacial termination event occurs. This explanation seems unlikely. Replacing annual solar insolation at 65 • N with corresponding values at latitudes between 60 • and 85 • from the Northern or Southern Hemisphere does not allow a modified version of 20 Model 2(e) to improve its ability to differentiate obliquity cycles that do and do not generate deglaciations. The thickness of ice may determine whether an obliquity cycle generates a deglaciation event, but Model 2(e) is not able to simulate ice volume with sufficient accuracy to do so. Nonetheless, the two components of the explanation, the thickness of the ice sheet 25 and high latitude insolation can be considered as specific examples of two general categories that cause the missing beats; internal dynamics of the climate system or other aspects of solar insolation. To test the role of endogenous climate dynamics, we estimate Model 3(a), which uses obliquity and Insol0 as the two variables for solar 602 insolation that push changes in the ten endogenous variables. As indicated in Fig. 3ac, changes in ice volume, CO 2 , and temperature are greatly reduced relative to the changes simulated by Model 2(e). But the poor skill of Model 3(a) indicates that obliquity does not pace the endogenous variable(s) that determines whether the relationship between obliquity and ice volume (and or temperature and CO 2 ) generates a glacial 5 termination. Conversely, Model 2(a), which contains precession and eccentricity in addition to obliquity, generally is able to translate obliquity cycles into glacial terminations. This capability suggest that eccentricity and precession influence, either directly or indirectly via endogenous climate variables, whether changes in obliquity generate a glacial ter-10 mination. This result contradicts the findings by Huybers and Wunsch (2005) that glacial terminations are independent of eccentricity and precession.

The 100 kyr cycle is generated by nonlinearities, threshold effects, and/or endogenous oscillations
Model 4 generates the most accurate in-sample simulation because it combines the 15 expanded set of variables for solar insolation with the expanded set of endogenous variables (Fig. 2a-c). Together, these variables are able to simulate a large portion of the changes in Temp, CO 2 , and Ice that are associated with the 100 kyr oscillation. Model 4 uses linear specifications to represent the long-run relationship between exogenous and endogenous variables, and all variables are linear representations of 20 physical variables other than Insol275 and Insol550. But Insol275 and Insol550 have little explanatory power (Sect. 4.4). The only other non-linear component is the asymptotic manner in which endogenous variables adjust towards their long-term equilibrium. This ability does not imply that glacial cycles are completely linear. The most systematic "inaccuracy" in Model 2 and Model 4 is their inability to simulate the very sharp, 25 short rise in temperature at the start of the terminations associated with MIS 5, 7, and 9. This phase of a glacial cycle may be generated by non-linearities, threshold effects, or the failure to include an important exogenous or endogenous variable. A second 603 Introduction

Tables Figures
Back Close

Printer-friendly Version
Interactive Discussion systematic error, the inability to simulate the most recent interglacial, is described below.
Similarly, the ability of Model 4 to reproduce a large fraction of the changes in Temp, CO 2 , and Ice associated with glacial cycles does not imply that the CVAR model "solves" the 100 kyr problem in a way that is consistent with physical mechanisms.

5
Variables that represent variations in Earth's orbit are measured in dimensionless units ( Table 1). As such, it is not possible to make a physical interpretation the coefficients associated with eccentricity, precession, and obliquity.
The lack of a physical interpretation may be most troublesome for eccentricity, which is associated with very small changes in solar insolation. To evaluate the importance 10 of the eccentricity variable, we estimate Model 4a, which eliminates eccentricity from Model 4. Even without the eccentricity variable, Model 4a is able to reproduce much of the 100 kyr Cycle (Fig. 2a-c). The in-sample simulation captures much of the warming and reduction in ice volume that is associated with the second phase of glacial termination T III and glacial termination T II . Conversely, Model 4a is less able to reproduce 15 glacial termination T IV and the first phase of termination T III . Together, these results suggest that the ability of Model 4 to reproduce a significant portion of the 100 kyr cycle is not due solely to small changes in solar insolation as represented by the eccentricity variable.
Although Model 4 is able to capture much of the variation in glacial cycles, there are 20 several extended periods when simulated values for Temp, CO 2 , and/or Ice deviate from observed values. Most notable are periods of warming during cool periods associated with MIS 6 and 10. Another difficulty includes the inability to simulate much of the warming that is associated with the most recent glacial termination T I . This failure may be caused by the omission of anthropogenic activities that raise the atmospheric 25 concentrations of CO 2 and CH 4 . According to Ruddiman (2003), concentrations of CO 2 and CH 4 rise anomalously starting 8,000 and 3500 years ago and these rises affect the climate system. As described in Sect. 5, future research will use the statistical model estimated here to assess the "Ruddiman hypothesis". 604

Solar insolation in the Northern Hemisphere drives glacial cycles
The mixed results reported in Table 5 indicate that glacial cycles are driven by changes in solar insolation in both hemipsheres. Models that specify variables which measure solar insolation in the Southern Hemisphere generate the most accurate simulations for Temp, CO 2 , Ice, and Na. Of these, the first three are used most often to represent glacial cycles. From a statistical perspective, strong statistical associations between Southern Hemisphere insolation and temperature and Na are not surprising. Our measure of temperature represents conditions near the Vostok core, where the ice is formed. Similarly, Na represents sea ice in the high latitudes of Southern oceans. As such, local 10 insolation is expected to be an important determinant. Given this linkage, it may seem surprising that Southern Hemisphere insolation does not generate the most accurate simulation for SST, which measures temperature in the high latitudes of the Southern Atlantic.
The correlation between insolation at high latitudes of the Southern Hemisphere and 15 CO 2 can be explained in terms of recent hypotheses about the role of ocean circulation. According to these hypotheses, ice cover (Stephens and Keeling, 2000) and ocean overturning (Francois et al., 1997;Toggweiler, 1997;Watson and Garabato, 2006) in the high latitudes of the Southern Ocean drive atmopspheric CO 2 concentrations that control glacial cycles. That solar insolation in the Southern Hemisphere generates the 20 most accurate simulations for temperature and ice also is consistent with findings by Petit et al. (1999)  importance of solar insolation in the Northern Hemisphere based on empirical relationships between insolation in the Northern Hemisphere and ice volume (Petit et al., 1999). This relationship is explained by reference to the formation of North Atlantic deep water (Imbrie et al., 1993;Gildor and Tziperman, 2001). Insolation in the Northern Hemisphere generates the most accurate in-sample sim-5 ulations for CH 4 , Fe, Ca, and Level. Atmospheric methane is generated largely by terrestrial biotic activity at mid latitudes -most land is located in the Northern Hemisphere, especially at mid and high latitudes. The distribution of land between hemispheres also implies a correlation between Northern Hemisphere insolation and Fe, which has a terrestrial origin. Conversely, the ability of Northern Hemisphere insolation to generate 10 the most accurate in-sample simulation for Ca is surprising because Ca is thought to be associated with conditions in the high latitudes of the Southern Hemisphere (Basile et al., 1997;Wolff et al., 2006). Finally, there is no simple explanation for why Northern Hemisphere insolation should generate the most accurate in-sample simulation for sea level, especially because insolation in the Southern Hemisphere generates the most 15 accurate simulation for ice volume and ice volume is a major control on sea level.

Conclusions
Although the statistical models for glacial cycles reported here cannot replace mechanistic models, a CVAR can be used to test competing hypotheses against the observational record. Here, we demonstrate that a CVAR can simulate much of the variation 20 in Temp, CO 2 , and Ice that is associated with glacial cycles. To do so, the CVAR must represent solar insolation with a suite of variables that represent changes in Earth's orbit and seasonal variations at various latitudes. Expanding the set of endogenous variables beyond temperature, CO 2 , and ice also enhances the ability of the CVAR to simulate glacial cycles. The hemisphere and latitude that generates the most accurate 25 simulation varies among the ten endogenous variables in Model 4.

606
The ability of the CVAR to test hypotheses against the observational record extends well beyond previous efforts. Statistical relationships among time series in the CVAR model are not spurious. The presence of cointegrating relationships indicates that time series share the same stochastic trend, which implies that time series have a long-run relationship. These long-run relationships and the dynamics by which variables adjust 5 to disequilibrium in are embodied in the Π matrix of Eq. (1).
In this first step of an on-going research effort organized around a CVAR model, we do not disentangle long-and short-run relationships. Future research will decompose the Π matrix by imposing over-identifying restrictions. Identifying the CVAR will allow us to explore the following topics: 10 -Quantify the long-term temperature sensitivity to a doubling of atmospheric CO 2 (∆T 2x ). The identified CVAR also will allow us to separate the direct effect from the full ice-albedo feedback.
-Quantify the dynamics by which changes in solar insolation affect endogenous variables and the mechanisms by which these effects are transmitted through the 15 climate system. This will build on current efforts that seek to match turning points in time series (Caillon et al., 2003).
-Test hypotheses about the mechanisms that drive changes in the endogenous variables. For example, testing whether Fe, SST, and/or Na are part of the cointegrating relationship for CO 2 will allow us to test whether iron fertilization, ocean 20 overturning, and/or ice cover, respectively, play an important role in controlling atmospheric CO 2 during glacial cycles (Sigman and Boyle, 2000).
-Test the hypothesis that the nature of glacial cycles changes over time. For most of the endogenous variables, data are available over the last 750 kyr. We will estimate the CVAR over this full period and test whether the long-term relationships 25 and/or rates of adjustment change in a statistically meaningful way, with special focus on the so-called mid Pleistocene Revolution. We will also use the model Introduction to "backcast" the endogenous variables and compare the simulated values for ice volume over the last several million years, for which data are available.
-Test the "Ruddiman hypothesis" that anthropogenic climate change starts about 5000 years ago. We will include times series for CO 2 and CH 4 emissions associated with anthropogenic activities as exogenous variables and test whether 5 anthropogenic emissions improve the model's ability to simulate the last glacial termination, which it currently does poorly.