Operational Seasonal Water Supply and Water Level Forecasting for the Laurentian Great Lakes

Seasonal water supply forecasts are a critical component of regional water resources management planning. Across the United States, multiple modeling tools and operational protocols have evolved over time to address this need. Here, the authors document, assess, and recommend improvements to the current operational water supply forecasting protocols employed in managing flows through, and water levels across, Earth’s largest lake system: the Laurentian Great Lakes. The water resources management actions for this massive system are critically linked to North America’s economy, and to human health and safety, through planning and policy decisions related to hydropower management, commercial navigation, and water level fluctuations. The authors’ assessment indicates that existing operational seasonal water supply forecasting systems for the Great Lakes have moderate skill and that there are several high-priority areas for improvement, including improved representation of initial hydrological conditions, aligning variability in future meteorological forcings with climate-scale projections, and robust representations of forecast uncertainty. DOI: 10.1061/(ASCE)WR.1943-5452.0001214. This work is made available under the terms of the Creative Commons Attribution 4.0 International license, https://creativecommons.org/licenses/by/4.0/. Author keywords: Seasonal water supply forecasting; Operational water resources management; Binational management; Great Lakes.


Introduction
Seasonal water supply forecasts are a critical component of robust regional water resources management planning (Jackson et al. 2001;Chiew et al. 2003). Throughout North America, multiple forecasting systems, as well as operational protocols for employing those systems, have evolved over time. In the Mountain West, for example, there is a long history of forecasting seasonal water supplies through empirical models based on spring snow-water equivalent (Wood and Lettenmaier 2006;Pagano et al. 2009). Elsewhere, more complex physically based models are employed (Anghileri et al. 2016), and efforts aimed at improving the use of ensemble climate forecasts, such as the North American Multi-Model Ensemble (NMME), are ongoing (Bolinger et al. 2017;Lavers et al. 2009;Shukla et al. 2016). The range of water supply forecasting protocols and systems is further amplified by the diversity in forecasting environments, ranging from those that are research-oriented to those that are fully operational. These forecasting environments, understandably, are designed to meet widely varying regulatory, ecological, and economic criteria.

Laurentian Great Lakes System
Water is abundant in the Laurentian Great Lakes (Fig. 1), which contain roughly 20% of the Earth's unfrozen fresh surface water (by volume) and are collectively the largest surface area of fresh water (Lake Superior alone is the largest lake on Earth by surface area, Gronewold et al. 2013). Although water is abundant in the region and the range of variability (Fig. 2) in water levels is small relative to the maximum depth of the lakes (406 m in the case of Lake Superior), both long-and short-term water level variability combine to have dramatic impacts on wetland migration and nearshore ecology (Duhamel et al. 2017), commercial navigation (Millerd 2011), municipal water supplies (International Upper Great Lakes Study 2012), and public and private property (Norton et al. 2011). The long-and short-term variability that characterizes the Great Lakes historical record of water levels (Fig. 2) offers a contrast to other regions where hydrologic conditions are characterized by monotonic sea level rise, flash floods, or chronic drought (Ekman 1999;Cooper et al. 2008).
Similar to other transboundary basins, where challenges related to international boundaries have resulted in unique approaches to water managemet (e.g., Biancamaria et al. 2011), the binational management of the waters of the Great Lakes and St. Lawrence River basin, which is home to eight US states and two Canadian provinces, has resulted in the evolution of unique forecast protocols. Flows in the connecting channels of the Great Lakes are regulated at three locations: the St. Marys River at the St. Marys Rapids in Sault Ste. Marie, the Niagara River (where regulation allocates flow among the Niagara Falls and US and Canadian hydropower, but the total flow is not regulated), and the St. Lawrence River at the Moses Saunders Dam in Massena, New York and Cornwall, Ontario. Management of the Great Lakes is the responsibility of the International Joint Commission (IJC), which is guided by the Boundary Waters Treaty of 1909. The USACE and Environment and Climate Change Canada (ECCC) provide technical support to the IJC through the International Lake Superior Board of Control, the International Niagara River Board of Control, and the International Lake Ontario-St. Lawrence River Board.
Interestingly, no single authority is similarly charged with providing these agencies with seasonal water supply forecasts. The National Oceanic and Atmospheric Administration (NOAA) National Weather Service River Forecasting Centers, which historically provide these forecasts in other riverine systems in the United States, have jurisdictional boundaries that do not align with those of the Great Lakes watersheds, nor do they extend across Canadian land surfaces. As such, seasonal basinwide hydrological forecasting for the Great Lakes, as well as the propagation of those supplies into lake water levels and flows through the channels that connect the lakes, falls under the operational protocols of the USACE and ECCC.
Seasonal water supply forecasts in the Great Lakes basin must address several challenges, some of which are unqiue to the basin, but some of which share commonalities with many other water management applications. The transboundary nature of the basin and its management require coordination of data and forecasts across juristictional boundaries. The translation of water supply to lake levels requires simultaneous simulation of flows through all of the connecting channels, which are dependent on lake-to-lake water level differences, channel characteristics (including ice and weed retardation), and regulation of the St. Marys and St. Lawrence River control structures. In addition to these challenges related to transboundary water management, the large surface areas of the lakes relative to the basin areas results in the need to accurately represent the combined influence of both overlake and overland processes. Finally, water supply forecasts in the Great Lakes face many of the challenges that are common across water management applications and regions, such as resolving the influence of operator judgment in forecast skill, adequately representing initial conditions (Haibin et al. 2009;Shukla and Lettenmaier 2011), incorporating the use of climatic forecasts (Shukla and Lettenmaier 2011;Shukla et al. 2016), and representing uncertainty (Schaake et al. 2007;Roscoe et al. 2012).

Scientific Objectives
For the Laurentian Great Lakes, basin-scale seasonal water supply forecasts are an intermediate stepping stone toward forecasts of lakewide average water levels and flows in the connecting channels (i.e., St. Marys, St. Clair, Detroit, Niagara, and St. Lawrence Rivers, all shown in Fig. 1). The purpose of this study is to document existing seasonal operational water supply and water level forecasting protocols, introduce rigorous verification methods into forecast procedures, and benchmark the performance of water supply models used in water level forecasts. The Great Lakes serve as an ideal test bed for improving operational water supply forecasting because they face challenges that are common both to the land surface hydrologic science community and to lake and reservoir management agencies. Great Lakes hydrological forecasting also requires reconciliation of discrepancies in international water balance monitoring and forecasting protocols between the United States and Canada (Gronewold and Fortin 2012;Mason et al. 2019). Finally, there is currently a high demand for improving seasonal water supply forecasting in the Great Lakes, particularly in light of recent extreme water level fluctuations (Lenters 2001;Quinn 2002;Assel et al. 2004;Gronewold and Stow 2014a;Gronewold et al. 2015;Carter and Steinschneider 2018) and their impacts on coastal flooding, wetland migration, commercial navigation, and hydropower management (Wilcox et al. 2002;Millerd 2011).

Current Great Lakes Seasonal Hydrological Forecasting Protocols
USACE and ECCC produce seasonal water level forecasts in their role supporting the regulation of Lake Superior and Lake Ontario outflows. Seasonal water level forecasts generated by USACE and ECCC are coordinated and communicated to the public through the Monthly Bulletin of Lake Levels for the Great Lakes (published by USACE) and the Monthly Water Level Bulletin (published by ECCC). While USACE conducts regular internal assessments of forecasts used in operational procedures, formal documentation of the skill of the water supply forecast models used by USACE is limited to studies by Noorbakhsh and Wilshaw (1990), Croley and Lee (1993), and Gronewold et al. (2011).
Water level forecasts are driven by forecasts of net basin supply (NBS), which represent an estimate of the supply of water to a lake, excluding flows through connecting channels and diversions. Under the assumption that contributions from groundwater are negligible, NBS is equal to the precipitation over the lake, P, plus the lateral tributary runoff to the lake, R, minus the evaporation from the lake, E [Eq. (1)]. Because of the large surface areas of the lakes relative to their watershed areas, the overlake components (precipitation and evaporation) represent a much larger portion of the lakes' water budgets than for most other lakes and managed reservoirs: NBS is then translated to changes in water levels through the use of the Coordinated Great Lakes Regulation and Routing Model (CGLRRM), which was developed by the Coordinating Committee on Great Lakes Basic Hydraulic and Hydrologic Data, composed of members of US and Canadian federal agencies [for details on the Coordinating Committee, see Gronewold et al. (2018)]. The CGLRRM incorporates regulation rules and relationships required to estimate flow through the connecting channels. While there is one coordinated routing model (CGLRRM), the method used to forecast NBS is determined by USACE-Detroit and ECCC independently. Accordingly, in this study, the focus is on the water supply (i.e., NBS) forecasts that are behind the USACE contribution to the coordinated 6-month forecast.
Seasonal hydrological forecasting models tend to fall into one of several discrete categories, including those that are process-oriented, empirical, or physically based (Weber et al. 2012;Rheinheimer et al. 2016). NBS models used in the USACE-Detroit water level forecasts fit into the process-oriented and empirical categories. These models include the process-oriented Great Lakes Advanced Hydrological Prediction System (GL-AHPS), a trend model, and a regression  1915 1920 1925 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 Lake Superior Lake Michigan−Huron Lake Erie Lake Ontario model. These models are briefly summarized in Table 1. Each model is described in more detail in the methods section.
The forecast NBS values that result from each of these models run by USACE are not published in an operational forecast. However, the NBS projections are used in the operational water level forecast that constitutes the US contribution to the internationally Coordinated Great Lakes Water Level Forecast. The protocol for using the NBS forecasts as input to water level forecasts is described here.
Prior to running the NBS forecast models, the operator conducts a qualitative assessment of current and recent hydrological and meteorological conditions. This assessment takes into consideration past months' temperatures, precipitation amounts, and NBS values, as well as snowpack, ice cover, streamflows, and soil moisture conditions. Additionally, the operator considers the expected climatic conditions over the next 6 months, drawing from the NOAA Climate Prediction Center's (CPC) seasonal climate outlook maps and associated prognostic discussion (NOAA CPC 2005), near-term forecasts (e.g., NOAA Weather Prediction Center's Quantitative Precipitation forecasts, NOAA WPC 2019) and precipitation and temperature forecasts aggregated to the lake basin scale made available by the Great Lakes Seasonal Climate Forecast Tool (Bolinger et al. 2017). This assessment is documented in an internally archived climate outlook summary discussion and provides the operator with a basis for evaluating NBS forecasts.
After the NBS forecasts are run, the operator evaluates each NBS sequence generated, considering the expectations described in the aforementioned climate outlook summary. The operator selects the "best" model based on an understanding of how existing basin conditions and forecast meteorology may translate to NBS. This is referred to as the operator-selected value (OSV) in the remainder of this paper. In addition to the three models described here, the operator considers a persistence model, in which the forecast 1-month NBS has the same probability of exceedance as the previous month's observed NBS. This model is also considered in the analysis here and will subsequently be referred to as the Persistence model. In the case where none of the four models reflect expected NBS trends, given the existing and forecast basin conditions, the operator may choose to reject model output in favor of an OSV. Finally, conventional practice is to favor forecasts that do not stray far from average conditions, unless there is unusually strong indication of wet or dry conditions in the climatic forecast.
Uncertainty bounds are then placed around the deterministic operator-selected projection using historical sequences of NBS. In practice, because the components of NBS (precipitation over the lake, runoff to the lake, and evaporation from the lake) are derived from sparse monitoring networks (Gronewold and Stow 2014b), the residual net basin supply, NBS R , is used to represent historical NBS. NBS R is equal to the change in storage minus the inflow to the lake through connecting channels and diversions plus the outflow through connecting channels and diversions. The relationship between NBS R and the NBS derived from the components (P þ R − E, or component NBS, NBS C ) is derived from the water balance equation [Eq. (2)]: where Δ S = change in storage (observable in change in water level); Q in and Q out = inflows and outflows through connecting channels, respectively; and ε = error term reflecting uncertainty in observations and incomplete representation of water balance (resulting from neglecting the contribution of groundwater, for example). Eq.
(2) is rearranged, leaving NBS C on one side and the NBS R on the other side, as in Eq. (4): or The upper and lower bounds of the NBS forecast are determined by finding the 5th and 95th percentile values of cumulative NBS R sequences for a given forecast horizon (i.e., the 1-month forecast bounds determined using the historical NBS values for that calendar month, the 2-month forecast bounds are determined using the cumulative 2-month NBS, then distributed to monthly). These socalled operational 5% and 95% supplies are updated periodically. During the analysis period of 2007-2018, the operational supplies were updated twice: once in 2010 and once in 2014.
The final stage in developing a water level forecast is to route the operator-selected NBS sequences and the 5th and 95th percentile NBS sequences through the CGLRRM to produce monthly mean water levels. These monthly mean water levels are then coordinated with counterparts in ECCC, with the end result becoming the internationally coordinated Great Lakes water levels forecast.

Methods
In a typical model skill assessment, simulations are conducted using observed meteorology as input, whether it is during a verification period or through the use of cross-validation procedures.
Although this can provide insight into the benefits of different models, results from these approaches do not fully represent the true skill of a forecast methodology. In reality, it is reasonable to assume that the forecast skill is highly dependent on model inputs, many of which are forecast themselves. The importance of archiving forecasts to provide a means for monitoring model performance under actual forecasting conditions has been noted by Gronewold et al. (2011) in the context of improving Great Lakes water level forecasting. Since 2002, USACE has been archiving digital outputs used to inform water level forecasts published in its Monthly Bulletin of Great Lakes Water Levels. Although archiving was initially irregular and the methodology has evolved to incorporate the current suite of models developed since then, the repository of consistently archived forecasts is now large enough (2007 to present) to conduct a rigorous assessment of the actual forecast skill of each of these models. The following subsections provide descriptions of the individual NBS models evaluated in this study, introduce the data sets used for benchmarking model performance, and present the approaches used to assess water supply and water level forecast performance.

Process-Oriented Models
It is informative to note that few watershed and process-oriented models have been developed for seasonal prediction over the entire Great Lakes domain. To the authors' knowledge, the only published process-oriented model that has been developed and applied for seasonal forecasting across the Canadian and US land surfaces of the basin is the GL-AHPS) [for details see Gronewold et al. (2011)], which employs a modified version of the conventional extended streamflow prediction (ESP) procedure (Day 1985) with weighted outlooks (Croley and Hartmann 1987). Other projects that have in fact focused on improving hydrological forecasting across the Great Lakes basin, including development of the Modélisation Environnementale-Surface et Hydrologie (MESH) system (Pietroniro et al. 2007) and customization of the Regional Climate Model system (RegCM) (Notaro et al. 2015), have done so for either much shorter or longer forecasting horizons. The GL-AHPS is a Microsoft Windows (Redmond, Washington) based NBS and water level forecasting application developed by NOAA's Great Lakes Environmental Research Laboratory (GLERL) [for details see Croley (2000) and Gronewold et al. (2011)]. GL-AHPS passes overlake and overbasin precipitation and temperature estimates to a lake thermodynamic model (LTM), described by Croley (1989), and a Large Basin Runoff Model (LBRM) [for details see Croley and He (2002)] to estimate the overlake evaporation and runoff to the lakes, respectively, and then ultimately calculates NBS for each lake. GL-AHPS forecasts are run using a probabilistic forecasting framework in which the LTM and LBRM are driven by an ensemble of meteorology time series created by weighting each year of the historical record of meteorology. In the USACE operational use of GL-AHPS, the weights are assigned by taking into consideration the NOAA CPC seasonal climatic outlooks using procedures described in Croley (2000). The forecast is initialized by first running the models using a provisional record of hydrometeorological variables, created by extracting data from NOAA's National Climatic Data Center (NCDC) and calculating subbasin and overlake averages using a Thiessen weighting algorithm (Croley and Hartmann 1985).

Statistical Water Supply Models
The Trend model for forecasting NBS, described in detail by Noorbakhsh and Wilshaw (1990), was developed using time series analysis of NBS broken down into three components: trend, seasonal variation, and cycle. The trend component consists of an analysis of primary and secondary trends in NBS. Prior to computing the primary and secondary trends, NBS is converted to so-called coded NBS, NBS coded , to eliminate fractions and negative values. The primary trend, NBS primary , is determined by a least-squares fit of annual coded NBS to a line, as in Eq. (5): In Eq. (5), X represents the time from the beginning of the period of record of annual NBS. In current practice, a period of record of 35 years is used in the determination of Intercept and Slope used for the primary trend in Eq. (5). The secondary trend, NBS secondary , is determined using seasonally coded NBS for the past 6 and 18 months (seasonally coded NBS ¼ NBS coded divided by a seasonality index to remove seasonal effects). These seasonal indices represent the average fraction of the annual NBS that is contributed by each month of the year. This secondary trend is computed as shown in Eq. (6): where m = month index from current month, 0; and SI m = seasonal adjustment for month m. The total trend component for the forecasted month is then computed using Eq. (7): Finally, a cycle component is applied based on the prior 12 months' precipitation relative to average [Eq. (8)]: where CF = cycle correction factor; A = sum of previous 12 months' precipitation; and B = long-term average annual precipitation.
The Regression model, also described by Noorbakhsh and Wilshaw (1990), is the result of a multiple linear regression to describe the relationship between the coming month's NBS, NBS m , and that month's precipitation and temperature and antecedent conditions, including the prior 6 months' precipitation, the prior 3 months' NBS, and the previous month's temperature [Eq. (10)]: where m = month for which NBS is predicted; P i , T j , and NBS k = precipitation for month i, average temperature for month j, and NBS for month k, respectively; and a i , b j , and c k = calibration coefficients for each calendar month. The operational version of the Regression model predicts only the coming month's NBS (i.e., m ¼ 0). Variables without significant correlation were not included in the regression equation. The Regression model is recalibrated periodically to incorporate new observations. The most recent calibration was conducted in 2004 and incorporated data from 1918 to 1999. Data used to calibrate the model included NBS R [Eq. (4)], monthly precipitation determined using a Thiessen weighting method for estimating basinwide precipitation from a land-based network of gauges, and average basinwide temperature also determined using a Thiessen weighting method. Operationally, the coming month's forecast temperature and precipitation are estimated using basin-averaged forecasts derived from the NMME (Bolinger et al. 2017).

Data Used for Benchmarking Model Performance
To assess the skill of the NBS forecast models, NBS forecasts, including the OSV, are extracted from the USACE digital archives and evaluated by comparing the forecast NBS with the historical record of residual NBS, NBS R . NBS R is used for benchmarking model skill rather than NBS C because of the highly uncertain nature of runoff and overlake precipitation and evaporation. Additionally, the water levels and flows used to compute NBS R values are regularly coordinated between the Canadian and US governments, and as a result NBS R is considered to be the best estimate of actual NBS despite some uncertainty in lake level and connecting channel flow measurements. Residual NBS values have been coordinated by USACE and ECCC for Lakes Superior, Michigan-Huron, St. Clair, and Erie through 2008. These values, along with subsequent provisional residual NBS values and, for Lake Ontario, provisional NBS values for the entire period, form the so-called observational data set used in this study.
Lakewide average water levels are computed using a network of gauges operated by NOAA and the Canadian Hydrographic Service. The daily water levels for each gauge are averaged, and then monthly mean water levels are computed. When a gauge is not reporting, the daily lakewide averages are computed using a gauge pairing logic developed and agreed to by the Coordinating Committee.

Assessment Approach
Accurate prediction of the magnitude, frequency, and timing of high and low flows is the goal of most hydrological models, and most skill metrics used in hydrological modeling studies reflect this goal. In the case of seasonal water budget prediction, however, the traditional metrics may be less informative, particularly because accumulation of bias over a season can be quite profound, despite skill in representing the features of a hydrograph. While average percentage bias is one indication of skill that is informative, in reality, the cumulative NBS over a forecast horizon is the only way to answer the question of whether there will be more or less water after all that happens over 6 months (or, in the case of water level forecasting, whether water levels will be higher or lower in 6 months' time). Accordingly, forecasts' skill at representing cumulative NBS is assessed.
Two approaches were used to assess the skill of the individual deterministic NBS models. In the first approach, the skill of each model is assessed directly against observations of NBS R by graphically comparing the cumulative NBS predicted by each model with the observed cumulative NBS. This direct graphical comparison is conducted for all archived forecasts and for archived forecasts aggregated by forecast start month. In the second approach, the so-called categorical skill of each model is determined using contingency tables. The first approach is meant to provide a baseline skill assessment with which to compare future model and operational protocol developments. The categorical approach was included in this analysis to improve forecasters' interpretation of model output and guide the model selection process.
The current operational protocol used by USACE involves selecting NBS model output that is reasonable given current basin conditions and climate forecasts. In practice, due to the complexities of hydrological conditions at this spatial and temporal scale as well as uncertainty in climate guidance, an operator's expectations are rarely more nuanced than expecting below or above normal NBS for a given month. Operator selection of NBS values typically reflects how well the model's forecast supply for a given month reflects these categorical expectations. Accordingly, skill metrics based on contingency tables, such as the Heidke Skill Score (HSS) (Heidke 1926), are useful metrics that can aid in selecting appropriate NBS forecasts. The HSS has traditionally been used in meteorological studies since its inception in 1926 (e.g., Livezey and Timofeyeva 2008;O'Lenic et al. 2008;McEvoy et al. 2016). This score provides an indication of how well a model forecasts certain outcomes, relative to a random prediction. The HSS metric is computed using a 2 × 2 contingency table (as in Table 2) of above and below median NBS for cumulative NBS predictions. Numbers along the main diagonal ðX 11 ; X 22 Þ represent correct forecasts of above or below median NBS. Values in the other two cells represent missed forecasts ðX 12 ; X 21 Þ. Using the HSS allows evaluation of the accuracy of predicting NBS as falling into the two categories: below or above average NBS. Although skill scores based on contingency tables are useful for assessing the ability to forecast above or below median NBS categorically, they do not fully represent the magnitude of forecast performance.
The HSS is calculated from a contingency table using Eq. (11): where NC = number correct; E = number correct that would be expected by a random forecast; and T = total number of forecasts.
In the contingency table shown in Table 2, NC, E, and T are calculated as In Eqs. (12)-(14) and Table 2, i is used for categories, m represents the number of categories, and p is used for totals in rows or columns. NC represents the sum of the number of correct forecasts, which is the sum of the values that fall along the main diagonal in the contingency table (i.e., X 11 and X 22 ). X pp can be found by adding the totals along the bottom row (total forecasted) or the last column (total observed). X ip represents the total number observed in row i, while X pi represents the total number forecasted in column i. Barnston (1992) has noted that when the categories are equally probable (as they are in this study), the HSS is an "equitable" skill score, because the probability of a correct random forecast does not depend on the category of the observation. The HSS can range from −1.0 to 1.0, where positive values indicate a forecast that is better than a random forecast, and negative values indicate a forecast that is worse. A perfect forecast would have an HSS score of 1.0.
In addition to evaluating the skill of the individual NBS models, the validity of using the operational 5% and 95% supplies derived from historical residual NBS to express forecast uncertainty is evaluated by comparing the range of operational supplies used during the analysis period with observed supplies.
Finally, to demonstrate the eventual impact of NBS forecast skill on water level forecasts, water level hindcasts are compared with actual monthly mean observed lakewide average water levels. Prior to 2017, the operational protocol was such that the routing and regulation model used to translate NBS forecasts to water levels was only run for the operator-selected NBS values. In 2017, this procedure was changed so that water level forecasts from all NBS models could be compared during the operator selection process. Accordingly, water level forecasts from individual models prior to 2017 were unavailable. For this period, water levels were reforecast by running the archived NBS sequences from each model through the CGLRRM using actual starting levels. Subsequent to December 2016, actual archived water level forecasts for each model were used in this analysis.

Results
The analysis period includes several noteworthy years, among them 2010, which was characterized by very low NBS, and 2013, 2014, and 2017, which were characterized by very high NBS. In fact, the record-setting 2-year rise on Lake Superior and Michigan-Huron from 2013 to 2014 followed a decade-long period of low water levels that included a record low Lake Superior water level in 2007 and culminated in a record low Lake Michigan-Huron water level in January of 2013. The inclusion of years with very low NBS Lake Superior Cumulative Error in NBS Forecast (mm) Fig. 3. Error in cumulative NBS predictions from each model (rows) for Lake Superior. Regression and Persistence models are applied for 1-month forecasts only and so are shown as points.
and years with very high NBS makes this analysis period particularly interesting for assessing seasonal hydrological forecasts.

Cumulative NBS Forecasts from Individual Models
The error in cumulative NBS predictions from each model for Lake Superior is shown in Fig. 3. Similar plots for Lakes Michigan-Huron, Erie, and Ontario are provided in the Supplemental Material Figs. S1-S3. In Fig. 3, a shift in forecast bias from positive to negative occurs in 2013 for the GL-AHPS, Trend, and OSVs and can be explained by the dramatic change from a long period of low NBS ending in 2012 to a period of generally high NBS. Similarly, cumulative NBS predictions for the other lakes shown in the Supplemental Material show underprediction during transitions to wet conditions (e.g., Lake Michigan-Huron forecasts made in early 2013 when Lake Superior and Michigan-Huron began a recordsetting 2-year rise and Lake Ontario forecasts made in 2017 when record-high water levels were set in the spring and early summer) and overprediction during transitions to drier conditions (e.g., forecasts made for Lake Erie in early 2012 after record-high annual NBS in 2011 on that lake). Average errors in cumulative NBS predictions for each calendar month (e.g., average error in monthly NBS of all forecasts produced in Januaries) are shown in Fig. 4. When interpreting this graphic, note that the seasonality of error appears more dramatic for Lake Michigan-Huron and Lake Ontario. For Lake Michigan-Huron this is partly due to the larger fluctuations in NBS than seen on other lakes. For Lake Ontario, the GL-AHPS forecasts greatly overestimate NBS during fall months and underestimate NBS in the spring. This is at least in part due to overestimation of runoff during the fall and underestimation of runoff during the spring. The overprediction in the fall and underprediction in the spring is also generally true for GL-AHPS forecasts for Lake Superior, although less pronounced than for Lake Michigan-Huron and Lake Ontario. During the spring, snowmelt is an important factor driving the runoff component of NBS, suggesting that GL-AHPS forecasts could be improved by better representation of snow accumulation and melting. In contrast, GL-AHPS generally underpredicts cumulative NBS for all forecast start months on Lake Erie.
In Fig. 4, the Trend model appears to have a generally small average error on Lake Erie. Seasonality in bias is larger for Lake Michigan-Huron but is likely a reflection of the larger seasonal variability in observed NBS on that lake. The Trend model appears to underestimate cumulative NBS on Lake Superior on average. The underestimation by the Trend model for Lake Superior may in part be a result of the primary trend component of the model [described by Eq. (5)], which depends on 35 years of annual NBS prior to the forecast start date. The unprecedented decade-long period of low NBS on Lake Superior during the 2000s likely disproportionately influences this primary trend.
Unlike GL-AHPS and Trend forecasts, there does not appear to be any real seasonality in bias in the 1-month forecasts from the Regression or Persistence models. The OSV forecast reflects the seasonal tendencies of the GL-AHPS and Trend models, as these models factor into the operator selection process. However, the seasonality is somewhat dampened in the operator-selected forecast, perhaps a result of selecting a combination of multiple models or forecaster hesitation to diverge far from long-term median values. Fig. 4 also provides an opportunity to determine whether the operator selection process improves forecasts. In general, the operator selected value outperforms GL-AHPS and Trend for Lakes Superior and Michigan-Huron. For Lakes Erie and Ontario, however, the Trend model has similar or better performance than the OSVs. Fig. 5 summarizes the HSSs of the individual NBS models. In this application, using a 2 × 2 contingency table, the HSS represents the skill in terms of predicting whether the cumulative NBS will be above or below the climatological median NBS. In Fig. 5, the HSSs indicate better performance from the Trend model than the operator-selected forecast for Lake Superior, despite the fact that the magnitude of error from the operator-selected forecast is, on average, less than that of the Trend model (Fig. 4). Likewise, although the magnitude of error from the operator-selected forecast is smaller, on average, than that of the GL-AHPS model on Lake Ontario, the HSSs suggest that GL-AHPS outperforms all other models when predicting whether cumulative Lake Ontario NBS will be above or below average. Contingency table analyses, including HSSs, are a conventional practice in meteorological forecast assessment but appear to mask deficiencies in the models' ability to forecast extreme events. In many hydrological forecasting applications (e.g., flood and drought forecasting), the ability to correctly predict the magnitude of flows is most important. If the primary goal is to tell a story of whether water levels will be higher or lower over a given horizon, however, the results from the categorical assessment are useful.

Representation of Uncertainty
As described in the introduction, the forecast range is produced by routing operational 5% and 95% supplies to predict a range of water levels. These operational supplies for Lake Superior are shown as gray bars in Fig. 6. Similar plots for the other lakes are included . HSS scores for NBS models forecasting above or below long term median NBS for 1-to 6-month forecast horizons (rows) for each lake (columns).
in the Supplemental Material Figs. S4-S6. In Fig. 6, observed cumulative NBS is not captured by the forecast range during periods of very low cumulative NBS (e.g., 2010) or periods of very high NBS (e.g., 2013 and 2014). This is not surprising, considering the limitations of using historical data to represent future uncertainty. For example, the use of climatological supplies does not accommodate the representation of initial conditions or forecast climatological conditions. On the whole, observed cumulative NBS falls within the range of the 5% and 95% values for the 1-, 3-, and 6-month forecasts 72%-84% of the time in the forecast analysis period, suggesting a need to refine the method for producing forecast uncertainty.

Water Level Forecasts
Although a large portion of the effort in predicting water levels is in forecasting the NBS, ultimately, forecasts of monthly mean water levels are the goal. Water level forecasts for Lake Superior resulting from the operator-selected NBS for three time horizons (1, 3, and 6 months) are shown in Fig. 7. Water level forecasts for Lakes Michigan-Huron and Erie can be found in the Supplemental Material Figs. S7 and S8. Over the first month, the water level error is generally within a few centimeters, but over 6 months, the water level forecast error resulting from cumulative error in NBS can accumulate to close to 30 cm. As with NBS forecasting error, there is a tendency to overpredict water levels during periods of low water supply and underpredict water levels during periods of high water supply.

Discussion and Recommendations
With the exception of GL-AHPS, which was fully integrated into regular operations in 2007, the water supply and water level forecast methods analyzed in this study were integrated into USACE operational procedures several decades ago. Although continual assessment of forecasts is conducted internally, this study represents the first comprehensive, peer-reviewed assessment of the NBS models used in USACE water level forecasts since the 1990s. As such, the analysis offers an opportunity to provide analysis-based recommendations to address some of the challenges described in the introduction, setting the stage for future improvements to the USACE water level forecasting protocol. Many of the challenges related to the transboundary nature of water management in the basin are addressesd through the use of coordinated data and models resulting from decades of ongoing efforts of the Coordinating Committee, so discussion presented here relates to resolving the influence of operator judgment on forecast skill, representation of initial conditions, incorporation of seasonal climate forecasts, and representation of uncertainty. Interestingly, in many cases, the operator selection process appeared to improve the NBS forecast skill. While this suggests value in careful consideration of basin conditions and climatic outlooks when interpreting and utilizing NBS forecasts, it is possible that the relative success of the operator-selected forecast is in part a result of the limited degree to which (1) initial conditions, and (2) seasonal meteorological forecasts are incorporated into the NBS models themselves.
An attempt to establish initial conditions (e.g., soil moisture of the lake basin, snow water equivalent over the basin, or heat content of the lake) is made only by GL-AHPS, which simulates initial conditions during a historical period using observed meteorology. Previous research has shown that uncertainty in initial conditions has an influence on hydrological forecast error for lead times up to about 1 month (Haibin et al. 2009;Shukla and Lettenmaier 2011). Building on earlier work investigating the utility of data assimilation to improve the initial forecast model state for near term applications [see the review paper by Liu and Gupta (2007)], recent work has demonstrated the promise of data assimilation to update model states in longer-lead forecasts (e.g., Gibbs et al. 2018;Huang et al. 2017).
At forecast lead times longer than 1 month, Shukla and Lettenmaier (2011) found that improvement in climate forecast skill has a greater influence on forecast error. Seasonal climatic forecasts are applied only in the Regression model and the GL-AHPS model NBS forecasts. In the Regression model, the basin-averaged median NMME prediction for precipitation and temperature for the coming month is applied as an independent variable. GL-AHPS, which is the only physically based NBS modeling system used in the USACE forecast, does take into account the climatic conditions in the forecast horizon by weighting an ensemble of historical meteorology according to climatic outlooks using methods described in Croley (2000). While this allows some influence of climatic forecasts, the methodology is limited by the fact that forecast meteorology is never outside of historical conditions. Additionally, in the current operational protocol, only the median GL-AHPS forecast is used. Future developments should improve the use of climatic forecasts as drivers of the NBS forecast models.
Finally, representation of uncertainty in the existing USACE water level forecast protocol is limited by the use of historical cumulative NBS to determine a forecast range, as was shown in Fig. 6. This limitation is linked to a lack of representation of initial conditions and the lack of consideration of nonstationarity. While using these socalled operational supplies may have been an appropriate approach to building uncertainty around deterministic forecasts in the past, major advancements in computation and ensemble prediction methods offer the potential for better representation of the true forecast uncertainty through ensemble prediction (Schaake et al. 2007). Lake Superior Water Level Forecasts (m) Fig. 7. Water-level forecasts for Lake Superior resulting from operator-selected NBS forecasts included in this analysis. Gray bars represent range of uncertainty in forecasts determined using operational 5% and 95% supplies.
In light of the advancements in seasonal hydrological prediction since the establishment of existing USACE Great Lakes water level forecasting procedures, future developments in operational NBS forecasts should improve long-lead forecasts by (1) integrating ensemble forecasts, including the full ensemble prediction from GL-AHPS (or similar modeling systems) into operational protocol, (2) investigating the potential for improving forecasts through data assimilation, (3) exploring the utility of seasonal meteorological predictions (like NMME) for further model development, and (4) improving the representation of forecast uncertainty.

Conclusions
This paper provides documentation of the existing operational procedures used by USACE-Detroit to predict monthly mean water levels over a 6-month lead time, presents results from a study to assess the skill of the NBS models used to drive the operational forecasts, and suggests improvements to the current methodology. The assessment showed that most NBS forecast models offer some skill, and the operator selection process to interpret and implement the NBS models appeared to result in some gain in forecast improvement over individual models. However, significant gaps were revealed, including, for example, the limited ability to represent initial conditions, influences of forecast meteorology, and the full range of uncertainty. The authors hope that this manuscript sets the stage for future improvements through the use of ensemble prediction, data assimilation, and better incorporation of seasonal meteorological predictions.

Data Availability Statement
The following data, models, or code generated or used during the study are available from the corresponding author by request.
• R Scripts generated to conduct analysis • Rdata files with aggregated NBS and water level forecasts and observations used in this analysis