Uncertainty analysis of a spatially explicit annual water-balance model : case study of the Cape Fear basin , North Carolina

There is an increasing demand for assessment of water provisioning ecosystem services. While simple models with low data and expertise requirements are attractive, their use as decision-aid tools should be supported by uncertainty characterization. We assessed the performance of the InVEST annual water yield model, a popular tool for ecosystem service assessment based on the Budyko hydrological framework. Our study involved the comparison of 10 subcatchments ranging in size and land-use configuration, in the Cape Fear basin, North Carolina. We analyzed the model sensitivity to climate variables and input parameters, and the structural error associated with the use of the Budyko framework, a lumped (catchment-scale) model theory, in a spatially explicit way. Comparison of model predictions with observations and with the lumped model predictions confirmed that the InVEST model is able to represent differences in land uses and therefore in the spatial distribution of water provisioning services. Our results emphasize the effect of climate input errors, especially annual precipitation, and errors in the ecohydrological parameter Z, which are both comparable to the model structure uncertainties. Our case study supports the use of the model for predicting land-use change effect on water provisioning, although its use for identifying areas of high water yield will be influenced by precipitation errors. While some results are context-specific, our study provides general insights and methods to help identify the regions and decision contexts where the model predictions may be used with confidence.


Introduction
The interactions between hydrology and land-use and landmanagement decisions have received increased attention in recent years.The International Association of Hydrological Sciences (IAHS) recently declared this decade Panta Rhei -everything flows -to focus on the changing dynamics of the water cycle in connection with changing human systems (Montanari et al., 2013).Socio-hydrology has recently been proposed as a "use-inspired" discipline to focus on understanding the human-modified water cycle (Sivapalan et al., 2014).The quantification of water services, or the value that humans derive from natural processes, is also increasingly seen a means of elucidating the interactions between people and water.Examples of this approach abound globally: through its Grain-to-Green program, China, incentivizes land owners to convert annual crops to perennial species or natural forests (Liu et al., 2008).In South America, there now exist dozens of water funds, which invest in upstream conservation measures to ensure the downstream provision of clean water (Martin-Ortega et al., 2013).In the United States, federal investments in water resources projects now require an assessment of impacts to ecosystem services (Council on Environmental Quality, 2013).
To quantify the impact of land-use and land-management decisions on ecosystem services, a number of tools have been developed by researchers and practitioners (Bagstad et al., 2013).Typical applications of these tools include the development of spatial planning policies, such as the delineation of priority areas for conservation or for agricultural development (Guswa et al., 2014).These applications often (i) occur in data-scarce environments; (ii) require spatially explicit information, at the scale of individual land holdings and parcels; and (iii) integrate a range of ecosystem services rather than focus on the precise quantification of a particular service.Accordingly, state-of-the-art models requiring extensive data and expertise are generally not appropriate for such applications.Instead, models for ecosystem-service valuation often focus on ease of use, using globally available data, accepting spatially explicit input and producing spatially explicit output, and limiting the model structure to key biophysical processes involved in land-use change (Guswa et al., 2014).
The InVEST annual water yield model was developed in line with this philosophy (Sharp et al., 2014).It includes a biophysical component, computing the provision of freshwater, or water yield, by different parts of the landscape, and a valuation component, representing the benefits of water provisioning to people.The biophysical module, the focus of this paper, is based on the Budyko theory, which has a long history and continues to receive interest in the hydrological literature (Budyko, 1961;Zhou et al., 2012;Zhang et al., 2001Zhang et al., , 2004;;Donohue et al., 2012;Xu et al., 2013;Wang and Tang, 2014).The InVEST model applies a oneparameter formulation of the theory (Zhang et al., 2004) in a spatially explicit way.This raises two issues.First, application of the model to ungauged basins or to future land-use scenarios requires a methodology for determining the value of the model parameter from known characteristics of the climate and basin, since it cannot be determined via calibration.Second, the Budyko approaches have been developed for long-term water balances at the catchment scale, rather than at the scale of individual land parcels, which is required for ecosystem-service decisions.The effect of this change in spatial scale is unclear, and calls for a rigorous analysis of the model and its uncertainties.
Uncertainty analyses remain rare or incomplete in ecosystem service assessments, where the focus is on analyzing trade-offs and valuation of multiple services, often at the expense of characterizing uncertainty of individual modeling components.For example, in reviewing the literature using the InVEST annual water yield model, we found the following common limitations: absence of or inadequate comparison with observed data, calibration of the model without prior identification of sensitive parameters, and lack of validation of the predictive capabilities in the context of land-use and land-cover (LULC) change (Bai et al., 2012;Su and Fu, 2013;Terrado et al., 2014;Nelson et al., 2010).To varying degrees, these limitations jeopardize the production of credible assessments of ecosystem services.
Recent work paved the way for understanding the uncertainties in the InVEST model predictions.Sánchez-Canales et al. (2012) analyzed the sensitivity of the model in their case study of the Llobregat catchment, in Spain.They found that the model was sensitive to climate variables, but less so to the Z parameter (see model description).Similarly, Boithias et al. (2014) and Terrado et al. (2014) reflect on the sensitivity of the model to climate inputs and calibrate the model based on the climate parameters and return flows.The conclusions of these studies are often context-specific and lack a quantitative estimate of the model's structural uncertainties.In particular, they assess the effect of the climate variables' uncertainty but do not examine the ability of the model to represent land-use change.
This paper aims to extend this work by characterizing the uncertainty in the InVEST annual water-yield model, and assess its utility to inform ecosystem-service decisions.As indicated above, the focus on water services implies a focus on decisions related to land use and land management, thus requiring spatially explicit descriptions of the landscape and associated hydrologic parameters (Guswa et al., 2014).Ecosystem-service decisions may be based on spatially aggregated output (e.g., which landscape scenario provides the greatest water yield at the base of the catchment), or may require spatially explicit output (e.g., which parcels in this catchment are of highest priority for conservation).While the proposed model is capable of providing output to inform the latter, this paper focuses on the former, since typical measurements of water yield (streamflow) are inherently aggregated.Using a case study in the Cape Fear region of North Carolina (NC), our study (i) quantifies the effect of parameter uncertainty on model outputs through sensitivity analyses, (ii) compares the distributed application of the water balance to the catchment-scale application, and (iii) quantifies the accuracy of calibrated and uncalibrated versions of the model by comparing model predictions to observations.From a practical standpoint, this work helps InVEST model users identify modeling uncertainties and proposes simple and replicable methods that can be used to quantify the reliability of water-service decisions.

Background theory
The Budyko curve is a unique empirical function that relates the ratio of actual evapotranspiration to precipitation (averaged over a catchment and over many years) to the ratio of potential evapotranspiration to precipitation (Budyko, 1961).The function is bounded by two limits -an energy limit in which actual evapotranspiration is equal to potential, and a water limit for which actual evapotranspiration is equal to precipitation.Due to spatial and temporal variability in climate forcing, the asynchronicity of water supply (P ) and demand (PET), the imperfect capacity of the root zone to buffer that asynchronicity, and lateral redistribution of water within the catchment, the Budyko curve lies below those two limits (Fig. 1).
To describe the degree to which long-term catchment water balances deviate from the parameterless Budyko curve, a number of scholars have proposed one-parameter functions that are similar (e.g., Fu, 1981;Choudhury, 1999;Zhang et al., 2004;Wang and Tang, 2014).The InVEST water yield model employs the formulation by Zhang et al. (2004), which incorporates a catchment parameter, ω: where AET is the actual evapotranspiration (mm), P is precipitation (mm), and PET is the potential evapotranspiration (mm).ω affects the partitioning of precipitation between evapotranspiration and runoff, and is a function of climate and physical factors.Larger values of ω indicate those basins that are more "efficient" in converting precipitation to transpiration, e.g., those with precipitation synchronous with PET and/or with deeper root zones.Gentine et al. (2012) and Troch et al. (2013) have shown that the natural coevolution of vegetation, climate, and topography may lead to basins for which the effects implicitly captured by ω counterbalance each other, offering an explanation for the observed convergence of data close to the original Budyko curve.The intent of the InVEST model, however, is to predict the effects of human-induced changes, i.e., to examine catchments for which natural coevolution is disrupted.

Model overview
To represent parcel-level changes to the landscape, InVEST represents explicitly the spatial variability in precipitation and PET, soil depth, and vegetation.The model is GIS-based, using rasters of climate and soil properties as inputs (see Sharp et al., 2014, for full details).
For vegetated land uses, InVEST applies the Zhang formulation in a spatially explicit way at the pixel scale (10-100 m on a side): In contrast to Eq. ( 1), P , PET, ω, and AET are all functions of the local position, indicated by the subscript i.
The parameter ω is further deconstructed to separate the effects of soil depth, rainfall frequency, and other factors, following an approach proposed by Donohue et al. (2012): where AWC i is the plant-available water content (depth), and Z is an empirical parameter.The constant, 1.25, in Eq. ( 2) reflects the minimum value of ω corresponding to bare soil, following Donohue et al. (2012).In this representation, differences in land use and land cover affect both PET, through changes to the crop factor, K c , and Z, through changes to the root depth and plant-available water content.
For open water, wetlands, and urban land uses, InVEST computes AET i directly as a user-defined proportion of PET i , with classical approaches such as the FAO 56 guidelines (Allen et al., 1998) or local knowledge used to determine the appropriate proportion (Sharp et al., 2014).The simple representation of these LULCs, compared to the vegetated land uses modeled with Eqs. ( 2) and (3), reflects the focus of the model on vegetation-dominated landscapes.
Total evapotranspiration from a catchment is computed as the sum of AET i attributed to each cell, and water yield is obtained by subtracting this value from the total precipitation.

Selection of the Z parameter
The empirical constant Z captures catchment-wide characteristics of climate seasonality, rainfall intensity, and topography that are not described by the plant-available water content (AWC) and annual P .Given the empirical nature of the model, the value of the Z parameter remains uncertain.In this work, we examine the three methods for the determination of Z that are proposed in the InVEST user's guide (Sharp et al., 2014).The first draws upon recent work that suggests that Z is positively correlated with the average annual number of rain events per year, N, and that Z may be approximated by N/5 (Donohue et al., 2012).This implies that Z captures rainfall patterns, distinguishing between catchments with similar annual precipitation but different intensity.The second method relies on globally available estimates of ω (Xu et al., 2013;Liang and Liu, 2014).Z is inferred from these published values of ω by inverting Eq. ( 2) with values of AWC and P averaged over the catchment.In the third method, Z is determined via calibration to streamflow data (see Sect. 2.5).

Methods
The goal of the InVEST model is not to reproduce observations with a high degree of accuracy and precision, but to provide reliable information to inform decisions.Therefore, utility or acceptability of the model should be couched in terms of relative uncertainty.That is, the uncertainty associated with the model (due to its simple structure or challenge of parameter identification) should be on par with or less than the irreducible predictive uncertainty that arises due to uncertainty in the forcing variables -in this case, precipitation and potential evapotranspiration.To assess the relative importance of the three sources of error (structural error, parameter selection, climate variables), we applied the InVEST annual model to 10 subcatchments in the Cape Fear basin, NC.Their colocation implies a similarity in climate and seasonality and facilitates a focus on variations in land use, size and topography (Hrachowitz et al., 2013).In the following sections, we describe the study area, the methods for the sensitivity analyses and uncertainty assessment of input parameters and forcing variables, and our approach to assess the structural error of the model: comparison with observations, and with the (classical) lumped model predictions.

Cape Fear study area
The Cape Fear catchment is a 23 600 km 2 area in North Carolina.Its major land uses are forest (40 %), wetland (15 %), grassland (14 %), and agriculture (12 %), mainly in the lower parts of the catchment and including intensive swine and poultry farms.Urban and agricultural development has generated significant groundwater extraction throughout the catchment.
The climate is humid subtropical, with a precipitation average of ∼ 1200 mm over the 2002-2012 study period (Table A1 in the Appendix).This period was used for the analyses based on the longest period available for climate data, observed streamflow, and matching LULC map.The available precipitation data comprise the PRISM data set (Parameter-elevation Regressions on Independent Slopes Model; Gilliland, 2003) and a network of eight rain gauges maintained by the USGS (USGS, 2014).For our analyses, we use the PRISM data and two additional rasters interpolated from the USGS point data (rain gauges) via spline and inverse-distance weighting (IDW).The three input rasters (hereafter referred to as PRISM, IDW, and spline) were used separately to compute the average precipitation over each of the 10 subcatchments and assess the error introduced by the input data selection.The variability in average annual precipitation among the PRISM, IDW, and spline rasters (averaging 1118, 975, and 966 mm, respectively; Table 1) represents the uncertainty that may arise when precipitation data are limited, a situation that is common in many places around the world (McGlynn et al., 2012).
Potential evapotranspiration is represented by reference evapotranspiration (ET 0 ) times a crop factor K c (Sharp et al., 2014).ET 0 was obtained from three sources: FAO data, representing a long-term average from 1961 to 1990 (FAO, 2001), MODIS (Moderate Resolution Imaging Spectroradiometer) data (Mu et al., 2012), and interpolation (IDW) from a network of 13 weather stations maintained by the Climate Office of North Carolina (hereafter referred to as Climate Office; NCSU, 2014).These three sources indicate average annual PET for the Cape Fear region to be 1240 mm (FAO), 1160 mm (MODIS), and 1310 mm (NCSU).These climate data indicate an aridity index (P / PET) of approximately 0.9 for the Cape Fear catchment.A summary of In-VEST inputs is given in the Appendix (Tables A1, A2).
Streamflow observations were obtained from the USGS monitoring network (USGS, 2014).A total of 10 stations with a minimum of 10 years of data were used for the analyses (Fig. 2, Table 2).Subcatchments draining to each of these points were delineated based on the 30 m DEM (digital elevation model).
Water withdrawal data were obtained from governmental agencies (NC Department of Environment and Natural Resources, 2014).Due to the lack of spatially explicit information for water withdrawals (reported by county, which do not follow the subcatchment boundaries), and on the magnitude of return flow, we represented their effect as homogeneous over the entire catchment.We think this decision has a limited effect on model testing since the value of water withdrawals is small compared to water yields (see Sect. 4).In addition, we explicitly accounted for this uncertainty by examining the effect of a 50 % error on the water withdrawala magnitude consistent with the variance among the county withdrawals.The average withdrawal rate, 39 mm yr −1 , was subtracted from the predicted water yields for comparison with observations.

Sensitivity to Z and K c
Step one in our assessment of the InVEST model was a local sensitivity analysis of water yield to the Z parameter and K c for forest -the dominant LU class.model to Z can also be interpreted as the sensitivity to AWC, when the raster values are varied homogeneously over the catchment, since these parameters play a similar role in the model structure (Eq.3).
As previously noted, large uncertainties surround the selection of the Z parameter (Sharp et al., 2014).For what we term the "baseline" case, we set Z equal to one-fifth of the number of rain days per year (Z = N/5).Based on historic precipitation data (SERCC, 2014), the average number of rain days per year is approximately 110, yielding a Z value of 22.We used this value as a baseline for all subcatchments, and allowed the parameter to vary between 1 and 30 for the sensitivity analyses.This range was estimated from Eq. (3) used with extreme values of P and AWC found in our catchments, and extreme values of ω (2.1 and 3.75) found in the study by Zhang et al. (2004).
Forest was the dominant LULC in all basins, with its cover ranging from 43 to 72 % of subcatchments.We therefore de-cided to use the crop factor K c -forest for the sensitivity analyses, and a baseline value of 1 for K c -forest was obtained from the FAO 56 guidelines (Allen et al., 1998).Uncertainties on this value are large since it remains difficult to provide accurate estimates of the actual evapotranspiration from forests (McMahon et al., 2013).We set the upper bound to 1.1, because values greater than this are unlikely (McMahon et al., 2013), and the lower bound to 0.7.
For the two parameters, we performed sensitivity analyses with the ranges defined above.The results are presented as a change in predicted water yield compared to the baseline run, thus assessing absolute sensitivity.Precipitation and reference evapotranspiration used for these runs were from the PRISM (1118 mm) and the FAO (1240 mm) data sets, respectively (see Sects.2.5 and 5 for insights into the error introduced by climate data). 1 In parentheses, we report the difference in corrected water yield (mm) between the baseline and calibrated runs (Z = 22, and Z = 14, respectively)

Sensitivity to climate inputs
To provide a context for the uncertainty in the predictions of water yield from the InVEST model, we compared the prediction error to the uncertainty in water yield that arises from uncertainty in climate (i.e., variability in the rasters of P and ET 0 ).Uncertainties in climatic data and their impact on rainfall-runoff models are commonly cited in the literature (McMahon et al., 2013;McGlynn et al., 2012).To be an effective decision-support tool, errors attributed to model structure and parameter selection should be on par with or less than the irreducible error associated with uncertainty in the climate.
As illustrated in Table 1, the average precipitation differed significantly across subcatchments depending on the data source: the mean differences between the PRISM and USGS data sets with the spline or IDW interpolation methods, respectively, were −14 and −13 %.Catchment-bycatchment differences were more spatially heterogeneous with the spline method, with some subcatchments receiving less precipitation relative to the baseline (PRISM data set) and others receiving more.The reference evapotranspiration data also showed significant differences across sources, although the FAO and Climate Office sources showed good agreement.The MODIS values were on average 22 % higher than those from the other two sources (Table 1).Differences between the Climate Office and FAO data were also spatially variable, ranging from −8 to 5 % across catchments.
To assess the uncertainty in water yield due to variability in climate inputs (precipitation and reference evapotranspiration), we examined the sensitivity of the baseline model results to spatially homogeneous increases and decreases in climate forcing.We considered climate inputs that are 10 % greater and 10 % less than the baseline, applied uniformly across the landscape.

Comparison of spatially explicit and lumped models
Although the InVEST annual water yield model is based on the well-studied Budyko framework, it departs from its classical application by applying the partitioning model at the pixel scale.To our knowledge, the effect of the pixel-by-pixel calculation performed by InVEST has not been previously studied.In such an application, three issues arise related to lateral flows of water, the spatial variability in climate variables, and the covariance of climate and soil in the prediction of the parameter ω.
In the catchment-scale application of Budyko-type models, lateral inflows and outflows across the catchment boundary are presumed negligible, resulting in a simple water budget based on catchment precipitation, evapotranspiration, and water yield.This assumption will not hold for a parcel-based application of Eq. ( 2).Thus, error in the catchment-scale water balance will arise by ignoring the excess water generated Hydrol.Earth Syst.Sci., 19, 839-853, 2015 www.hydrol-earth-syst-sci.net/19/839/2015/ at one spot that is later evaporated at a downgradient location.Such explicit routing is not included in the InVEST model.Additionally, even if lateral flows are negligible, applying the nonlinear Budyko curve locally and aggregating the yield will lead to different results than applying Eq. ( 2) to average values of P and PET.The concave nature of the function indicates that application over a range of climates will produce an average water yield that is higher than what would be predicted if applied at the catchment scale (Fig. 1).
Finally, since local values of both available water content and precipitation combine to affect the local values of omega (Eq.3), average values of omega from the spatially explicit model will be different from what one would obtain if average values of AWC and P were used to compute an average value of omega.
To investigate these effects, we compared the model predictions to those obtained by applying the lumped model (Zhang et al., 2004) at the catchment scale.Application of the lumped model requires a value of ω, which we derived from Eq. ( 3) with average values of P and AWC and with Z set to the baseline value of 22, as would be done in a typical ungauged application.We thus obtained, for each subcatchment, an estimate of areal AET and water yields for the vegetated areas.AET for urban areas and wetlands was calculated separately, following the same method as InVEST, and total water yield was calculated as the area-weighted average of water yield from the vegetated and urban areas.

Testing the spatially explicit model with observed data
To quantify the accuracy and precision associated with the InVEST water yield model, we assessed model performance by comparison with observed data for each of the 10 subcatchments in the Cape Fear area.Our method aims to measure the aggregated value of water yields at the subcatchment scale, not to test whether the water yield predicted by each pixel is accurate.We measured performance with the model bias, i.e., the relative difference between predicted and observed water yields, and with the subcatchment ranking by water yields.The ability of the model to predict ranking is important for applications where prioritization of areas of low and high water yields is needed (Guswa et al., 2014).

Uncalibrated model
We first examined the performance of the model when Z was determined without calibration.We calculated Z both from the number of rain days and from a global value of ω, to evaluate the appropriateness of these recommended methods.In addition to assessing overall model performance, we also assessed the correlation between model performance and the proportion of forest in the catchment.These analyses aimed to identify a potential bias that may be corrected by modifying the LULC-specific K c .

Calibrated model
To separate the effects of error associated with model structure from error attributed to parameter estimation, we also determined the value of Z via calibration.We calibrated to individual subcatchments, identifying for each the Z value that resulted in zero error in the water yield.We examined the similarity of Z values across the 10 basins, allowing us to assess the robustness of the model structure since we expect Z to depend on larger-scale climate and geology and not on local-scale land use.We also considered the performance of the model with a single value of Z applied to all subcatchments, determined by minimizing the average bias for all basins.This allowed us to assess the uncertainty in prediction of water yield due to model structure, i.e., the inherent uncertainty to applying Eqs. ( 2) and (3) to different basins when the parameter, Z, is chosen by best fit for the entire region.

Results
In the baseline case, we applied Eqs. ( 2) and ( 3) in a spatially explicit way with a precipitation field from the PRISM data and potential evapotranspiration data from the FAO.The value of Z in Eq. ( 3) was set to 22, as mentioned above.In this baseline case, predicted water yields ranged from 163 to 322 mm across the 10 subcatchments.Results are presented in Table 2.

Sensitivity analyses
Water yield predictions are very sensitive to climate inputs.The sensitivity is higher for precipitation than ET 0 .Relative to the baseline case, a 10 % increase in precipitation resulted in a 30 % increase in water yield (Fig. 3), while the same increase in ET 0 resulted in a 15 % decrease in water yield.
In contrast to the climate variables, water yield is less sensitive to values of Z: for example, a change in Z from the baseline value of 22 to a value of 10 results in an increase in water yield of approximately 27 % (Fig. 3).However, given the large uncertainties in the Z parameter, potential errors in water yield can be large: for example, the water yield is 155 % higher when Z is set to 1, relative to the baseline case with Z = 22.The sensitivity to Z is catchment-specific, as expected, since its effect on yield is modulated by AWC and P , both of which are spatially variable.In addition, the relative sensitivity of water yield to Z decreased with increasing values of Z and increased with increasing values of the aridity index (PET /P , results not shown).
The model was found to be more sensitive to K c (Fig. 3) with a 30 % change in K c resulting in a 41 % change in the water yield.However, given the small range of K c values, the effect of parameter uncertainty on the water yield prediction is lower than for Z.

Comparison of spatially explicit and lumped models
Across the 10 subcatchments, the water yields predicted by the spatially explicit InVEST model were on average 10 % lower than the outputs from the lumped model.For 8 of the 10 catchments, the spatially explicit model predicted lower water yields than the lumped model, and differences ranged from −24 to 14 % (Table 3).The two catchments for which the lumped model predicted a lower water yield than the In-VEST model were the Morgan Creek and Cane Creek catchments, which have the highest proportions of forest and the lowest proportions of urbanized area across the 10 catchments (Table 2).
The ω values computed for the lumped model ranged from 4.29 to 6.25 across the 10 catchments.These values are in the higher range of the values obtained by Zhang et al. (2004), as discussed in Sect.5.2.

Uncalibrated model
Figure 4 shows the spatially output from the InVEST model.The figure is for illustrative purposes only; as indicated above, we aggregate the pixel values of water yield to the subcatchment scale to compare with observations.Such comparison is presented in Fig. 5a, where the Z parameter for the InVEST model is determined from the number of rain days (Z = 22).Open triangles represent results from the InVEST model.To contextualize the error, grey bars represent the uncertainty in predicted water yield due to a 10 % uncertainty in precipitation and black bars represent the uncertainty in water yield due to a 50 % uncertainty in water withdrawals.
The performance of the model for this baseline run is fair.Across all basins, predicted water yields range from 163 to 322 mm yr −1 versus an observed range of 177-368 mm yr −1 .The bias between predicted and observed values averages −16 % across the 10 subcatchments, ranging from −53 to −1 % (Table 3).This indicates that the model structure combined with this choice of Z leads to a systematic underestimation of water yield.With the exception of one catchment, the biases ranged from −25 to −1 %.The outlier with an error of −53 %, Rockfish catchment, is relatively small (237 km 2 ), and the observed water yield is also an outlier, being the highest in the data set (367 mm).This area is also characterized by sandy soils; the plant available water content averages 0.11, compared to values between 0.17 and 0.20 for the other subcatchments.This suggests that the catchment may exhibit a unique behavior, which we will highlight in the following analyses.
Figure 5b presents the ranking of catchments in terms of their observed and predicted water yields.Discarding the outlier catchment, the figure indicates that the model accurately predicts the high-and low-ranking catchments, while there is some dispersion in ranks for the five midrange water yields, which vary from 236 to 289 mm yr −1 .
When Z was determined from published values of ω, the average value across the 10 catchments was 6 (compared to 22 for the baseline case).Model performance was not satisfactory in this case and the model bias averaged 68 %.

Calibrated model
In the first approach to the calibration of Z, we determined the value for which the predicted water yield exactly matched the observations.In this case, values of Z range from 6 to 20 across the 10 catchments.Not including the Rockfish catchment, the range is narrower (10-20) and the average across the nine remaining catchments is 14.5.The narrow range of variability translates into relatively small changes in water yield -the average difference among the basins is 27 %.
In a second approach, we determined a single value of Z for all 10 catchments by minimizing the average bias.This gives a value of Z = 14, and the error in yield for all subcatchments ranges from −38 to 14 % with a median value of −3 %.Predicted water yields range from 183 to 336 mm yr −1 versus an observed range from 177 to 368 mm yr −1 .The open circles in Fig. 5a present predictions from the calibrated model of water yield versus the observed values.
Model bias is not correlated with forest cover (R 2 = 0.01), nor with any other LULC (Table 1).The absence of systematic bias suggests that K c values are in a realistic range, with no significant error due to LULC parameter selection.No significant bias was detected with regard to catchment size, suggesting that this characteristic did not systematically influence the model predictions either.

Sensitivity analyses
Variability in the Z parameter, which is linearly related to ω, results in a shift of the Budyko curve, which affects water yield predictions (Fig. 1).Our results in Cape Fear suggest that the sensitivity of water yield to Z is low compared to the climate inputs, and decreases for larger values of Z (Fig. 3).This is consistent with the lumped model for which the sensitivity to ω decreases with increasing values of ω (Fig. 1).Due to this low sensitivity, small errors in estimating Z are likely to have limited impact on the reliability of water yield predictions.In particular, we note that the range investigated in the study (from 1 to 30) is greater than the typical uncertainty associated with Z: irrespectively of the selection method, values smaller than 5 are unlikely.
The sensitivity to Z also provides a sense of the sensitivity to AWC, which is a function of the local ecohydrological properties: plant available water content, root depth and soil depth (cf.Sharp et al., 2014 for details).Examination of Eq. ( 3) suggests that a relative change in Z has the same effect as a relative change in these ecohydrological parameters.
The confidence interval for these physical parameters may be large but is reducible by measurements.
When analyzing the model sensitivity to K c , two things are to be considered.First, the K c value affects only the portion of the landscape covered with forest and this reduces its effect.Because total water yield is the sum of the yields from the different parts of the landscape, parameters affecting only a portion of the landscape will have a smaller effect.Second, it is worth noting that the K c coefficient directly affects PET for a given LULC, since the latter is the product of K c by ET 0 .Examining the sensitivity of the model to K c is therefore equivalent to a displacement along the Budyko curve, rather than a shift of this curve (Fig. 1).
In summary, the sensitivity analyses showed that, for expected and reasonable ranges of parameter variability, precipitation and potential evapotranspiration have the greatest influence on water yield.These are followed by Z and then K c .

Comparison of spatially explicit and lumped models
Comparison of the model predictions with the classical lumped model application suggests three insights.First, it provides a sense of the effect of the pixel-by-pixel application of the Budyko theory.Because of its nonlinear nature, the average response of Eq. ( 2) applied across the landscape in a spatially explicit way is not equivalent to the response of the function applied to the entire catchment, characterized by average parameters.Our results suggest that this discretization effect is not large for the Cape Fear subcatchments, with the difference between the lumped and explicit models ranging from −24 to +14 %.This range is consistent with the typical errors expected from the application of simple empirical models.This point can be illustrated by the performance of the lumped model when compared with the observations: bias ranges from −36 to 29 % (Table 3).It is worth noting that the larger, positive biases (> 22 %), i.e., when the lumped model largely overestimated observed water yields, were obtained for the two subcatchments that had > 25 % urban cover, and the three basins with the least urban cover (Cane Creek, Rockfish, and Morgan Creek) had the largest underestimates of water yield.These results suggest that the contribution from urban areas was overestimated by the simple model.
The second point is related to the first one, focusing on the observation that water yields predicted by the spatially explicit model were consistently less than those predicted by the lumped model.As stated in Sect.3.3 (Methods) , this difference can be expected from the differences in average climate values or average ω values, due to the nonlinearity in Eq. ( 2).In our case, the average ω values were high for the lumped model (ranging from 4.29 to 6.25).This indicates that the empirical expression for Z, developed for a lumped application (e.g., Donohue et al., 2012), may give values of P. Hamel and A. J. Guswa: Uncertainty analysis of a spatially explicit annual water-balance model  Z (and, therefore, ω) that are too large for our case study; this effect is emphasized when used in a spatially explicit model.Calibration of the model based on Z allows correcting this error in the empirical expression, although further studies would be necessary to gain insights into the extrapolation of the Z parameter to spatially explicit models like InVEST.
Finally, the good agreement between the InVEST model and the lumped model allows drawing from the large body of work investigating the performance of the latter model.For example, Zhou et al. (2012) report a bias of less than 20 % in a long-term study of 150 large basins worldwide; similarly, Zhang et al. (2004) report a mean absolute error of < 60 mm in their study of over 470 catchments worldwide, corresponding to a bias of < 10 % for the majority of the catchments.Other local examples may be drawn by users to understand how the Budyko theory may apply locally (e.g., Yang et al., 2007, in China).Overall, there is a large ongoing effort to improve the parameterization and predictive use of the Budyko framework (Donohue et al., 2012;Xu et al., 2013;Liang and Liu, 2014).Future progress may therefore be used to refine the InVEST model interpretation in differ-  (Thompson et al., 2011).We note, however, that the agreement between the lumped model and the catchment model is context specific.As illustrated in Table 2, the differences between the lumped model and the InVEST model vary among the catchments, such that extrapolation of the results from such studies will need to be done cautiously.

Calibrated model
Our results indicate a fair performance of the calibrated model for multiple catchments ranging in size and LULC.
The bias ranged from −38 to 14 % for all subcatchments, and from −14 to 14 % when discarding the Rockfish catchment.This narrow range suggests that the calibrated model was able to explain the variability in observed water yields.
While it is possible that such variability is explained by climate more than LULC, this is not the case in Cape Fear since the average values of P and PET varied by less than 3 % between subcatchments (Table 2).Further consideration of the Z values obtained by calibrating it for each subcatchment provides insights into the interpretation of this parameter.With the exception of the Rockfish catchment, a value between 10 and 20 is able to characterize the nine other subcatchments.This suggests that the parameter captures the topography and climate of the area, as intended by the model.The calibrated value of Z for the Rockfish catchment was much lower (Z = 6), producing a higher water yield.This difference could be due to the inadequacy of Eq. (3) to relate ω to soil characteristics (since the soils in the Rockfish catchment are particularly sandy).It could also be attributed to errors in the treatment of water withdrawals and return flows, especially since the entire subcatchment lies within Hoke County, which has minimal water withdrawals.
Despite the uncertainties around the outlier, the multicatchment analyses allowed us to assess the model performance in representing LULC change.Use of the model for evaluation of LULC change is crucial in ecosystem service assessments, where scenario analyses of LULC development are common (Guswa et al., 2014).Validating the use of models in such contexts is extremely challenging since it is rare for modelers to have sufficient pre-and post-LULC-change data (Hrachowitz et al., 2013).In our study, the length of the precipitation and streamflow data did not allow conducting such temporal analyses.Regional analyses where space is substituted for time thus represent a powerful way to assess the ability of the model to capture differences in LULC configuration.

Uncalibrated model
Another important lesson from the analyses is that the calibrated Z value is relatively close to the baseline value, which was derived independently from the average annual number of rain events.Based on Fig. 3, using one value or the other would result in a difference in water yield of approximately 10 %.This error is small compared to other model uncertainties, suggesting that this method for determining Z is robust.The underprediction of water yield for ungauged catchments could be explained by errors in the precipitation raster, the Z parameter, and the treatment of water withdrawals.Based on Eq. ( 2), the negative bias implies the underestimation of the precipitation data or overestimation of the Z coefficient.As already noted, errors in precipitation data are difficult to characterize.However, precipitation was more likely underestimated in this study since it did not include snowfall.
Conversely, the method relying on a constant ω value was not found satisfactory for this case study, since it resulted in large overestimation of the water yields.Using ω = 4, the Z value found for individual subcatchments ranged from 4 to 8, averaging 6, a value that results in a large model bias (averaging 68 %).
With regard to relative water yield values, the model was able to predict subcatchment ranks fairly accurately (Fig. 4b), which means that priority areas would be correctly identified.The uncertainties in ranking for medium water yield catchments (ranking from 3 to 6) could be partly explained by their similarity (observed water yields range from 236 to 278 mm) and the uncertainty in the water abstraction, as suggested by the overlapping error bars in Fig. 4a.Interestingly, although these results were obtained with the calibrated value of Z, they are only slightly sensitive to the value of Z, since the ranking of subcatchments is largely maintained when the value of Z changes.The ranking of subcatchments based on the baseline run, for example, was identical to the one with Z = 14.

Practical implications
In this final section, we discuss the results with a focus on practical implications for model users.
Our analyses suggest that the uncertainty introduced by "variability in the precipitation inputs" is high, comparable or higher than the uncertainty introduced by the parameter Z and the use of the lumped model theory on a pixel-bypixel basis.Importantly, the sensitivity observed in Cape Fear (e.g., that a 10 % change in precipitation may result in a 30 % change in water yield) is specific to the climate; for example, in arid climates where evapotranspiration is water limited, an error in precipitation may have a lower effect on water yield since the precipitation surplus or deficit will be mostly converted to evapotranspiration.In Cape Fear, comparison of three climate input data sources suggested that large errors may occur when using point data or data sets obtained P. Hamel and A. J. Guswa: Uncertainty analysis of a spatially explicit annual water-balance model with different modeling assumptions.These results confirm a wide body of research that highlights the importance of precipitation inputs for rainfall runoff models (Zhou et al., 2012;McGlynn et al., 2012) and in particular for the InVEST model (Boithias et al., 2014;Sánchez-Canales et al., 2012).Although it was outside the scope of this study to investigate which climate data sets are less prone to errors, our results also draw attention to spatially heterogeneous errors.If model users are interested in the relative ranking of subcatchments, the spatial distribution of errors should be specifically investigated (e.g., probability of a systematic bias in mountainous areas).
The model is not very sensitive to "uncertainty in Z" over a modest range (e.g., [14][15][16][17][18][19][20][21][22].This is consistent with the conclusions from Sánchez-Canales et al. ( 2012), who report a low sensitivity to Z in a Mediterranean catchment, for which Z varied between 7 and 9. Since the viable range of Z is quite wide, however, it is possible that large uncertainties in that parameter will translate to significant uncertainty in water yield (Fig. 3).Our analyses provided further insights into the methods for Z selection and highlighted that the sensitivity of the model to Z decreased with increasing values of Z. Based on the examination of Eq. ( 2), this property will apply generally.Therefore, in temperate climates where values of Z are high (based on the interpretation of Z as the number of annual rain events), the model outputs are likely to be less sensitive to this parameter.
Our study also presented a method to detect a "bias related to the LULC parameters" when multiple observations are available in a catchment.Because K c values are LULCspecific, the correlation between model performance and K c values can be used to identify a possible error in the parameter and rectify the values accordingly.No bias was found in this study, bringing confidence in the ability of the model to capture the differences in LULC.We note that these correlation analyses rely on nested subcatchments that are not independent from each other, which decreases the significance of the relationship: five subcatchments are independent, while the other five partially overlap.However, proportions of forest cover varied widely between all subcatchments (from 43 to 72 %), which justifies our interpretation of the analyses.
We conclude this section with a perspective on the model performance assessment, highlighting key limitations in the calibration/testing exercise.First, we note that some water transfers are missing in the model, including irrigation and water abstraction.The model represents agriculture in the same way that it does natural vegetation, and irrigation is not included explicitly.Second, in the Cape Fear catchment, the magnitudes of the water withdrawals are small but this aspect of the modeling may be improved in future applications.In particular, distinction between uses of groundwater (crop irrigation or drinking water) are necessary to account for the fate of water extraction: evapotranspiration in the case of irrigation water, or return flow to the river in the case of drinking water (e.g., Terrado et al., 2014).Additionally, performance was evaluated at the catchment scale.A potential benefit of a spatially explicit model, however, is the ability to predict patterns of water yield within a basin.To properly evaluate that capability, further work should focus on comparing the InVEST model to more sophisticated fully distributed models.

Conclusion
Our study aimed to assess the performance of the InVEST annual water yield, a tool that is gaining interest in the ecosystem services community.While such simple models with low requirements for data and level of expertise are needed for practical applications, greater attention should be paid to characterizing the modeling uncertainties.Our assessment of the potential input errors, sensitivity analyses and comparison with observations in the Cape Fear catchment add to this body of work.Key results of the analyses are as follows.
-In the Cape Fear catchment, the InVEST model was most sensitive to uncertainty in the precipitation forcing.
-Errors in climate input data may be significant and nonspatially homogeneous, resulting in uncertainties in the assessment of zones of high and low water yields.
-The study supports the recommendations for setting the Z parameter based on the number of rain events, or via calibration with available observed data.
-Based on the average bias and the explained variance in water yield among the subcatchments, the model performance was fair to high, suggesting that the effects of land use and land cover are adequately captured by the model.
-The errors potentially introduced by a pixel-level application of the Budyko theory will depend on catchment configuration; in Cape Fear, they remained small, comparable to the climate and parameter errors of the empirical model.
-Water abstractions and irrigation processes that are not represented in simple models may confuse the calibration exercise, especially in data-scarce environments where the ecosystem services approach is gaining momentum.
Rigorous uncertainty analyses have not been the norm in the ecosystem service community; however, such work is essential to help users interpret models correctly to inform land-management decisions appropriately.

Figure 1 .
Figure 1.Original Budyko curve (B) and its variations used in the lumped model (Eq.1), shown for ω values of 2, 4, and 6.Grey lines represent the energy and water limits.Arrows illustrate the effect of a change in the climate forcing (thick arrows) and a change in the ω parameter, a function of Z, precipitation, and soil and vegetation properties (thin arrow, see Eq. (3) for details).

Figure 2 .
Figure 2. Cape Fear catchment showing locations of the stream gauges and subcatchments used in the study.The Rockfish catchment, discussed in the text, is indicated as "R".

Figure 3 .
Figure 3. Sensitivity of the water yield output to the Z coefficient and crop coefficient for forest LULC (K c ). Changes are relative to the baseline run (where Z = 22 and K c = 1).On the left-hand-side plot, absolute Z values are plotted on the x axis to facilitate the discussion on the Z coefficient.Each curve represents a subcatchment.

Figure 4 .
Figure 4. Spatially explicit output of the InVEST model, showing the water yield computed on a pixel scale.Model outputs are aggregated at the subcatchment scale, delineated by black lines, to be compared with observations at the gauging stations (green circles).

Figure 5 .
Figure 5. (a) Comparison between modeled yields (corrected for water withdrawal) and observed yields, both for the baseline run (Z = 22), and the calibrated run (Z = 14).Black error bars represent the uncertainty on the value for water withdrawal, while grey bars represent a 10 % error in the precipitation input.(b) Comparison of subcatchment ranks.The outlier (Rockfish) subcatchment is noted as "R" on each figure (see text for details).

Table 1 .
Precipitation and evapotranspiration in Cape Fear according to different data sources.Mean and standard deviation values are obtained from the 10 subcatchments.The relative difference between baseline data (i.e., PRISM and FAO sources, respectively, for P and ET 0 ) and the alternative data sources is given as the mean and the range for the 10 subcatchments.

Table 2 .
Summary of mean flow, precipitation, reference evapotranspiration, and land-use characteristics of the 10 study subcatchments.LULC classes shrubland, swine farm, open water and barren represented ≤ 2 % and are not reported here.Predicted mean flow values are results from the InVEST model with Z set to 22 (the difference with the calibrated run, with Z = 14, is shown in parentheses).P and ET 0 are precipitation and reference evapotranspiration, respectively.

Table 3 .
Bias between the water yields obtained from the InVEST model (baseline value Z =22), the lumped model, and observed data.The average, minimum, and maximum bias values for all the subcatchments are reported.Note that comparison with observations discards the Rockfish subcatchment which was identified as an outlier (see text for details).

Hydrol. Earth Syst. Sci., 19, 839-853, 2015 www.hydrol-earth-syst-sci.net/19/839/2015/ ent
geographic contexts.In particular, ongoing research on linking patch-scale and catchment-scale hydrology should provide critical insights into the effect of the simple spatial disaggregation used in the InVEST model