Irrigation, damming, and streamﬂow ﬂuctuations of the Yellow River

. The streamﬂows of the Yellow River (YR) is strongly affected by human activities like irrigation and dam operation. Many attribution studies focused on the long-term trends of streamﬂows, yet the contributions of these anthropogenic factors to streamﬂow ﬂuctuations have not been well quantiﬁed with fully mechanistic models. This study aims to 1) demonstrate whether the mechanistic global land surface model ORCHIDEE is able to simulate the streamﬂows of this complex rivers with human activities using a generic parameterization for human activities, and 2) preliminarily quantify the roles of irrigation and 5 dam operation in monthly streamﬂow ﬂuctuations of the YR from 1982 to 2014 with a newly developed irrigation module, and an ofﬂine dam operation model. Validations with observed streamﬂows near the outlet of the YR demonstrated that model performances improved notably with incrementally

highlighted in this study. Related processes should be integrated in models to better project future YR water resources under climate change and optimize adaption strategies.

Introduction
More than 60% of all rivers in the world are disturbed by human activities (Grill et al., 2019) contributing altogether to approximately 63% of surface water withdrawal (Hanasaki et al., 2018). River water is used for agriculture, industry, drinking 20 water supply, and electricity generation (Hanasaki et al., 2018;Wada et al., 2014), these usages being influenced by direct anthropogenic drivers and by climate change (Haddeland et al., 2014;Piao et al., 2007Piao et al., , 2010Yin et al., 2020;Zhou et al., 2020). In order to meet the fast-growing water demand in populated areas and to control floods (Wada et al., 2014) reservoirs have been built up for regulating the temporal distribution of river water (Biemans et al., 2011;Hanasaki et al., 2006) leading to a massive perturbation of the seasonality and year-to-year variations of streamflows. In the mid-northern latitudes regions 25 where a decrease of rainfall is observed historically and projected by climate models (Intergovernmental Panel on Climate Change, 2014), water scarcity will be further exacerbated by the growth of water demand (Hanasaki et al., 2013) and by the occurrence of more frequent extreme droughts (Seneviratne et al., 2014;Sherwood and Fu, 2014;Zscheischler et al., 2018). Thus, adapting river management is a crucial question for sustainable development, which requires comprehensive understanding of the impacts of human activities on river flow dynamics particularly in regions under high water stress (Liu 30 et al., 2017;Wada et al., 2016).
The Yellow River (YR) is the second longest river in China. It flows across arid, semi-arid, and semi-humid regions, and the catchment contains intensive agricultural zones and has a population of 107 million inhabitants (Piao et al., 2010). With 2.6% of total water resources in China, the Yellow River Basin (YRB) irrigates 9.7% of the croplands (http://www.yrcc.gov.cn).
Underground water resources are used in the YRB, but they only accounts for 10.3% of total water resources, outlining the 35 importance of streamflow water for regional water use. A special feature of the YRB is the huge spatio-temporal variation of its water balance. Precipitation concentrates in the flooding season (from July to October) which concentrates~60% of the annual discharge, whereas the dry season (from March to June) represents only~10-20%. Numerous dams have been built up to regulate the streamflows intra-and inter-annually in order to control floods and alleviate water scarcity (Liu et al., 2015;Zhuo et al., 2019). The YRB streamflows are thus highly controlled by human water withdrawals and dam operations, making 40 it difficult to separate the impacts of human and natural factors on the variability and trends.
To further elucidate the mechanisms, physical-based land surface hydrology models including natural and anthropogenic factors are required. Many previous model studies only considered natural processes and YRB simulations were evaluated against 45 naturalized streamflows (Liu et al., 2020;Xi et al., 2018;Yuan et al., 2018;Zhang and Yuan, 2020). YRB modeling studies simulating real streamflows and comparing their values to observed streamflows are scarce, the most important being from Jia et al. (2006); Tang et al. (2008). Yet, Jia et al. (2006) prescribed census irrigation and dam operation data as input of their model. Tang et al. (2008) included irrigation as a mechanism in their DBH model and investigated the long-term trends of streamflows, but they described the irrigation demand simply from satellite leaf area data, so that crop plant water requirements 50 and phenology were not represented by physical laws. Several global hydrological models (GHMs) simulated both irrigation and dam operation processes, and were applied for future projection of water resources regionally (Liu et al., 2019) or globally (Hanasaki et al., 2018;Wada et al., 2014Wada et al., , 2016. Those global GHM studies acknowledged the complex situation of the YRB where models' performances are limited, but none has focused on the sources of error or potential overlooked mechanisms in this catchment. 55 To model present water resources in the YRB and make future projections, not only natural mechanisms, but also anthropogenic ones must be represented in a model. If a key mechanism is missing in a model, a calibration of its parameters to match observations can compensate for structural biases and projections may be erroneous. For example, the HBV model (Hydrologiska Byråns Vattenbalansavdelning) was well-calibrated with different approaches in 156 catchments in Austria but failed in predicting streamflow changes due to climate warming (Duethmann et al., 2020). One of the key reasons being that 60 the response of vegetation to climate change was missing in the model. In this study, we integrate two key anthropogenic processes (irrigation and dam operation) in the land surface model ORCHIDEE (ORganizing Carbon and Hydrology in Dynamic EcosystEms) which has a mechanistic description of plant-climate and soil water availability interactions and of river streamflows. Through a set of simulations with generic parameter values, we aim to preliminarily diagnose how irrigation and dam operation improve the simulations of observed YRB streamflows. After making sure we understand the impact of adding 65 these two new and crucial processes, the model will be calibrated against a suite of observations so that it can be applied for future projections.
Using a standard version of ORCHIDEE without irrigation nor dams, Xi et al. (2018) performed simulations with a 0.1 • hypo-resolution atmospheric forcing over China (Chen et al., 2011). They attributed the trends of several river streamflows to natural drivers from increased CO 2 and climate change and to land use change. Lacking irrigation and other human removals, 70 their simulated results were higher than the observed streamflows for the YRB. By developing a crop module in ORCHIDEE Wang, 2016;Wu et al., 2016), ORCHIDEE were able to provide precise estimation of crop physiology, phenology, and yield at both local and national scales as well as other site-based crop models (e.g., EPICs (Folberth et al., 2012;Izaurralde et al., 2006;Liu et al., 2007Liu et al., , 2016Williams, 1995), CGMS-WOFOST (de Wit and van Diepen, 2008), APSIM (Elliott et al., 2014;Keating et al., 2003), and DSSAT (Jones et al., 2003)) and land surface models (e.g., CLM-CROP 75 (Drewniak et al., 2013), LPJ-GUESS (Smith et al., 2001;Lindeskog et al., 2013), LPJmL (Waha et al., 2012;Bondeau et al., 2007), and PEGASUS (Deryng et al., 2011 ) Müller et al., 2017). ORCHIDEE-estimated irrigation accounts for potential ecological and hydrological impacts (e.g., physiological response of plants to climate change and short term drought episodes on soil hydrology) with respect to other land surface models (LSMs) and GHMs (Hanasaki et al., 2008;Leng et al., 2015;Thiery et al., 2017;Nazemi and Wheater, 2015;Voisin et al., 2013). In a study focusing on China (Yin et al.,80 2020), ORCHIDEE was able to simulate irrigation withdrawals across China and to evaluate them against census data with a provincial-based spatial correlations of~0.68. It successfully explained the decline of total water storage in the YRB. In this study, we add a simple module describing the dam operations to further improve the model over the YRB.
A simple dam operation model is developed and firstly coupled to ORCHIDEE to simulate the real streamflows in this study.
Similar to other GHMs and LSMs, our dam operation model is based on generic operation principles due to lack of related 85 data. Recent dam models are developed from different perspectives, such as agent-based model River Wave (Humphries et al., global hydrological studies (e.g., Droppers et al. (2020); Haddeland et al. (2006Haddeland et al. ( , 2014; Hanasaki et al. (2018); Zhao et al. (2016); Yassin et al. (2019); Wada et al. (2014Wada et al. ( , 2016) are based on the ideas of Hanasaki et al. (2006). They categorized dams 90 based on their regulation purposes (irrigation and non-irrigation). Irrigation-oriented rules adjust the dam retention to meet the irrigation demand downstream, while non-irrigation-oriented rules buffer floods and thus dampen the variability (Hanasaki et al., 2006). However, the water release target of a dam in the model of Hanasaki et al. (2006) is fixed at the beginning of the year and cannot adjust interactively to large intra-and inter-annual climate variations, which is a key feature of the YRB.
To overcome this limitation, we propose a new dam operation model based on a targeted operation plan, constrained by the 95 regulation capacity of a dam and historical simulated streamflows, with flexibility to adjust to climate variation. The effects of dams on streamflows could then be studied with ORCHIDEE and isolated from the effect of climate factors and irrigation demands. Different from classical approaches separating the YRB into an upper, middle, and lower streams (Tang et al., 2008;Zhuo et al., 2019), we here further divide both the upper and middle streams into sub-catchments based on the locations of five key gauging stations (Fig. 1). This approach splits regions with and without big dams (or large irrigation areas) in the upper 100 and middle streams, which simplifies the assessment of the roles of irrigation and damming on streamflows.
In this study, ORCHIDEE with the novel crop-irrigation module (Wang, 2016;Yin et al., 2020) and the new dam operation model was applied in the YRB from 1982 to 2014 in order to: 1) demonstrate whether ORCHIDEE and the dam model, with generic parameterizations, are able to improve the simulation of streamflow fluctuations; and 2) attempt to separate the effect of irrigation and dams on the fluctuations of monthly streamflows. We first describe ORCHIDEE model and our new dam model 105 in Section 2.1. Then we present the algorithm used for estimating sub-catchment water balances in Section 2.2, followed by the input and evaluation datasets, the simulation protocol, and metrics for evaluation in Sections 2.3 to 2.5. Results are presented in Section 3 and limitations are discussed in Section 4.  Wu et al., 2016;Yin et al., 2020). The crop module includes specific parameterizations for wheat, maize, and rice, calibrated over China by observations (Wang, 2016;Wang et al., 2017). It is able to simulate crop carbon allocation, different phenological stages as well as related managements (e.g., planting date, rotation, multi-cropping, irrigation, etc).
Irrigation amount is simulated in the land surface model ORCHIDEE (Wang, 2016;Wang et al., 2017) as the minimum 120 between crop water requirements and water supply. The crop water requirements are defined according to the choice of an irrigation technique, namely minimizing soil moisture stress for flooding technique, sustaining plant potential evapotranspiration for dripping technique, and maintaining the water level above the soil surface during specific months for paddy irrigation technique. Each crop is grown on a specific soil column (in each model grid-cell) where the water and energy budgets are independently resolved. The water resources in the hydrological routing scheme are from three water reservoirs: 1) a streamflow 125 component; 2) a fast reservoir with surface runoff; and 3) a slow reservoir with deep drainage, used in this order for defining the priorities of water use for irrigation. As long-distance water transfer is not modeled, streams only supply water to the crops growing in the grid-cell they cross, according to the river routing scheme of the ORCHIDEE model (Ngo-Duc et al., 2007).
Without dams, irrigation can be underestimated where dams stores water to supply the crop demand. Transfer from reservoirs, lakes or local ponds to adjacent cells are not considered, which should further lead to an underestimation of the irrigation 130 supply, dependent on the cell size. Details of the coupled crop-irrigation module of ORCHIDEE are described in Yin et al. (2020).

New dam operation model
To account for the impacts of dam regulation on streamflow (Q) seasonality, we developed a dynamic dam water storage module based on only two generic rules: reducing flood peaks and guaranteeing baseflow. This model depends on simulated 135 inflows and is thus independent from irrigation demands. It has been developed for the main reservoirs of the YRB (e.g., the LongYangXia, LiuJiaXia, and XiaoLangDi in Fig. 1). Different from Biemans et al. (2011);Hanasaki et al. (2006), we primarily consider the ability of reservoirs in regulating river flow seasonality. This means that the targeted baseflow and flood control of our dam model are not fixed proportions of the mean annual streamflow, but depends on the regulation capacity of the reservoir (C max ). Firstly, similar to Voisin et al. (2013), a multi-year averaged monthly streamflow (Q s ) is calculated 140 based on ORCHIDEE simulations. To include the potential impacts of recent climate change on dam operation, here we only consider the latest past 10-year simulations, as: Here Q s,i [m 3 .s −1 ] is a multi-year averaged monthly streamflow of month i; j is a year index; N is number of year accounted; For a upcoming year j, we only use the historical simulations (maximum latest ten years) to calculate Q s .

145
Secondly, we evaluate the targeted water storage change ∆W t and monthly streamflow Q t considering the regulation capacity of each reservoir. As shown in Fig. S1, one year can be divided into two periods by comparing Q s withQ s . The longest continuous period of months with Q s >Q s is the recharging season for reservoirs, and the rest is the releasing season. The amount of water stored during the recharging season (blue region in Fig. S1) is determined by C max and is used during the Here k [-], varying between 0 and k max (=0.7), indicates the ability of reservoir in disturbing streamflow seasonality. It is a ratio of the maximum regulation capacity of the reservoir C max [10 8 m 3 ] over the streamflow amount throughout the recharging 155 season. α (0.0263) converts monthly streamflow to water volume. Assuming that the water storage of the reservoir reaches C max at the end of the recharging season, we can calculate targeted water storage W t by using ∆W t .
Finally, the variation of the actual water storage of the reservoir ∆W is a decision regarding actual monthly streamflow, current water storage, Q t , ∆W t , and W t . During the releasing season, ∆W is calculated as: It is the expected release amount to make river streamflows equal to the targeted streamflows after reservoir regulation. If current water storage is less than the targeted value (the case of Eq. 5a), the ∆W i is calculated by the W i with a proportion of ∆W t,i over W t,i . If the current storage is more than the targeted value (the cases of Eq. 5b and 5c), the reservoir can release more water based on a balance between the targeted water storage change ∆W t,i and the targeted water storage at the next time step W t,i (represented by ∆W i ). Note that all water storage change variables are negative 165 throughout the releasing season.
During the recharging season, we can calculate the ∆W i as: If current water storage is larger than the targeted value (Eq. 6a), we will try to recharge a volume of water to make W i+1 = W t,i+1 . If current water storage is less than the targeted value (Eq. 6b), we decide to recharge additional water volume besides 170 the ∆W t,i .
∆W is then applied as a correction of simulated streamflows to generate actual monthly streamflows using the following equation: model is a simplified representation of dam management, because it ignores the direct coupling between water demand and irrigation water supply from the cascade of upstream reservoirs. This approach implies that, with a regulated flow, demands will be able to be satisfied and floods to be avoided without being more explicit. A complete coupling of demand, flood, and reservoir management is difficult to implement in the land surface model in absence of data about the purpose and management strategy of each dam, given different possibly conflicting demand of water for industry and drinking versus cropland irrigation.

180
Before performing the simulation, we estimate the maximum regulation capacity of each studied reservoir in each river sub-catchment shown in Fig. 1. Table 1 lists collected information of the main reservoirs on the YR. Only large reservoirs like LongYangXia (LYX), LiuJiaXia (LJX), and XiaoLangDi (XLD) are considered in our model because of their huge C max . Figure 1 shows the YRB and main gauging stations used in this study. To effectively use Q obs for investigating impacts 185 of irrigation and dam regulations on the streamflows of different river sub-catchments, we divided the YRB into five sub- Fig. 1) with an outlet at each gauging station. Thus we can evaluate the water balance in R i by:

Sub-catchment diagnosis
respectively. In addition, q i = Q out,i − Q in,i indicates the contribution of R i to the river streamflows, that is the sub-catchment streamflows. This term can be negative if local water supply (e.g., precipitation and groundwater) cannot meet water demand.
A conceptual figure of the water balance of a sub-catchment is shown at the top left of Fig. 1.

Evaluation datasets 195
Observed monthly streamflows (Q obs ) from the gauging stations shown in Fig covariance towers measurements into a global 0.5 • monthly ET product. Since these towers do not cover irrigated systems, ET from irrigation simulated by the LPJml (Lund-Postam-Jena managed Land) is added to ET from non-irrigated systems. The PKU ET product estimates 0.5 • monthly ET by water balances at basin scale integrating FLUXNET observations to diagnose sub-basin patterns by a Multiple Tree Ensemble approach (Zeng et al., 2014).

Simulation protocol 210
The 0.5 • half-hourly GSWP3 atmospheric forcing (Kim, 2017) was used to drive ORCHIDEE simulations. Yin et al. (2018) used four atmospheric forcing to drive ORCHIDEE to simulate soil moisture dynamics over China and they found that the GSWP3 provided the best performances, hence we chose this forcing for this sutdy. Administration . Soil texture map is from Zobler (1986). Two simulation experiments were performed to assess the impacts of irrigation on streamflows: 1) NI: no irrigation; 2) IR: irrigated by available water resources. In IR, only surface irrigation is considered, that is water applied on the cropland soil without interception by canopies. The soil water stress, a function of soil moisture and crop root density up to 2 m depth , is checked every half an hour. When 220 it is less than a target threshold, irrigation is triggered with amount equal to the difference between saturated and current soil moisture. To precisely estimate irrigation water consumption (direct water loss from the surface water pool excluding return flow), deep drainages of the three crop soil columns is turned off in the IR simulation. Simulations start from a 20-year spin-up in 1982 to initialize the thermal and hydrological variables, then continued from 1982 to 2014. A validation against naturalized streamflows is shown in Table S1.

225
The dam operation simulation starts from 1982 as an offline model applied to the simulated streamflows from the IR simulation (Q IR ) as input. The initial values of W were set to half of the C max . Considering potential joint regulation of reservoirs, we firstly estimate the total ∆W of all considered reservoirs by using Q IR at HuaYuanKou (outlet of R 4 , Fig. 1). Then we estimate the ∆W of LYX and LJX reservoir by using Q IR at LanZhou. The difference between these two ∆W is assumed to be the ∆W of the XLD reservoir in-between. Offline simulated ∆W values are used to estimate regulated monthly streamflows (Q IR ) 230 as Eq. 7. As huge irrigation water withdrawals occur in R 3 and R 5 (YRCC, 1998-2014) the water recharge of reservoirs may result in negativeQ IR at TouDaoGuai and LiJin. To avoid this numerical artifact due to the offline nature of our dam model, we corrected all negativeQ IR to zero by assuming that the streamflows cannot further drop when all stream water is consumed upstream. The impact of this corrections are accounted for at gauging stations downstream to ensure mass conversation.

Evaluation metrics 235
Three metrics are used to evaluate the performances of simulated monthly Q. The mean-square error (MSE) evaluates the magnitude of errors between simulation and observations. It can be decomposed into three components (Kobayashi and Salam, Where S i and O i are simulated and observed values, respectively; n is the number of samples. SB (squared bias) is the bias 240 between simulated and observed values. In this study, SB represents the difference between simulated and observed multi-year mean annual Q. SDSD (the squared difference between standard deviation) relates to the mismatch of variation amplitudes between simulated and measured values. It can reflect whether our simulation can capture the seasonality of Q obs . LCS (the lack of correlation weighted by the standard deviation) indicates the mismatch of fluctuation patterns between simulated and observed values, which is equivalent to inter-annual variation of Q in this study. The formulas of these three components and 245 detailed explanation can be found in Kobayashi and Salam (2000).
The index of agreement (d ∈ [0, 1]) is defined as the ratio of MSE and potential error. It is calculated as: d = 1 indicates perfect fit, while d = 0 denotes poor agreement.
The modified Kling-Gupta Efficiency (mKGE ∈ (−∞, 1]) is defined as the Euclidean distance of three independent criteria: 250 correlation coefficient r, bias ratio β, and variability ratio γ (Gupta et al., 2009;Kling et al., 2012). It is an improved indicator from the Nash-Sutcliffe Efficiency avoiding heterogeneous sensitivities to peak and low flows, which is crucial for this study that is not only interested in simulating peak flows but also concentrates on base flows regulated by dams for human usage.
mKGE is calculated as, where r is the correlation coefficient between observed and simulated streamflows; µ [m 3 .s −1 ] and CV [-] are the mean and the coefficient of variation of Q, respectively. These indicators are used for three comparisons: 1) Q NI and Q obs ; 2) Q IR and Q obs ; 3)Q IR and Q obs .

Water budgets at sub-catchment scale
Figure 2 displays water budgets and trends in R i based on simulation and observations. Going from upstream to downstream, precipitation in P GSWP3 , which is consistent with P MSWEP , decreases from 543.6 mm.yr −1 (R 1 ) to 254.2 mm.yr −1 (R 3 ), and then rises again to 652.1 mm.yr −1 (R 5 ). The magnitudes of simulated ET (both ET NI and ET IR ) have no significant differences 265 with ET obs aggregated over sub-catchments R 1 to R 5 . Grid cell-based validation shows high agreement between simulated and observed ET across all sub-catchments. The lowest mean of correlation coefficients is 0.79 and the highest mean of relative RMSE is 4.9% (Table S2). Except for R 1 where cropland is rare, ET IR accounts for an amount representing more than 80% of P GSWP3 in the YRB, with a maximum value of 96.5% in R 3 . The difference between ET IR and ET NI is due to the irrigation process, which accounts for 9.1% and 8.2% of ET NI in R 3 and R 5 respectively as caused by the irrigation demand. The impact 270 of irrigation can be detected from sub-catchment streamflows (q i = (Q out,i − Q in,i )/A i ) as well. For instance, both q obs and q IR are negative in R 3 and R 5 , suggesting that local surface water resources cannot meet water demand for irrigation. As irrigation water transfers between grid cells are not represented in our simulations, the non-availability of water locally results in an underestimation of the irrigation withdrawals, likely explaining why q IR > q obs in R 3 to R 5 .
The trends of P and ET are positive but not significant in most R i during the period 1982-2014 (bottom panel of Fig. 2).

275
However, significant trends can be found in simulated and observed q in some R i . The decrease of q obs in R 1 is not captured by the model, neither in q NI nor q IR . This underestimated decrease of river streamflows might be linked to decreased glacier melt or increased non-irrigation human water withdrawals, which are ignored in our simulations. In R 2 and R 3 , the q obs trends are determined by the joint effects of climate change (e.g., the P increase) and human water withdrawals. The trends of q IR show the same direction as that of q obs . In R 5 however q obs increased by 1.67 mm.yr −1 , which was not captured by our simulation 280 of q IR . Besides the increase of P , another possible driver of increasing q obs in R 5 is a decrease of water withdrawal due to the improvement of irrigation efficiency , which is not accounted in our simulations. Moreover, the water use management may play an important role in the observed positive trends of q obs as well, with the aim to increase the streamflows at the downstream of the YR to avoid streamflow cutoff (Q obs < 1 m 3 .s −1 ) that occurred in 1990's .
Irrigation not only influences annual streamflows in the YR, but also affects its intra-annual variation. In general, the dis-285 charge yield Y Q , defined by the sum of surface runoff and drainage, of all grid cells in NI should be higher than in IR because our irrigation model can remove water from the stream reservoirs which is a fraction of drainage and runoff. However, our simulations show that Y Q,NI can be less than Y Q,IR (Fig. S2) at the beginning of the monsoon season. This is because irrigation keeps soil moisture higher than without irrigation in July in R 4 and R 5 ( Fig. S2d and S2e), which in turn promotes Y Q because the soil water holding capacity is lower and a larger fraction of P can go to runoff. This mechanism highlighted that irrigation 290 could enhance the heterogeneity of water temporal distribution and may reinforce floods after a dry season. Figure 3 shows time series of annual streamflows and of the seasonality of monthly streamflows. Our simulations underestimate Q obs at TangNaiHai in R 1 likely because we miss glacier melt. After LanZhou, the values of Q IR coincide very well with those of Q obs , indicating that irrigation strongly reduces the annual streamflows of the YR. However, the seasonality of monthly Q IR 295 is different from Q obs (Fig. 3f-3j). Despite the good match of annual values, the model without dams (shown in Fig. 3) produces an underestimation of Q in the dry season and an overestimation of Q in flood season. Such a mismatch of Q seasonality is likely caused primarily by dam regulation ignored in the model. The locations of several big reservoirs are shown in the bottom panel of Fig. 1 and their characteristics are listed in Table 1.

Comparison between observed and simulated Q
10 8 m 3 (green bar in Fig. 4b), the peaks of monthly Q NI at LanZhou were slightly lower than the peaks of Q obs in R 2 (Fig. 4b), as well as at TangNaiHai (Fig. 4a). But after the construction of the LongYangXia dam, modeled peak Q NI became systematically higher than the peak of Q obs each year, suggesting that the construction of this dam caused the observed peak reduction (Fig. 4b). Moreover, the seasonality of Q obs changed dramatically in the period , but no similar trend was found in monthly P (Fig. S3), suggesting that dam operation was the primary driver of the observed shift in seasonal streamflow 305 variations of the YRB from 1982 to 2014. Dams can affect inter-annual variations of Q as well, although less than the seasonal variation. For instance, TongGuan and XiaoLangDi are two consecutive gauging stations upstream and downstream the reservoir of XiaoLangDi in R 4 (Fig. 1). The annual Q obs at the two stations shows different features after the construction of the XiaoLangDi reservoir in 1999. The only exception occurs at LiJin, whereQ IR overestimates the streamflows from January to May. In fact, the water release from XLD during this period would be withdrawn for irrigation and industry in R 5 . However, our offline dam model is not able to simulate the interactions, leading to the overestimation.
The dam model is successful to reproduce flood control as well. At LanZhou, althoughQ IR underestimates the peak flow 315 due to the bias of the simulated mean annual streamflows (Fig. 3b), its seasonality is much smoother than that of Q IR . The underestimation ofQ IR can reflect special water management during extreme years. From 2000 to 2002, the YRB experienced severe droughts with 10~15% precipitation less than usual, leading to a decrease of surface water resource as much as 45% (Water Resources Bulletin of China, http://www.mwr.gov.cn/sj/tjgb/szygb/). To guarantee base flow, a set of policies were applied (e.g., reducing water withdrawn, increasing water price, releasing more water from reservoirs). Those policies are not 320 accounted for the model, which will produce a higher irrigation demand during dry years and promote the underestimation of the Q obs . From TouDaoGuai to LiJin, the floods from August to October are dramatically reduced by our dam model.
Nevertheless, the peaks are still overestimated inQ IR , which might be due to numerous non-modeled medium/small reservoirs that were ignored by our model, no less than 203 medium reservoirs were documented at the end of 2014 (YRCC, 1998(YRCC, -2014. In our simulation, a 326.5×10 8 m 3 regulation capacity is considered, which only accounts for 45% of the total storage 325 capacity of 720×10 8 m 3 (Ran and Lu, 2012). Moreover, in the five irrigation districts (http://www.yrcc.gov.cn/hhyl/yhgq/, (Tang et al., 2008)), special irrigation systems in the YRB, could contribute to flood reduction. For instance, the Hetao Plateau is the traditional irrigation district and is equipped with a hydraulic system that can divert river water into a complex irrigation Simulated ∆W in R 2 is compared to observations (Jin et al., 2017) in the left panel of Fig. 6, and the good agreement suggests that our dam model is able to capture the seasonal variation of ∆W (r = 0.9, p < 0.001) and rectify the simulated streamflows. In the case of XiaoLangDi in the right panel of Fig. 6, where the correlation is smaller (r = 0.34, p = 0.28), the mismatch could be explained by sediment regulation procedures of that dam, given that it releases a huge amount of water in 335 June for reservoir cleaning and sediment flushing downstream (Baoligao et al., 2016;Kong et al., 2017;Zhuo et al., 2019), a process not represented in our simple dam model. Moreover, because we ignored the buffering effect of numerous medium reservoirs, the simulated water recharge during the flood season could be overestimated. SB increases in IR consequently leading to higher MSE. This misfit is due to the underestimation of Q upstream (Fig. 3a).
Thus, modeled Q IR is lower at LanZhou which enlarges the SB. On the other hand, adding the dam operation contributes to improve the phase variations of Q which are dominated by the phase of the seasonal cycle, by reducing the SDSD error term. Nevertheless, the LCS error term indicating the magnitude of the variability, mainly at inter-annual time scales, has no significant improvement with the representation of irrigation and dam regulations. It is because some of reservoirs are able to 350 regulate Q inter-annually (Table 1), which can be observed from Fig. 4c. However, related operation rules are unclear and are not implemented in our dam model. Improvements were found in d as well, which demonstrates that the way human effects on Q of the YR were modeled brings more realistic results, despite ignoring the direct effect of irrigation demand on reservoir release, and ignored industrial and domestic water demands. The mKGE reveals significant increase after considering dam operations (Fig. 7d). Particularly at LanZhou and HuaYuanKou, the mKGE ofQ IR ∼ Q obs increases 0.86 and 1.11 than that 355 of Q IR ∼ Q obs , respectively. Note that the mKGEs of Q IR ∼ Q obs are smaller than that of Q NI ∼ Q obs from R 2 to R 4 , because irrigation decreases the mean annual streamflow of Q IR , which further increases the CV S leading to worse γ in mKGE (Eq. 11).

Discussion
This study shows that ORCHIDEE land surface model with crops, irrigation, and our simple dam operation model can reproduce correctly streamflow mean levels, inter-annual variations, and seasonal cycle in different sub-catchments of the YRB. 360 We preliminarily quantified the impacts of irrigation and dams on the fluctuations of streamflows. Simulated water balance components were compared to observations in different sub-catchments with a good agreement (e.g., 4.5 ± 6.9% for ET). We found that irrigation mainly affects the magnitude of annual streamflows by consuming 242.8 ± 27.8 × 10 8 m 3 .yr −1 of water, consistent with census data giving a consumption of 231.4 ± 31.6 × 10 8 m 3 .yr −1 (YRCC, 1998(YRCC, -2014. As the water of the YRB is reaching the limit of usage (Feng et al., 2016), we did not find any significant effect of irrigation on streamflow trends.
Instead of increasing river water withdrawals, the growing water demand appeared to have been balanced by improving water use efficiency during the study period Zhou et al., 2020). Our simulation reveals that the impact of irrigation on streamflows may even be positive under special situations, which was also shown in Kustu et al. (2011). However, our mechanisms are different from the irrigation-ET-precipitation atmospheric feedback mechanisms found by Kustu et al. (2011), we demonstrated that irrigation may significantly increase soil moisture and promote runoff yield during the following wet 370 season. It implies that irrigation in such landscapes may reinforce the magnitude of floods during the rainy season by a higher legacy soil moisture.
We found that dams strongly regulate the temporal variation of streamflows Yaghmaei et al., 2018). By including simple regulation rules depending on inflows, our dam model reduced by 48-77% the simulation error (MSE in Fig. 7), especially the variability component (SDSD) of the total error which is dominated by seasonal misfit 375 reduction from dams. Moreover, we confirmed that the change of Q obs seasonality during the study period is not due to climate change (Fig. S3), but is determined by dam operations . Big dams, like the LongYangXia, LiuJiaXia, and XiaoLangDi, are able to regulate streamflows inter-annually (Wang et al., 2018) and smooth the inter-annual distribution of water resources in YRB (Piao et al., 2010;Wang et al., 2006;YRCC, 1998YRCC, -2014. However, their detailed operation rules are unclear and were not implemented explicitly in our dam model. The error corresponding to inter-annual variation (LCS in land use change, irrigation techniques, water management techniques, and dams inter-connection), which are difficult to be well modeled due to lack of data. However, with the upcoming Surface Water and Ocean Topography (SWOT) mission, it will be possible to monitor the water level and surface extent of more reservoirs, which will be helpful to improve and validate the dam operation simulations (Ottlé et al., 2020).
Our simulations ignored potential impacts of reservoirs on local climate, e.g. through their evaporation (Degu et al., 2011).

390
The water area of several artificial reservoirs (LongYangXia, LiuJiaXia, BoHaiWan, SanShengGong, and XiaoLangDi) is approximately 1056 km 2 , which is larger than the 10th largest natural lake in China. These water bodies can also significant influence local energy budgets and evaporative water loss may be considerable especially in arid and semi-arid area (Friedrich et al., 2018;Shiklomanov, 1999 streamflows every year during the flood season. Its irrigation area is 5740 km 2 with an evapotranspiration rate ranging between 1200~1600 mm.yr −1 . However, as these irrigation districts divert river water without big dams, they are not taken into account in most YR studies. Another non-negligible factor in the case of YR is sedimentation, which reduces the regulation capacities of reservoirs and weakens streamflow regulation by human. For instance, the total capacity of QingTongXia declined from 6.06 to 0.4 × 10 8 m 3 since 1978 due to sedimentation. Therefore, how land use change and evolution of natural ecosystems affect 400 sediment load and deposition is another key factor to project dams disturbances on streamflows in the YRB. Simulating anthropogenic impact on river streamflows is challenging. In the case of the YR, well calibrated models can provide accurate naturalized streamflow simulations with Nash-Sutcliffe Efficiency (NSE) as high as 0.9 (Yuan et al., 2016).
However, when considering the impacts of irrigation and dams, the NSE values of simulations are generally worse. For instance, simulations with anthropogenic effects by Hanasaki et al. (2018) had lower NSE than the simulation with only natural 405 processes. Similarly, Wada et al. (2014) showed NSE decrease after considering anthropogenic factors in the YRB, which was interpreted as complexity of the YRB under the impacts of human activities and climate variation. However, the NSE of naturalized streamflows cannot really be compared to the one of regulated streamflows. Even if a model can perfectly simulate the dam operations, the NSE of naturalized streamflows will be larger than that of regulated streamflows, given that dam operations automatically reduce the variation of river streamflows (a simple proof is available in Sect. A of the online supplement). In Intensive calibrations using a suite of observations can allow catchment-scale studies to provide high-accurate simulated streamflows for short-term flood forecast. However, for long-term projections, a model should include all key processes of the 415 system studied. If key processes are missing in the model, a calibration will cover up the shortcomings, which lead to lack of predictive capacities for long time scales as shown by Duethmann et al. (2020). Therefore, by developing crop physiology and phenology, irrigation, and (offline) dam operation model, we have tried to demonstrate that streamflow fluctuations of the YR can be reasonable reproduced by a generic land surface model. Although mismatches exist in the simulated streamflows, they are more likely caused by missing processes (joint impact of multiple medium reservoirs, special mission of dams, irrigation 420 system characteristics) than by poor calibration of existing processes, because other simulated hydrological variables coincide well with observations in the YRB, such as soil moisture dynamics (Yin et al., 2018), naturalized river streamflows (Table S1 in Xi et al. (2018)), leaf area index (Section S2 in Xi et al. (2018)), amount and trend of irrigation withdrawals , trends of total water storage (Section 3.4 in Yin et al. (2020)), and ET (Table S2). On the contrary, these mismatches draw our attention to some key mechanisms overlooked in most models. For instance, our model underestimates the annual streamflow 425 at LanZhou in the period 2000-2002 (Fig. 3b), during whichQ IR was almost negatively correlated to the Q obs (Fig. 5a). In summary, our results show that the errors of simulated streamflows decreased dramatically after considering crops, irrigation, and dam operations, suggesting that these are first order mechanisms controlling streamflow fluctuations. Future work can be focused on completing the model by linking dam operation to the variable crop water demand. significant changes of precipitation patterns and land use during the study period, but by the construction of dams and their operation. After considering the impacts of dams, we found that dam regulation can explain about 48-77% of the fluctuations of streamflows. The effect of dams may be still underestimated because we only considered simple regulation rules based on inflows, but ignored its interactions with irrigation demand downstream. Moreover, our analysis showed that several reservoirs on the Yellow River are able to influence streamflows inter-annually. However, such effects are not quantified due to lack of 440 knowledge of the regulation rules across our study period.

Conclusions
Code and data availability. The code of ORCHIDEE can be assessed via https://forge.ipsl.jussieu.fr/orchidee/wiki. The data used in this study, and the code of the dam operation model, analysis, and plotting can be accessed via https://doi.org/10.5281/zenodo.3979053 (Yin, 2020 opment, 10, 1903-1925, https://doi.org/10.5194/gmd-10-1903-2017, http://www.geosci-model-dev-discuss.net/gmd-2016-162 Hanasaki et al. (2006). "H" indicates hydropower; "C" indicates flood control; "I" indicates irrigation; "W" indicates water supply; and "S" indicates scouring sediment. Inter-annual CSWIH * The total capacity shrink is due to sedimentation.  (c) TouDaoGuai q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q (d) HuaYuanKou q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q     1982 1985 1988 1991 1994 1997 2000 2003 2006 2009 2012 Year  Supporting information for "Irrigation, damming, and streamflow fluctuations of the Yellow River" Assuming that N i is the time series of natural discharge and ∆W i is water storage change of a reservoir. Thus, the regulated discharge R i can be calculated as: Where i is month index. Capital letters indicate observed variables; while lower case letters indicate simulated variables. Then the NSE of regulated discharge (NSE 1 ) can be calculated as: where M is the length of the time series. Let's assume that the model can give a perfect simulation of water storage change of reservoir. Thus ∆w i = ∆W i and NSE 1 is, Note that the NSE of natural discharge (NSE 2 ) is, The difference between NSE 1 and NSE 2 is the variation of regulated and natural discharge. As assuming that dam operations always reduce the variation of discharge, the variation of N i is smaller than R i . Consequently, NSE 2 is always less than NSE 1 .

15
In summary, if reservoirs reduce the variation of river discharge, a model with a perfect dam module will always provide a smaller NSE (with regulated discharge as reference) than that of the model without functions of dam operations (with natural discharge as reference)! The conclusion is that it is not comparable of model (study) performances with different references and that it is not adequate to evaluate dam parameterizations.  Table S1. Validation ORCHIDEE simulated naturalized streamflow against census-based naturalized monthly streamflow of the Yellow River (1982)(1983)(1984)(1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000). R 2 is coefficient of determination. mKGE is modified Kling-Gupta Efficiency. d is degree of agreement.