Consistency of satellite precipitation estimates in space and over time compared with gauge observations and snow-hydrological modelling in the Lake Titicaca region

This paper proposes a protocol to assess the space-time consistency of satellite precipitation estimates (SPEs) 10 according to various indicators including: (i) direct comparison of SPEs with 72 precipitation gauges; (ii) sensitivity of streamflow modelling to SPEs at the outlet of four basins; and (iii) the sensitivity of distributed snow models to SPEs using a MODIS snow product as reference in an unmonitored mountainous area. The protocol was applied successively to four different time windows (2000‒2004, 2004‒2008, 2008‒2012 and 2000‒2012) to account for the space-time variability of the SPEs and to a large dataset composed of 12 SPEs (CMORPH-RAW, CMORPH-CRT, CMORPH-BLD, CHIRP, CHIRPS, 15 GSMaP, MSWEP, PERSIANN, PERSIANN-CDR, TMPA-RT, TMPA-Adj and SM2Rain), an unprecedented comparison. The aim of using different space-time scales and indicators was to evaluate whether the efficiency of SPEs varies with the method of assessment, time window and location. Results revealed very high discrepancies between SPEs compared to precipitation gauge observations. Some SPEs (CMORPH‒RAW, CMORPH‒CRT, GSMaP, PERSIANN, TMPA‒RT and SM2Rain) are unable to estimate regional precipitation whereas the others (CHIRP, CHIRPS, CMORPH‒BLD, MSWEP, 20 PERSIANN‒CDR and TMPA‒Adj) produce a realistic representation despite recurrent spatial limitation over regions with contrasted emissivity, temperature and orography. In nine out of ten of the cases studied, streamflow was more realistically simulated by the hydrological model tested when SPEs were used as forcing precipitation data rather than precipitation derived from the available precipitation gauge networks. Interestingly, the potential of SPEs to reproduce the observed streamflow varied significantly depending on the basin and period considered and did not systematically corroborate SPE 25 potential compared with gauge precipitation observations. SPE’s ability to reproduce the duration of MODIS-based snow cover also showed variable consistency over time with poorer simulations in comparison to those simulated from available precipitation gauges. Using snow cover simulations as indicator led to a different efficiency ranking of the SPEs that the ones obtained when using observed gauge precipitation and streamflow. SPEs thus present space-time errors that may not be detected when short time windows and/or scarce gauge networks and/or single indicators are used, underlining how 30 important it is to carefully consider their space-time consistency before using them for hydro-climatic studies. Moreover SPE Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-316 Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 13 September 2018 c © Author(s) 2018. CC BY 4.0 License.


Introduction 1.1 On the need for and difficulty involved in estimating precipitation fields
Water resources are facing unprecedented pressure due to the combined effects of population growth and climate change.In the 20 th century, water extraction underwent a six-fold increase to sustain food needs and economic levels due to the increasing world population (vision, water council, 2000).At the same time, global warming has led to the redistribution of precipitation, which has favoured the occurrence of both drought and extreme flood events (Trenberth, 2011).
As a key component of the hydrologic cycle, it is therefore crucial to have accurate precipitation estimates in many research fields including hydrological and snow modelling (e.g.Hublart et al., 2016), climate studies (e.g.Espinoza Villar et al., 2009), extreme flooding (e.g.Ovando et al., 2016) drought (e.g.Satgé et al., 2017a), and monitoring to understand past and ongoing changes and to optimise water resources management (e.g.Fabre et al., 2016Fabre et al., , 2015)).
Measurements of precipitation are usually retrieved from point gauge stations.Considered as ground truth at the point level, precipitation estimates are then spatialized to represent the distribution of precipitation in space and over time to be used as inputs for impact modelling.However, in most cases, the gauge network is too sparse and unevenly distributed to correctly capture the spatial variability of precipitation.This is especially true for remote regions such as tropical forests, mountainous areas and deserts where the usual insufficient installation and maintenance operations seriously compromise precipitation monitoring.An alternative approach consists in using precipitation derived from weather radar using the backscattering of electromagnetic waves via hydrometeors (e.g.Mahmoud et al., 2018).Unlike measurements using traditional gauges, this technique monitors large areas in a distributed way thus offering the opportunity to monitor precipitation over remote regions.However, ground radar measurements are rarely available at the global or even regional scale (Tang et al., 2016) and are limited in the case of complex terrain which interferes with the radar signal (Zeng et al., 2018).More recently, some authors (Messer et al., 2006;Overeem et al., 2011;Zinevich et al., 2008) reported on the possibility to estimate precipitation from wireless networks such as commercial cellular phone microwave links.These estimations are based on the attenuation of the electromagnetic signals between telecommunication antennas during precipitation events and their first results are promising as precipitation estimates were found to be very accurate (see e.g.Doumounia et al., 2014).However, this technique faces the problem of private cellular phone company policy about sharing data and telecommunication antenna are mainly located in urban areas, which limits accurate precipitation estimates in space.On the other hand, several satellite precipitation estimates (SPEs) are now available, making possible to monitor Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-316Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.precipitation on regular grids at the near global scale, representing an unprecedented opportunity to complement traditional precipitation measurements.

Satellite precipitation estimates (SPEs): opportunities and limitations
Several SPEs have become available in recent decades to monitor precipitation at global scale and on regular grids.The first generation of SPEs appeared with the Tropical Rainfall Measuring Mission (TRMM) launched in 1997 by NASA (National Aeronautics and Space Administration) and the Japan Aerospace Exploration Agency (JAXA).Over the last 18 years, the TRMM Multisatellite Precipitation Analysis (TMPA) (Huffman and Bolvin, 2014), the Climate prediction centre MORPHing (CMORPH) (Joyce et al., 2004), the Precipitation Estimation from remotely Sensed Information using Artificial Neural Networks (PERSIANN) (Sorooshian et al., 2000) and the Global Satellite Mapping Precipitation (GSMaP) (GSMaP, 2012) SPE datasets have been developed based on the TRMM mission to deliver precipitation estimates at the 0.25° grid scale.In 2014, the Global Precipitation Measurement (GPM) mission was launched to ensure TRMM continuity.The second generation of SPEs based on GPM missions included the Integrated Multi-SatellitE Retrievals for GPM (IMERG) (Huffman et al., 2017) and a new GSMaP version product which deliver precipitation estimates at a finer grid scale (0.1°) than the first generation of SPEs but estimates are limited to the period from 2014 to the present.At the same time, some SPEs took advantage of previous SPEs and missions to estimate precipitation over larger time window: long-term SPE generation.This is the case of PERSIANN-Climate Data Record (PERSIANN-CDR) (Ashouri et al., 2015), Multi-Source Weighted-Ensemble Precipitation (MSWEP) (Beck et al., 2017) and Climate Hazards Group InfraRed Precipitation (CHIRP) with Station data (CHIRPS) (Funk et al., 2015).
However, SPEs are indirect measurements made from satellite/sensor constellations, including passive microwaves (PMW) and infra-red (IR) sensors on board low earth orbital (LEO) and geosynchronous satellites and are subject to uncertainty due to technical limitations.Indeed, the irregular sampling and limited overpass of LEO PMW measurements impede the correct capture of short term and slight precipitation events (Gebregiorgis and Hossain, 2013;Tian et al., 2009) which can introduce error in precipitation estimates over arid regions and/or during the dry seasons (Prakash et al., 2014;Satgé et al., 2017a;Satgé et al., 2015;Shen et al., 2010).In mountainous regions, the precipitation /no precipitation cloud classification based on cloud top IR temperature may fail in the case of precipitation processes resulting from orographic warm clouds (Dinku et al., 2010;Gebregiorgis and Hossain, 2013;Hirpa et al., 2010).The contrast between temperature and emissivity (i.e.water and snow covered area) of rough land surfaces creates background signals similar to those produce by precipitation, leading to misinterpretation between rainy or not rainy clouds, which can introduce high bias in precipitation estimates (Satgé et al., 2016;Tian and Peters-Lidard, 2007;Ferraro et al. 1998;Hussain et al. 2016;Satgé et al., 2018).
In addition to these spatial inconsistencies, the orbital satellite context implies constantly varying input data for each observation time (snapshot), which likely introduces inhomogeneity in the SPE time records.This could be exacerbated with aging sensors and permanent sensor failures.As an example, the TRMM satellite mission ended on April 8, 2015, making unavailable the TRMM Microwave Imager (TMI) from input data used for TMPA retrieval (Huffman and Bolvin, 2016).
Consequently, the potential of SPEs is expected to present space and time errors whose quantification is crucial before their use for hydro-climatic studies.

State of the art evaluation of SPEs
In the context described above, many studies have reported on the efficiency of SPEs over different regions.The most common way to evaluate SPE potential is to compare their estimates with precipitation gauge measurements, as reviewed by Maggioni et al. (2016) and Sun et al. (2017).The comparison of gauge-based assessment studies confirmed the spatial variability of SPE efficiency in reproducing precipitation, so no single SPE can be said to be the most effective one at global scale.For example, when TMPA, CMORPH, and PERSIANN SPE datasets were compared, TMPA was found to be closer to the observed precipitation in India (Prakash et al., 2014), the Guyana shield (Ringard et al., 2015), Africa (Serrat-Capdevila et al., 2016), Chile (Zambrano-Bigiarini et al., 2017) and South America Andean plateau (Satgé et al., 2016), whereas CMORPH was closer to observed precipitation in Bali, Indonesia (Rahmawati and Lubczynski, 2017), Pakistan (Hussain et al., 2017), and China (Su et al., 2017;Zeng et al., 2018).However, these assessments based on comparison with gauge observations did not assess SPE's potential performance over unmonitored regions.This is especially true for high mountainous regions where available gauge networks (generally located in the valley) cannot correctly represent the local precipitation induced by topographic effects.As a result, evaluating SPE potential over high mountainous regions remains challenging (Hussain et al., 2017;Satgé et al., 2017b).
An alternative method consists in assessing the sensitivity of hydrological models to SPEs.The efficiency of SPEs can be evaluated indirectly via their ability to generate reasonable discharge simulations at the outlet of the basin concerned.
Compared to gauge-based assessment studies, fewer authors have reported on hydrological sensitivity to SPEs, as reviewed in (Maggioni and Massari, 2018).For example, TMPA, CMORPH and PERSIANN datasets were compared as forcing data for hydrological modelling in Africa (Casse et al., 2015;Thiemig et al., 2013;Tramblay et al., 2016) and South America (Zubieta et al., 2015).These studies provided complementary information to gauge-based assessments, offering an operational overview of SPEs for the management of water resources.However, due to the aggregation process at basin scale, the potential of SPEs over specific ungauged regions remains unclear.Moreover, in these studies, the SPEs were not compared with gauge observations (Thiemig et al., 2013) or provided only a brief comparison at basin (Casse et al., 2015;Tramblay et al., 2016) or gauge (Zubieta et al., 2015) scale.Therefore it is difficult to conclude on the respective advantages and limitations of using gauges or streamflow data as indicators to assess the ability of SPEs to reproduce precipitation patterns as SPEs could rank differently depending on the indicator used.For example, considering CMORPH, PERSIANN and TMPA datasets over two African watersheds, TMPA showed the closest estimate in comparison with gauges for both basins (Thiemig et al., 2012) while CMORPH and TMPA provided more accurate streamflow simulations depending on the basin considered (Thiemig et al., 2013).Whatever the selected approach (based on gauges and/or hydrological modelling), the analysis is performed using a single time window which does not assess the temporal variability of SPEs due to the acquisition process and/or aging sensors.To date, only a few studies have been conducted to observe changes in the efficiency of SPEs over time.For example, CHIRPS precipitation estimates were analysed over two distinct time windows to assess potential changes in precipitation accuracy over Cyprus and Nepal from one period to another (Katsanos et al., 2016;Shrestha et al., 2017).
Similarly, Bai et al., 2018 analysed the accuracy of CHIRPS in mainland China separately for each year to assess its interannual variability.These studies highlighted temporal inconsistencies in CHIRPS estimates inherent to variations in the input data used for precipitation retrieval.However, only the CHIRPS SPE was considered and similar features are to be expected with other SPEs.
Today more than 20 SPEs are available from the first (TRMM), second (GPM) and long-term SPE generation (Beck et al., 2017).Nevertheless previous studies only considered SPE subsets.SPE assessment has indeed focussed on: (i) a single or a limited sample of SPEs (e.g.Cao et al., 2018;Erazo et al., 2018;Shrestha et al., 2017); (ii) transition from the previous to a new version of an algorithm for a specified SPE (TMPA-v6 to TMPA-v7 for example) (e.g.Chen et al., 2013;Melo et al., 2015;Milewski et al., 2015); and (iii) the effectiveness of the transition from the first (TRMM) to the second (GPM) generation of SPEs (e.g.Satgé et al., 2017;Sharifi et al., 2016;Wang et al., 2017).All these studies provided useful feedback related to their specific objectives but did not really help assess the respective performance of SPEs due to the small sample of SPEs considered.
For these reasons, comprehensive feedback on SPEs including space-time consistency, different indicators, insights over unmonitored regions, and a representative SPEs sample can only be acquired by backcrossing large SPE assessment studies.Even so, as each study is based on different statistical indices, spatial and temporal scales and periods, such an effort is seriously compromised.

Objectives
This paper proposes a protocol to assess the space-time consistency of a set of 12 SPEs.The efficiency of the SPEs is evaluated at various spatiotemporal scales and indicators in comparison with: (i) gauge point observations; (ii) streamflow observations via sensitivity analysis of a hydrological model in different catchments; and (iii) snow cover observed from satellite imagery via sensitivity analysis of a snow model in an unmonitored mountainous area.The aim of using different indicators is to evaluate whether the efficiency of the SPEs varies with the assessment method, and also to provide feedback on the use of SPEs over poorly monitored regions.The Lake Titicaca region was selected as study area because it includes all the specific features referred as potential limiting factors for SPEs (high mountainous massifs, large water bodies and snow covered areas) to evaluate the potential of SPEs in an extreme context in terms of the limits of sensors with respect to the orographic effect (i.e.mountains) and high temperature/ emissivity contrast (i.e.Lake Titicaca and a snow-covered region).

Study area
The Lake Titicaca basin is located between 14°S and 17°S and 71°W and 68°W in the northern part of the South American Andean plateau, known as the Altiplano.It extends over an area of 49,000 km 2 .The Lake Titicaca catchment is bordered to the west and east by the two Cordilleras (Occidental and Real) and includes a few snow-covered areas.With a surface area of 8,560 km 2 , a mean depth of 105 m (284 m max) and a water volume estimated at 903 km 3 (Delclaux et al., 2007) Lake Titicaca is the main water body and source of the endorheic Altiplano hydrologic system.The lake is drained by the Desaguadero River to the south (Fig. 1).
Runoff from the lake basin and direct precipitation over the lake contribute approximately 53 and 47% of the lake water supply, respectively (Roche et al., 1992).However, studies on the regional water balance are more than 30 years old (Carmouze et al., 1977;Carmouze and Aquize, 1981;Lozada, 1985) and may not be representative of the current situation.
Indeed, a recent study over the Altiplano (López-Moreno et al., 2015) found a temperature increase of between 0.15 and 0.25°C decade -1 between 1965 and 2012, while Heidinger et al. (2018) reported an increase in the intensity of precipitation extremes over the period 1965-2010 in the same region.The temperature is expected to continue to increase until the end of this century while precipitation may decrease by 10 to 30% depending on the climate projections (Bradley, 2006;Minvielle and Garreaud, 2011;Urrutia and Vuille, 2009).There is thus a need to update the Lake Titicaca water balance to support efficient water management to adapt to this changing situation.However, the transboundary, economic and remote context means the sparse hydro-meteorological monitoring network prevents a consistent analysis.Indeed, precipitation presents high spatiotemporal variability due to the geomorphologic and tropical context which is poorly represented by the available monitoring network.With nearly-global scale coverage, SPEs are a promising alternative to monitor regional precipitation in space and over time.

Hydro-meteorological stations
Precipitation and air temperature data for Bolivia were provided directly by the Servicio Nacional de Hidrologia e Meteorologia (SENAMHI) whereas for Peru, data were collected from the Peruvian SENAMHI website.Only weather gauges with less than 20% daily missing data over the 2000-2012 period were selected, giving a total of 72 stations in Bolivia and 51 in Peru (Fig. 1).
Water level and discharge data from SENAMHI were managed with the free software HYDRACCESS (Vauchel, 2005) developed by SO-HYBAM to obtain daily discharge records at the outlet of four catchments (Fig. 1): the Ilave (7,766 km²), the Katari (2,588 km²), the Keka (801 km²), and the Ramis (14,560 km²) catchments with respective mean annual discharge estimated at 37.3, 2.0, 3.8 and 73.0 m 3 .s - (Uria & Molina, 2013).Nearly complete discharge observations over the 2000-2012 period were available for two basins (Katari and Keka), whereas discharge observations were only available from 2008 to 2012 for the other two (Ilave and Ramis).

Interpolation of meteorological in-situ observations
To obtain continuous and spatialized meteorological series for the study area, precipitation and temperature gauge data were interpolated using the inverse distance weighted method (IDW) on a 5-km grid at the regional scale (for details on the purpose of hydrological modelling, see section 3.2) and on a 500-m grid at the local scale (for the purpose of snow modelling, see section 3.3).The choice of the IDW technique as interpolation method was based on the study of Ruelland et al. (2008), which showed low sensitivity of hydrological models to rainfall input datasets derived using different interpolation methods (IDW, Thiessen, spline, ordinary kriging) with IDW yielding the highest hydrological efficiency.
Temperature values were interpolated by accounting for a constant lapse rate of 6.5°C km -1 , in a similar way to that described in Ruelland et al. (2014).No orographic effect was accounted for in the interpolation of the point precipitation observations in order to allow for objective comparisons with the satellite products as these estimate precipitation even in the remote, mountainous areas.P ref and T ref, refer to interpolated precipitation and temperature, respectively.

Satellite precipitation estimates (SPEs)
Twelve SPEs with a spatial resolution below or equal to 0.25° (~25 km at the Equator) were selected for the 2000-2012 period.SPEs with coarser resolution were not used to ensure accurate representation of spatial precipitation variability.The SPEs include the following datasets: Climate Hazards Group InfraRed Precipitation (CHIRP), Climate Prediction Center MORPHing (CMORPH), Global Satellite Mapping of Precipitation (GSMaP), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network (PERSIANN), Soil Moisture to Rain (SM2Rain) method, Tropical Rainfall Measuring Mission (TRMM), Multisatellite Precipitation Analysis (TMPA) and Multi-Source Weighted-Ensemble Precipitation (MSWEP).All the SPEs used a combination of satellite (S) data gathering information from passive microwave (PMW) radiometers and infra-red (IR) data from Low Earth Orbital (LEO) and geosynchronous satellites, respectively except for the SM2RAIN method, which relies on satellite surface soil moisture derived from passive and active microwave, for precipitation estimates.The selected SPEs differ in terms of the combination of satellite sensors, algorithms and whether the products include reanalysis (R) and/or a calibration step against gauge (G) data in their processing or not.Table 1 provides an overview of these SPEs and relevant references for more information on their respective production.The mean annual precipitation pattern retrieved from all SPEs is presented in Fig. 2.
SPEs were first aggregated to obtain daily time step records using 8h to 8h (local time) time windows to match local daily gauge observations.It should be noted that some SPEs (CHIRP, CHIRPS and GSMaP) are only delivered at daily scale.For those SPEs, the daily aggregation is based on different time windows which could compromise the comparison of SPEs at daily scale.This is why we aimed to analyse SPEs at a 10-day time step for which we assumed that differences in daily temporal windows were negligible.Finally, using the nearest neighbour technique, all the SPEs were spatially resampled to 5 km to facilitate their comparison.The resulting database consists of 10,500 daily virtual stations at 5 km spatial resolution over 12 hydrological years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) for each SPE.Additionally, SPEs were resampled to 500 m resolution over the selected subset region to assess SPE potential for snow modelling (see Fig. 1).
Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.Distribution (SCD) represents the percentage of days with snow and can be retrieved for any pixels and periods.

A protocol to evaluate the space-time consistency of satellite precipitation estimates
Figure 3 is a flowchart of the main methodological steps.Twelve SPEs were first considered as a representative sample of currently available SPEs.This is an important consideration to guide potential SPE users towards the most efficient SPE.
However, to avoid overloading the research with unnecessary results, a pre-selection was made at the Titicaca Lake catchment scale (hereafter denoted regional scale) to discard useless SPEs.The remaining SPEs were then assessed using three successive and complementary methods.The first assessment step consisted of comparing SPEs and gauge observations at the locations of the 69 pixels which included gauges.The second assessment step consisted of analysing the sensitivity of streamflow modelling to the SPEs at the four basin outlets using observed streamflow as reference data.The third step consisted of analysing the sensitivity of snow modelling to the SPEs (precipitation datasets) over a subset mountainous area in the Andes (see Fig. 1) using snow cover duration as observed from MODIS gap-filled snow products as reference.The aim of the proposed protocol was to assess the space-time consistency of the SPEs.The spatial consistency was accounted for by the gauge and basin distribution, as well as by the use of a fully distributed snow model for the first, second and third assessment step, respectively.To evaluate the consistency of the SPEs over time, each assessment step was analysed according to three 4-year time windows.It is worth mentioning that simulated streamflow and snow cover using

Comparison of SPEs with gauge observations: pre-selection and evaluation
To avoid overloading the analysis with unsuitable SPEs, SPE consistency was first analysed at the regional scale for the 2000-2012 period.For each of the 69 0.05° pixels including at least one precipitation gauge, the daily precipitation series from the gauges and SPEs were first aggregated to the 10-day time step over the 12-year period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012).The other pixels were not used as P ref is influenced by the interpolation process.For each of the 69 0.05° pixels, only the daily values available from all precipitation datasets (P ref and SPEs) were used and the 10-day records were only computed when more than 80% of daily values records were available.Next, mean regional 10-day precipitation series were computed from P ref and all SPEs by aggregating the values from all 69 pixels.For each pixel, only 10-day values available for all SPEs and P ref were used.It should be noted that SM2Rain estimates rely on soil moisture observations with many missing data over water bodies and mountainous regions (Dorigo et al., 2015), leading to significant spatial gaps over Lake Titicaca and the Cordillera region (Fig. 3).As a result, in comparison to other SPEs, only 44 P ref 0.05° pixels (including gauges) were (2 where STD is the standard deviation in mm, n is the number of values; and P the precipitation value in mm (SPE or P ref ).
where %B is the SPE bias value as a percentage, n is the number of values, SPE the precipitation estimate of the considered SPE value in mm; and P ref the reference precipitation value in mm.
where CRMSE is the centred root mean square error in mm, n is the number of values, SPE the precipitation estimate of the considered SPE value in mm; and P ref the reference precipitation value in mm.
To facilitate interpretation of the statistical results, the Taylor diagram (Taylor, 2001)  To assess the consistency of SPEs over time, mean regional precipitation 10-day series were compared to P ref according to a Taylor Diagram for three 4-year periods corresponding to the 2000-2004, 2004-2008 and 2008-2012  respectively.For all the periods considered, 98.6% of P ref pixels had more than 90% 10-day records for both the wet and dry season.Consequently, the temporal assessment of the SPEs was not expected to be influenced by any inconsistency of P ref over time in terms of available records.
To assess the spatial consistency of the SPEs, we compared the CC, CRMSE and %B computed between the SPEs and P ref 10-day series for the 2000-2012 period at the location of each pixel which included gauges.For each SPE, CC, CRMSE and %B obtained at the pixel level were plotted to highlight regions potentially concerned by low (high) SPE potential.We repeated the analysis for the three 4-year windows.We only considered CRMSE scores to simplify interpretation of the results, as this score was found to provide more statistical discrimination than CC, STD and %B.It is worth mentioning that resampling SPEs (see section 2.3.1)could affect the assessment of SPE potential at the pixel level.Indeed, at a coarser spatial resolution (0.25°) areal precipitation estimates of SPE pixels are expected to be more representative of the mean precipitation derived from all the gauges in the pixel considered than the one derived from a single gauge.However, for these particular pixels, a preliminary SPE assessment at the original and resample grid size revealed no significant differences (data not shown).
For each SPE, CRMSE values obtained at the pixel level and for each period were plotted to highlight regions possibly concerned by low (high) SPE potential.
Finally, for each pixel, mean and STD CRMSE values were computed from the three CRMSE values obtained from the three 4-year periods.Mean and STD CRMSE values were then plotted to assess the consistency of SPEs in space and over time.The lower the mean and STD of CRMSE, the more stable the SPE considered at the specific pixel location.

SPEs as input data for hydrological modelling
The GR4j lumped hydrological model (Perrin et al., 2003) was chosen to analyse the sensitivity of streamflow simulations to the various SPEs.The model has demonstrated its ability to perform well under various hydro-climatic conditions (e.g., Coron et al., 2012;Perrin et al., 2003;Grouillet et al., 2016;Dakhlaoui et al., 2017), notably in the Andean region (e.g., Hublart et al., 2016).
This model relies on daily precipitation (P) and potential evapotranspiration (PE) ,which was computed using the Firstly, a production module computes the amount of water available for runoff, i.e., "effective precipitation".To do so, a soil-moisture accounting (SMA) store is used to separate the incoming precipitation into storage, evapotranspiration and excess precipitation.At each time step, soil drainage is computed as a fraction of the storage and added to excess precipitation to form the effective precipitation.Secondly, a routing function split the effective precipitation into two components: 90% is routed as delayed runoff through an unit hydrograph UH1 in series with a non-linear routing storage while the remaining 10% is routed as direct runoff through an unit hydrograph UH2 (Perrin et al., 2003).UH1 and UH2 consist in a slow and quick routing path, respectively, to account for differences in runoff delays.Finally, the streamflow at the catchment outlet is computed by summing up delayed and direct runoff.This model relies on four calibrated parameters: maximum capacity of the soil moisture accounting store (X1, mm), inter-catchment exchange coefficient (X2, mm), maximum capacity of routing storage (X3, mm), and time base for unit hydrographs (X4, days) for each catchment.
Acceptable parameter bounds were defined according to the recommendation of Perrin et al., (2003) and previous experiments in a similar Andean context (Hublart et al., 2016;Ruelland et al., 2014).The following ranges were used for model calibration with all the precipitation datasets tested (P ref and SPEs): 10 mm < X1 <1,800 mm, -5 mm < X2 < 5 mm, 1 mm < X3 < 500 mm, 0.5 days < X4 <5 days.These ranges are large enough to compensate for the differences in the different precipitation datasets and basins.Beyond these ranges, we assumed that streamflow simulations could not be realistic.In practice, parameter bounds were rarely reached when calibrating the model with the different datasets tested (data not show here for the sake of brevity).
The area catchment P and PE averaged values were computed from the 5-km pixels P (P ref and SPEs) and PE included in each catchment considered.We used a weighted average based on the 5 km*5 km fraction included in the catchment considered.P ref and SPEs were used sequentially as forcing precipitation datasets for the streamflow simulation.For each run, model parameter calibration was based on the shuffled complex evolution (SCE) algorithm (Duan et al., 1992) by optimising the Nash-Sutcliffe efficiency criterion (NSE, Eq. 6; Nash and Sutcliffe, 1970) at a 10-day time step.The NSE criterion represents the overall agreement of the shape of the hydrograph, while placing more emphasis on high flows.NSE values vary from -∞ to 1 with a maximum score of 1 meaning a perfect agreement between the observed and simulated values.At the contrary, negative values mean that more realistic estimates are obtained using the observed mean values rather than the simulated ones.
where NSE Nash-Sutcliffe efficiency,    and    are, respectively, the observed and simulated streamflow for time step t; and N is the number of time steps for which observations are available.
The distribution of the basin in the study region (Fig. 1) provided the opportunity to assess the hydrological consistency of the SPEs in space by running GR4j in the four basins over the common 2008-2012 period of observed discharge  2012) and over three 4-year sub-periods (2000-2004, 2004-2008, and 2008-2012).No validation step was used as the objective was to assess hydrological modelling sensitivity to various precipitation datasets (P ref and SPEs) and not to assess the hydrological model robustness under climate variability.The aim of the analysis of the streamflow simulation accuracy among the basins and periods considered was thus to evaluate the strength of the SPEs in space and over time and discrepancies in reproducing streamflow.

SPEs as input data for snow modelling in the Andes
A distributed, degree-day model (Ruelland et al., submitted) was chosen to analyse the sensitivity of snow cover simulations to the SPEs.This storage-based model relies on daily, distributed precipitation (P), temperature (T) and potential evaporation (PE) (Eq.7) to represent the main snow accumulation and ablation (sublimation and melt) processes (see Fig. 4).It operates at a daily time step according to a grid of 500 x 500 m corresponding to the spatial resolution of the MODIS data.
Snow accumulation is defined using a temperature threshold Ts fixed at 0°C.Sublimation is accounted for based on daily PE sub (mm) at the target pixel and snowmelt is controlled with a melt factor parameter Kf (°C -1 d -1 to be calibrated) according to Eq. (8-9).
=  ×   (7) where PE sub is potential evapo-sublimation, PE is potential evaporation (see Eq. 5); and K sub is a proportional coefficient depending on the mean latitude (lat, decimal degrees) of the study area and varying from 0 to the poles to 1 at the equator (see Ruelland et al., submitted for more details) where Mf is the potential melt (mm), Kf is a melt factor parameter to be calibrated, T is the temperature on day j on the pixel considered, Ts is the threshold temperature parameter (fixed at 0°C) for snow accumulation and melt.Melt cannot exceed the snow water equivalent (SWE) of the snowpack storage.
The snow-covered areas (SCA) are estimated from the SWE.For each pixel, snow is stored in a reservoir which represents the SWE of the pixel snowpack (see Fig. 4).It is fed solely by the solid fraction of precipitation and is emptied according to the simulated sublimation and melt processes.For each model, a pixel is assigned to snow or not depending on a water level threshold SWE_th (mm), to be calibrated.As the region contains permanent snow covered areas, for pixels located below and above 5,700 m asl, the SWE reservoir was initialized to 0 mm and 300 mm, respectively, at the beginning of the simulations.These values were defined based on MODIS snow observations and on model sensitivity tests to SWE initial conditions accounting for different elevation thresholds.The analysis revealed limited sensitivity to initial conditions (data not shown).An initial 3-year warmup period was used for each simulation to limit the influence of these conditions.
The following ranges were used for model calibration with all the tested precipitation datasets (P ref and SPEs): 0.5 mm.°C -1 d -1 < Kf < 20 mm.°C -1 d -1 , 1 mm < SWE_th < 80 mm.Kf ranges were based on ranges adapted from the values reviewed in Hock (2003).Regarding the SWE_th value, the assumption is that, using remote sensing, snow cover cannot be detected below a certain threshold (Bergeron et al., 2014).For instance, based on in-situ measurements to detect snow cover from MODIS in the Pyrenees, Gascoin et al., (2015) found a mean threshold of 40 mm.Since this value may be influenced by the local context (vegetation, topography, and climate) and spatial difference between point (in-situ) and areal satellite (MODIS) observations, SWE_th was tested according to large ranges.It is worth mentioning that the tested bounds were reached during calibration for all simulations (i.e.Kf=20 mm°C -1 d -1 and SWE_th=1 mm).However we did not consider larger parameter ranges to ensure "realistic" simulations.
In association with T ref for temperature forcing data (see section 2.2.), P ref and SPEs were sequentially used as forcing precipitation datasets to simulate snow cover with the model.For each run, model parameters were calibrated based on the shuffled complex evolution (SCE) algorithm by optimising the pixel-to-pixel correlation between the snow cover duration (SCD) simulated by the model and that observed by the gap-filled MODIS snow products (see section 2.3.2.), according to the following Eq.( 10): where R² is a determination coefficient, n is the total number of pixels in the study area (see Fig. 1); p is a given pixel; SCD MODIS and SCD MODEL are, respectively, the snow cover duration (SCD) observed by MODIS and the SCD simulated by the model as a percentage of days over the analysis period.
For each precipitation input (P ref and SPE), the model was calibrated against MODIS observed snow cover over the entire period 2000-2012 and over three 4-year sub-periods (2000-2004, 2004-2008, and 2008-2012).The aim of the analysis of the snow simulation accuracy among the areas and periods considered was to evaluate the strength of SPEs in space and over time and to identify discrepancies in reproducing snow cover in a remote Andean area (see Fig. 1).

Space-time consistency of SPEs compared with gauge observations
Figure 5a is the Taylor diagram obtained from SPEs at the regional scale for the 2000-2012 period.SPEs are distributed over the Taylor diagram, indicating high discrepancy among them.Some SPEs greatly overestimated precipitation with %B values of 165%, 75%, 40% and 45% for CMORPH-CRT, TMPA-RT, PERSIANN and PERSIANN-CDR, respectively.CMORPH-RAW, GSMaP and SM2Rain, greatly underestimated precipitation with %B values of -24%, -25% and -37%, respectively.However, all SPEs were highly correlated with P ref with CC greater than 0.75.Generally, including gauge data in the SPE processing clearly enhanced precipitation estimates.Indeed, CHIRPS, CMORPH-BLD and PERSIANN-CDR were closer to the reference dot than their respective non-adjusted version, CHIRP, CMORPH-CRT and PERSIANN.With the closest and farthest distance to the reference, MSWEP and CMORPH-CRT were respectively the most and least consistent SPEs to represent the mean regional precipitation over the 2000-2012 period.
According to the literature, a quality threshold value can be used to express SPE potential.Some authors (e.g., Hussain et al., 2017;Satgé et al., 2016;Shrestha et al., 2017), considered that a normalized RMSE value lower than 0.5 is associated with a very good SPE performance.Even there were slight differences between CRMSE and RMSE, the use of a normalized CRMSE threshold value of 0.5 to select only the most efficient SPEs remains logical.Therefore, six SPEs were selected for the following assessment steps including CHIRP, CHIRPS, CMORPH-BLD, PERSIANN-CDR, TMPA-Adj, and MSWEP.Figure 5b presents the performance of the SPEs at the regional scale for the three 4-year time windows considered: 2000-2004, 2004-2008 and 2008-2012, in the form of a Taylor diagram.At the regional scale, SPE rank performance remained stable during the time windows considered and similar to what was observed for the 2000-2012 period (Fig. 5).
Therefore, at the regional scale, SPEs were generally consistent over time, MSWEP being the most accurate and PERSIANN-CDR the least accurate SPE.However, for the 2008-2012 period, CHIRPS and CHIRP were closer than for the previously considered period.This might be due to the decrease in the number of available gauges for the adjustment processes applied on CHIRP to produce CHIRPS.
Figure 5c shows the spatial distribution of SPE errors for the 2000-2012 period in terms of %B, CC, and CRMSE.
CMORPH-BLD was poorly correlated with P ref with the highest proportion of pixels with CC less than 0.7.This value is generally used as a quality threshold with CC values less than 0.7 indicating poor SPE performance (see e.g., Satgé et al., 2016).MSWEP had the best CC value overall with the highest proportion of pixels with a CC greater than 0.9 and only two pixels with unsatisfactory CC values.
The main differences were in %B and CRMSE.CMORPH-BLD and PERSIANN-CDR precipitation under-and overestimations at the regional scale (Fig. 5a) were confirmed at the gauge scale (Fig. 5c).CHIRP, CHIRPS, and TMPA-Adj presented similar %B distribution while MSWEP had the most homogeneous %B distribution with the values of almost all pixels ranging between -30% and +30%.This range was previously defined as satisfactory %B for SPEs (Shrestha et al., 2017).The gauge adjustment applied on CHIRP was globally positive with a %B reduction from CHIRP (15.9%) to CHIRPS (0.4%) of almost 100% (Fig. 4a).It considerably increased the numbers of pixels with %B between -15% and +15% from CHIRP to CHIRPS (Fig. 5b) and generally enhanced CRMSE and CC scores.
Interestingly, all the SPEs underestimated precipitation for the two pixels located over the northern Lake Titicaca islands.This is probably linked to SPE's limited ability to detect warm cloud precipitation (see section 5.1).
CMORPH-BLD, PERSIANN-CDR, and TMPA-Adj had the highest proportion of pixels with CRMSE values greater than 0.7 (Fig. 5c).This value can be used as a quality threshold above which SPEs performance is considered as unsatisfactory (see e.g.Shrestha et al., 2017).Therefore, over the 2000-2012 period, precipitation estimates derived from CMORPH-BLD, PERSIANN-CDR and TMPA-Adj are subject to high uncertainties at local scale.The inclusion of gauge observations for CHIRPS estimates reduced the number of pixels with unsatisfactory performance by 50% in comparison with the non-adjusted CHIRP version.With only 14 and 12 pixels with CRMSE greater than 0.7, respectively, CHIRPS and MSWEP showed the highest spatial consistency.
More generally, the spatial analysis highlighted three areas in which all the SPEs considered presented less satisfactory statistical scores (CRMSE, CC, and %B) than over the remaining areas.These regions correspond to the south eastern Lake Titicaca shore, and the south-western and north-eastern borders of the catchment.Figure 6 shows the performance of the SPEs for three 4-year periods (2000-2004, 2004-2008, and 2008-2012)    For all the catchments considered, the best streamflow simulations were obtained with at least one of the SPEs as forcing precipitation data.TMPA-Adj and CMORPH-BLD provided a better streamflow simulation than P ref for Ilave and Keka respectively, and MSWEP outperformed P ref for the Katari and Ramis catchments.These results show that SPEs can efficiently replace the currently available sparse precipitation gauge networks for use in hydrological studies of the region.

Space-time consistency of SPEs compared with streamflow simulations
Overall, MSWEP appears as the most consistent SPE product for streamflow simulations.Indeed, the streamflow simulations forced by MSWEP were more realistic than those forced by P ref over three catchments (Katari, Keka, and Ramis) and were almost the same for the Ilave catchment.
However, as shown in Fig. 7c, the SPE hydrological ranking in the 2008-2012 period changed drastically over time.
For example, for the Katari catchment, MSWEP led to the best streamflow simulations for the 2004-2008 and 2000-2012 but not for the 2000-2004 period, for which TMPA-Adj forced streamflow simulations had a higher NSE score of 0.85.
Additionally, CMORPH-BLD potential fell drastically over the 2000-2004 period with a negative NSE score, whereas it produced the most realistic streamflow simulation for the period 2008-2012.In the Keka catchment, for each time window, the best streamflow simulation was obtained using different SPEs.CHIRP, PERSIANN-CDR and CMORPH-BLD resulted in the highest NSE scores over the various sub-periods analysed, with respectively 0.73 for the period 2000-2004, 0.83 for 2004-2008 and 0.77 for 2008-2012.To conclude, the analysis of streamflow model sensitivity to the different precipitation datasets depended to a great extent on the time window and catchment considered.However, MSWEP appeared to be the most 'stable' SPE.It provided the most realistic streamflow simulations with higher NSE scores than P ref for seven out of ten simulations and also outperformed other SPEs in almost all streamflow simulations tested.The results obtained for the 2000-2012 period faithfully reflected global SPE performance for all the time windows with MSWEP as the most and CMORPH-BLD as the least suitable SPE for hydrological modelling over the Katari and Keka catchments.

Space-time consistency of SPEs for snow cover duration
Figure 8 compares the snow cover duration (SCD) observed by MODIS and the durations simulated by the snow model using P ref or all the SPEs as input data over the different periods.Whatever the period, the best SCD simulations were obtained using P ref as input data with R 2 values of 0.83, 0.80, 0.79 and 0.82 for the 2000-2004, 2004-2008, 2008-2012 and 2000-2012 periods, respectively.Simulated SCD using SPEs as input data systematically overestimated SCD compared to the SCD computed from the gap-filled MODIS snow products.The MSWEP (CHIRPS) datasets provided the most realistic (unrealistic) SCD simulations over 2000-2012 with R² of 0.80 and 0.67 respectively (see Fig. 8a).Interestingly, mean annual precipitation over 2000-2012 (see Fig. 8a) was significantly higher with the SPEs (ranging from 762 mm with MSWEP to 1,229 mm with CHIRP) than with P ref (523 mm).The least realistic SCD simulations with the SPEs may thus be explained by higher precipitation, which increases snowfall input and snow cover duration despite the specific calibration of the snow model.Indeed, MSWEP mean annual precipitation (762 mm) was the closest to the P ref ones and provided the most realistic SCD estimates.Conversely, CHIRP provided the highest mean annual precipitation estimate (1,229 mm) whereas it produced the least realistic SCD estimates with R 2 values of 0.74, 0.58, 0.65, and 0.67, for the 2000-2004, 2004-2008, 2008-2012 and 2000-2012 periods, respectively.This trend was observed for all the periods considered (see Fig. 8b).All the other SPEs presented relatively close mean annual precipitation estimates and therefore relatively close SCD simulation performances (Fig. 8b).
Interestingly, all the SCD simulations based on SPEs as inputs had higher scores for the period 2000-2004 with minimum efficiency for CHIRP (R 2 =0.74) and maximum for MSWEP (R 2 =0.83).This is not in line with the space and time analysis based on gauges and hydrological modelling analyses.Indeed, CMORPH-BLD was the least efficient SPE to represent P ref and observed streamflow over 2000-2004, while its SCD estimates was the second most efficient SCD simulation forced by SPEs.Similarly, the CHIRP forced streamflow simulation was the most realistic one in the Keka basin for the period 2000-2004, while the CHIRP SCD simulation was the least efficient in comparison to the other SPEs.The TMPA forced streamflow simulation was the first most realistic one for the Ilave basin and the second most realistic one for the Ramis basin over the period 2008-2012, whereas its SCD simulations were among the least efficient (R 2 =0.67) in comparison to the other SPEs.This shows that a SPEs can be effective for a specific region or for a given indicator (i.e.gauges and hydrological modelling) but not for another region or indicator (i.e.snow modelling) and inversely.However, the analysis also confirmed the MSWEP overall stability and efficiency in space and over time previously highlighted for the precipitation and streamflow representation.Indeed, MSWEP forced SCD simulations remained relatively stable with similar scores obtained for all the periods considered.

Discussion and conclusion
5.1.SPE spatial variability: the Lake Titicaca bias At the regional scale, the performances of the SPEs were in relatively good agreement with the gauge references.However, their performances differed markedly in space with a general trend to underestimating precipitation over Lake Titicaca.
Due to its size, the solar radiation absorption capacity of Lake Titicaca increases the temperature by 4° to 6° over the superficial water layer in comparison to that over the surrounding land (Delclaux et al., 2007;Roche et al., 1992).
Additionally, evaporation from Lake Titicaca is very high, estimated a 1,700 mm.year -1 (Pillco Zolá et al., 2018).Therefore, crossing the lake, the air masses pick up lake moisture which increases their temperature and allows their ascension.This convection results in more precipitation over the lake than over the surrounding land (Roche et al., 1992).These precipitation events originate from warm clouds whose detection remains challenging for SPEs.Indeed, SPEs use cloud top IR temperature to discretize rainy and rainless clouds.The IR temperature threshold may be too low to correctly detect warm rain clouds, as suggested over mountainous region (see e.g., Dinku et al., 2010Dinku et al., , 2007;;Gebregiorgis and Hossain, 2013;Hirpa et al., 2010;Li et al., 2013).Consequently, many precipitation events may be lost leading to underestimation of precipitation over Lake Titicaca.To support our hypothesis, we computed the probability of detection (POD) of daily precipitation events.
POD is an indicator of a SPE's ability to correctly forecast precipitation events with values ranging between 0 and 1 and a perfect score of 1.A low POD indicates that precipitation events are not detected by SPEs. Figure 9 shows the POD obtained from all SPEs for each pixel with gauges over the 2000-2012 period.With relatively lower POD values for pixels above the lake than for pixels over the surrounding land, TMPA-Adj and CHIRPS exhibited a clear trend to underestimate daily precipitation occurrence over the lake.MSWEP and CHIRP showed the same trend over the northern part of the lake.These POD "anomalies" could partially explain the negative bias observed over the Lake Titicaca because of the warm cloud precipitation process.CMORPH-BLD and PERSIANN-CDR showed no significant trends, and were the least efficient SPEs overall.It is noteworthy that the eastern lake border is characterized by significant emissivity and temperature changes (induced by the lake and by the snow cover in the Cordillera), which perturb PMW precipitation retrieval (see e.g., Ferraro et al., 1998;Levizzani et al., 2002;Tian and Peters-Lidard, 2007) and may also contribute to the SPEs relatively lower performance.
River streamflow and precipitation over the lake surface account for respectively, 53% and 47% of the total Titicaca water supply (Roche et al., 1992).Even if MSWEP led to realistic simulations of streamflow, its absolute bias greater than 35% over the lake is a major problem when attempting to model the Lake Titicaca water budget.Therefore, MSWEP precipitation estimates need to be adjusted over the lake.Previous studies merging gauges and SPEs successfully applied over the Altiplano (Blacutt et al., 2015;Heidinger et al., 2012;Vila et al., 2009) and elsewhere (Ma et al., 2017;Ringard et al., 2017) should be used as a guideline.

Gauges versus hydrological modelling based assessment
Most of the studies on comparisons of SPE potential rely more on gauge-based assessment (see reviews by Maggioni et al., 2016 andSun et al., 2017) rather than on hydrological modelling based assessment (see review by Maggioni and Massari, 2018).However, as shown in this study, the SPE's performance compared with gauge observations is not systematically supported by the sensitivity analysis of streamflow modelling.To provide more insight into this discrepancy, Fig. 10 compares SPE performance with precipitation gauges and with observed streamflow at the outlet of the four basins considered (Ilave, Katari, Keka, and Ramis).To simplify the analysis, the comparisons focus on the best and worst SPEs To summarize, SPE assessments based on a gauge network make it possible to detect very local SPE inconsistencies but are influenced by (i) the distribution of the gauges and (ii) the difference in spatial resolution (point vs. pixels).It is also worth mentioning that hydrological-based assessments may be influenced by the model itself.In this study, the SPE assessment relied notably on their ability to provide realistic streamflow simulations at the basin outlet using a lumped model.Distributed physical models could reinforce the potential of each product based on spatial criteria (for instance humidity, water level), but this would be difficult over the Lake Titicaca region due to the scarcity of data.A complete quality assessment of SPEs is therefore difficult to achieve and the choice of the most suitable SPEs should rather be based on assessment steps and include the final use of the SPEs.

Space-time SPE consistency
For each assessment indicator considered (i.e.gauges, streamflow and snow cover duration), the performances of the SPEs varied depending on the time window considered.As shown in the analysis, a given SPEs can appear to be the best option to represent precipitation for one particular period but not for another.Additionally, for the same period, the SPE may be suitable for a specific regional subset but not for another.This space-time inconsistency is of major concern in the current context of climate variability.Indeed, over remote regions, the scarcity of meteorological stations encourages scientists to use remote sensing data to understand precipitation variability and its contribution to meteorological, agricultural and  et al., 2017;Arvor et al., 2017;Tan et al., 2017;Tao et al., 2016;Bayissa et al., 2017;Satgé et al., 2017a).Therefore, for these studies, a consistent analysis of the consistency of SPEs in space and over time is a pioneering step to select the most realistic SPEs.It should make it possible to foresee potential propagation of SPE inconsistencies in the studies to consistently weight the observed results.From our analysis, it will be recalled that some SPEs are relatively stable at regional scale (CHIRPS, MSWEP) and could thus be used to study regional precipitation patterns with a relatively high degree of confidence.For studies on the spatial variability of precipitation, MSWEP is the most suitable SPE for the study region as it is the most stable in space and over time.However, as discussed above, correction methods need to be taken into consideration to enhance precipitation estimates over Lake Titicaca.Moreover, care should be taken in the north-eastern and south-western regions corresponding to the transition from the Amazon to the Altiplano and the Altiplano to the Pacific watershed zones, respectively (see Fig. 6d).For these specific regions, all the SPEs studied here represented the lowest space-time stability.These regions present very significant variations in elevation, which are known to interfere in SPEs (Ochoa et al., 2014;Satgé et al., 2017b) and may partially explain this local discrepancy.

Snow modelling to assess SPEs over unmonitored regions
SPEs are subject to high uncertainty in mountainous regions.Indeed the precipitation/no precipitation classification based on cloud top IR temperature generally fails because the temperature threshold used for the discretization process is too high.
Currently, over high elevation mountainous regions, the top cloud temperature is generally lower than over flat regions resulting in many non-rainy cold clouds being misidentified as rainy clouds (Hussain et al., 2017;Satgé et al., 2017a).In addition, snow and ice cover appear to be similar to ice precipitation aloft in the scattering signal in P channel (Ferraro et al., 1998;Levizzani et al., 2002) leading to the misidentification of snow and ice cover with rainy clouds (Dinku et al., 2010;Hussain et al., 2017;Mourre et al., 2016).As an example, the inconstancy described in both IR and PMW precipitation recovery led to marked overestimation by SPEs over the Himalayan Pakistan region (Hussain et al., 2017, Satgé et al., 2018).
However, the accessibility and handling difficulty over high elevation snow covered regions limits the availability of gauges and hence the assessment of SPEs.In this context, the proposed snow modelling-based assessment using snow cover observed by satellite imagery offers an alternative way to obtain initial feedback on the efficiency of SPEs over completely unmonitored regions.The present study shows that gap-filled MODIS snow-products can be used as reference to indirectly assess SPE efficiency and show great promise for the validation of SPEs in regions where distinguishing between rainfall and snowfall is still challenging.However, the distribution of SCD (Fig. 8a) indicates permanent snow-covered areas over which seasonal dynamics from optical imagery (MODIS) are difficult to capture as there is more variation in depth than in spatial extent.Consequently, the regional seasonal snow cycle captured from MODIS is weak and erratic (see Fig. 11).
Preliminary modelling tests showed that the dynamics of these snow covered areas were poorly simulated by the snow model when the P ref and SPE precipitation datasets were used.In addition, the SPEs tested in this study were not originally designed to distinguish between liquid and solid precipitation.The newly released SPEs Integrated Multi-satellite Retrievals for the Global Precipitation Mission (IMERG-v3 v4 v5) are the first SPEs which make it possible to discretize liquid to solid precipitation at 10 km spatial resolution and almost global scale.We expect that such products will enhance the monitoring of seasonal snow dynamics.However, as yet, no studies have reported on the potential of IMERG products to estimate snowfall, which should guide the next step in the assessment of SPEs over the region.

Conclusions
This paper compared the space-time consistency of 12 SPEs at different spatial and temporal scales with point gauge observations and according to sensitivity analysis of snow-runoff responses.The main results of the study can be summarized as follows: -Given the currently available precipitation gauge network, SPEs are attractive and efficient tools to monitor local precipitation and to force impact modelling, such as snow-hydrological models.Of the SPEs with a grid scale of 10 km and more than 35 years of observed precipitation, MSWEP provided the best precipitation estimates at the gauge level, globally the most realistic streamflow simulations (with seven out of ten simulations outperforming the ones obtained using the available precipitation gauge network), and the most realistic simulations of snow cover duration compared to those simulated with the other SPEs.-SPEs present space-time errors that cannot be assessed when the time windows are too short and/or the gauge networks are too scarce.In this context, the space-time consistency of SPEs should be carefully considered before using them for hydro-climatic studies.
-Ranked SPE potential varied depending on the assessment indicators used.It is therefore recommended to assess SPE potential according to indicators which are related to their final use.
-The proposed sensitivity analysis of snow modelling to the SPEs by using MODIS snow-products as control data has great promise for the assessment of SPE potential over completely unmonitored snow-covered regions.
-For the three assessment indicators (gauges, streamflow and snow cover duration) considered here, all SPE versions including gauge data for precipitation estimates (TMPA-Adj, CMORPH-BLD and CHIRPS) outperformed their satellite only based version (TMPA-RT, CMORPH-RAW and CHIRP).
-Soil moisture based precipitation estimates (SM2Rain) were shown to be promising precipitation estimates but are unsuitable for regional contexts with large waterbodies, mountainous and snow covered regions, which include too many gaps in time and space.The much smoothed decreasing north-west south-east precipitation pattern produced by GSMaP was not sensitive to local precipitation variability, reflecting an overall poor performance.

Fig. 1
Fig. 1 Study area: location, meteorological stations (precipitation and temperature observations), streamflow gauges, studied catchments and snow analysis zone.

Figure 2 :
Figure 2: Mean annual precipitation maps for the 2000-2012 period retrieved from all SPEs at their original grid size.For each SPE, only the pixels with more than 80% available daily data were retained.In order to keep the regional precipitation pattern visible, black colour was used to filter pixels whose mean annual precipitation was greater than 2,500 mm.year -12.3.2.MODIS snow products

P
ref as forcing data were also used for comparison with SPE simulations.More details of the proposed protocol are presented in the following sections.Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-316Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.

Figure 3 :
Figure 3: Flowchart of the main steps from SPE preselection to successive assessment approaches including (1) comparison between SPEs and gauge observations, (2) sensitivity analysis of runoff modelling to the various SPEs at the four basin outlets and (3) sensitivity analysis of snow modelling to the various SPEs over a subset mountainous area in the Andes.
Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-316Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.available for SM2Rain.Therefore, SM2Rain was analysed separately from other SPEs by computing additional P ref and SM2Rain mean regional 10-day precipitation series based on the 44 available P ref 0.05° pixels.Mean regional 10-day SPEs and P ref series for the 12-year period 2000-2012 were compared according to different statistical criteria, correlation coefficient (CC), standard deviation (STD), percentage bias (%B) and the centred root mean square error (CRMSE) (Eq.1-4):  = (,  )   ×  (1) where CC is the correlation coefficient, SPE and P ref are the SPE and P ref precipitation time series, and Cov is the covariance.
was used to present obtained CC, and normalized values of STD and CRMSE.Normalization was performed by dividing SPEs CRMSE and STD by P ref STD.Therefore, in the Taylor diagram, the reference (black dot) corresponds to CRMSE, STD and CC values of 0, 1 and 1, respectively.Also in the Taylor diagram, the position of the SPEs relative to the reference dot is an integrated indicator (CRMSE, STD, and CC) of SPE efficiency in reproducing gauge precipitation.The shorter the distance between the SPEs and the reference position, the closer the SPEs and P ref estimates.Additionally, %B values were used to observe the potential over/underestimation of each SPE considered.For the following assessment step, only the six SPEs most efficient at the regional scale for the 2000-2012 were considered.
PE is daily potential evapotranspiration (mm), R e is extra-terrestrial solar radiation (MJ.m −2 .d−1 ), which depends on the latitude of the target point and the Julian day of the year, λ is the net latent heat flux (fixed at 2.45 MJ.kg −1 ), ρ is water density (fixed at 11.6 kg.m −3 ) and T is the daily mean air temperature (°C) estimated at the target point by interpolating the gauge observations while correcting for elevation.Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-316Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.
Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-316Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.availability.The hydrological consistency of the SPEs over time was then evaluated by running the model over the entire 2000-2012 period for which discharge observations were only available in two catchments (Katari and Keka).For each precipitation input (P ref and SPEs), the model was calibrated against observed streamflow over the whole period (2000- Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-316Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.

Figure 5 :
Figure 5: Efficiency of SPEs compared with gauge observations: (a) Pre-selection of SPEs at the regional scale in the form of a Taylor diagram; (b) Consistency of SPEs over time at the regional scale in the form of a Taylor diagram; and (c) Consistency of SPEs in space for the 2000-2012 period from CC, CRMSE and %B values obtained from each P ref pixel including at least one Figure6shows the performance of the SPEs for three 4-year periods(2000-2004, 2004-2008, and 2008-2012)  in terms of CRMSE.The SPEs' lowest potential over the south-eastern shore of Lake Titicaca and at the south-western and northeastern borders of the catchment highlighted in the 2000-2012 period was confirmed for all four time windows.For each

Figure 6 :
Figure 6: Consistency of SPEs in space and over time compared with gauge observations.CRMSE values obtained from the SPEs at the gauge level for three 4-year periods: (a) 2000-2004; (b) 2004-2008; and (c) 2008-2012.(d) Mean and STD CRMSE values obtained for each pixel using the three respective values obtained for the three 4-year periods.

Figure 7
Figure 7 shows the efficiency (in terms of NSE scores) of the hydrological model in reproducing streamflow at the outlet of the four tested basins (Ilave, Katari, Keka, and Ramis) using P ref and SPEs as input data over the period 2008-2012.Streamflow simulations using P ref varied along the catchments with lowest NSE score of 0.63 for Keka and the highest of 0.89 for Katari.Simulated SPE streamflow efficiency followed the same trend except with CMORPH-BLD and TMPA-Adj.TMPA-Adj provided the best streamflow simulation for the Ilave catchment and the worst for Keka, whereas the opposite was observed using CMORPH-BLD as forcing data.The low efficiency of CMORPH-BLD in the Ilave catchment was related to its erroneous streamflow peaks in the 2009 and 2012 dry seasons.CHIRP and CMORPH-BLD had the lowest scores for the Katari and Ramis catchments with NSE values of 0.59 and 0.45, respectively.For all the catchments, streamflow simulations based on CHIRPS presented systematically higher NSE scores than simulations based on CHIRP,showing that the adjustment provided in CHIRPS with the integration of gauge observations led to better precipitation Hydrol.Earth Syst.Sci.Discuss., https://doi.org/10.5194/hess-2018-316Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.

Figure 8 :
Figure 8: Observed (gap-filled MODIS snow-products) versus simulated snow cover duration (SCD) using P ref and SPEs as input data in the snow model: (a) Maps of snow cover duration for 2000-2012; (b) Efficiency (R² scores) of simulated SCD versus MODIS SCD for the different calibration periods.SCD are expressed as the percentage of days over the simulation period.R² values stand for the pixel-to-pixel correlation between the observed and simulated SCD. Green and red stars highlight SPEs with closest and further precipitation estimates to P ref over the subset considered.

Figure 9 :
Figure 9: SPE's ability to forecast daily precipitation events represented in the form of POD.POD=a/a+b with "a" being the number of days on which both SPE and P ref detected precipitation and "b" the number of days on which only P ref detected precipitation events.The legend shows the range between min and max to focus on POD anomalies over Lake Titicaca.Pixels located outside the Lake Titicaca basin were removed to facilitate interpretation.

Figure 10 .
Figure 10.Comparison of the performance of the SPEs with that of precipitation gauges and observed streamflow at the outlet of the four basins (Ilave, Katari, Keka, and Ramis): (a) SPE performance at the gauge level for the four catchments represented according to a Taylor diagram (scores were computed using only the pixels included in the catchments for the 2008-2012 period); (b) Best and worst SPEs according to the gauge-based and hydrological-based assessments for the 2008-2012 period.

Figure 11 :
Figure 11: Snow covered area observed from gap-filled MODIS snow products at a 10-day time step over the Andean subset.

Table 1 : Main characteristics and references of the 12 SPEs considered. In the data source column, S stands for satellite, R for reanalysis, and G for gauge information.
5194/hess-2018-316 Manuscript under review for journal Hydrol.Earth Syst.Sci. Discussion started: 13 September 2018 c Author(s) 2018.CC BY 4.0 License.