Rainfall Estimates on a Gridded Network (REGEN) - A global land-based gridded dataset of daily precipitation from 1950-2013

We present a new global land-based daily precipitation dataset covering 1950 to 2013, called Rainfall Estimates on a GriddEd Network (REGEN), which is built from an interpolated network of in situ data. We merged multiple archives of in situ data including two of the largest archives, the Global Historical Climatology Network Daily (GHCN-Daily) hosted by the National Centers for Environmental Information (NCEI), USA, and one hosted by the Global Precipitation Climatology Centre (GPCC) operated by Deutscher Wetterdienst (DWD). This resulted in an unprecedented station density compared to existing datasets. The station timeseries were quality controlled using strict criteria and flagged values were removed. Remaining values were interpolated to create area average estimates of daily precipitation for global land areas at 1° × 1° latitude-longitude resolution. Besides the daily precipitation amounts, fields of standard deviation, Kriging error and number of stations are also provided, along with a quality mask based on these uncertainty measures. For those interested in a dataset with lower station network variability, we also provide a related dataset based on a network of long-term stations, which interpolates only stations with a record length of at least 40 years. The REGEN datasets are expected to contribute to the advancement of hydrological science and practice by facilitating studies aiming to understand changes and variability in several aspects of daily precipitation distributions, extremes, and measures of hydrological intensity. Here we document the development of the dataset and provide guidelines on best practice for users of the two datasets.

The numbers in black, blue and green in (b) refer to the average number of stations from GPCC, GHCN and Other sources respectively.
[...] records from stations in GHCN-Daily or Other that were unique with respect to the stations in the GPCC archive. Due to the large overlap between the archives, the number of stations from GHCN-Daily is higher when fewer stations from GPCC are available. There is a gradual increase in stations from GPCC until 1990 and a steep decline after 2010. All quality controlled station data hosted by GPCC are eventually archived in a relational database (henceforth referred to as the GPCC data base); however, there were additional ASCII data files for various countries that had not been processed at the time of the analysis (henceforth referred to as GPCC ASCII data files). Figure 2 shows that most of the station data in Central America, western South America, Europe, Africa, the Middle East and East Asia were sourced from GPCC.
We summarise the spatial and temporal distribution of the station network comprising REGEN in figure 3. Each map in figure 3 refers to a decade and shows, for each grid cell, the percentage of days in that decade with at least one station, based on REGEN (figure 3a), REGEN40YR (figure 3b) and, for comparison, GPCC's Full Data Daily V1 (GPCC-FDD1; Schamm et al., 2015) (figure 3c). We compare REGEN's station network with that of GPCC-FDD1 because, until REGEN, GPCC-FDD1 was the global dataset of daily precipitation with the highest station density. Not only is REGEN's station network density higher than that of GPCC-FDD1 in all decades, but even the REGEN40YR station network, with its much stricter completeness criterion, has more stations than GPCC-FDD1 in all three comparable decades.

Quality Control
The quality control procedures used in REGEN were adopted from NCEI, part of the National Oceanic and Atmospheric Administration (NOAA) in the USA (Durre et al., 2010). The quality control is done in two stages, and climatologies generated in an auxiliary step are used in both stages. At the end of the quality control process all data are written in a common format identical to the GHCN-Daily format (see README file, ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt).
Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-595 Manuscript under review for journal Hydrol. Earth Syst. Sci. This is just a preview and not the published paper. © Author(s) 2019. CC BY 4.0 License.
Figure 2. Distribution of stations color coded by source. "GPCC" refers to stations hosted by Deutscher Wetterdienst, "GHCN" refers to stations hosted by the National Centers for Environmental Information (NCEI), "Merged" refers to stations that have been identified as identical in two or more archives, resulting in a merger of the timeseries, and "Other" refers to the Russian and Argentinian stations that were added by us.
The first quality control stage involves basic integrity checks such as checks for erroneous zeros, conflicts between multi-day accumulations and daily reports, duplication of entire years or months, repetition or frequent occurrence of values, and world record exceedances. In addition, this stage checks for outliers by looking for gaps in the tails of distributions and for climatological outliers. It also performs temporal consistency checks by comparing values on consecutive days to look for unrealistic spikes in precipitation. The second quality control stage performs spatial corroboration checks, which determine whether the value at each station is consistent with the values at neighbouring stations. For further detail on the quality control algorithms, refer to Durre et al. (2010). Data failing any test at any point of the quality control process are flagged (see the GHCN-Daily README file, ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt, for a list of quality flags and their meanings). To ensure a high quality final dataset, all flagged data are removed prior to interpolation. Although the QC procedures were designed to minimize the number of instances in which true extremes are flagged as errors (Durre et al., 2010), it is possible that a few such extremes are among the flagged values that were withheld from the REGEN input data. Future versions of REGEN may consider methods for recognising and retaining flagged values that are possible extremes.
All data sources (each country in the GPCC ASCII data, the GPCC data base and "Other" data) were quality controlled individually before merging. Since our QC procedures are identical to those of GHCN-Daily, we used the flags already included with the GHCN-Daily data. The percentage of flagged records per year in the final merged input data averages around 0.05-0.06% throughout the time period, spiking to 0.1% around 2010 (figure 4a). This may be because, in the last decade of the record, the number of stations in the final merged network sourced from GHCN-Daily increases while the number sourced from GPCC decreases. Since GPCC data are assumed to be of higher quality than GHCN-Daily data owing to the manual quality control they are subjected to, the flag rate rises along with the share of GHCN-Daily stations. In general we also see a trend of increasing missing months with time in all regions (figure 4b). A month is marked as missing if it contains fewer than 70% of the possible number of daily data records, so the percentage of missing months is also an indicator of the completeness of the daily data records. The spike in missing month percentage in South Asia is due to the drop in station data from India in the 1970s.

Merger of GHCN-Daily, GPCC and other smaller data archives
Once the station data from the various sources had been quality controlled individually, they were merged in multiple steps. First, the manually and automatically quality controlled data in GPCC's data base were merged with the ASCII data files for various countries that, at the time of the analysis, had not been integrated into the GPCC data base, to create a combined archive of quality controlled GPCC stations. This GPCC archive was then merged with the GHCN-Daily archive and subsequently with the Argentinian and Russian data.
For consistent comparison, GPCC shifts data for certain countries so that the daily amount always represents the period closest to 7am on the day of the timestamp to 7am the next day, local time. For example, if the source in situ timestamp represents the period from 9am the previous day to 9am on the day of the source timestamp, then the resulting GPCC timestamps are shifted a day back compared to the source timestamps. This results in climatologically consistent timestamps. When merging the GHCN data, we shifted the GHCN timestamps in the same way as GPCC, for all countries whose timestamps were shifted by GPCC. The countries for which the data are shifted a day back (e.g. data from 2nd Jan are saved as 1st Jan) are listed in the Appendix. So far, no country's data are shifted forward. This shifting is important to keep in mind when comparing REGEN with regional datasets. For example, when comparing REGEN with precipitation from the Australian Water Availability Project regional dataset (AWAP; Jones et al., 2009) we shifted AWAP a day backward. The shifting may also result in inconsistent comparisons between REGEN and satellite datasets, which represent 00 UTC on the day of the timestamp to 00 UTC the next day, and in inconsistent comparisons across political borders where the timezone changes. Figure 7b highlights this timestamp shifting by plotting the unshifted precipitation amount from AWAP averaged across Australia during cyclone Yasi as a dashed line, and the shifted AWAP and REGEN estimates as solid lines.
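The timestamp alignment described above can be illustrated with a minimal Python sketch. The country codes and the contents of `SHIFTED_COUNTRIES` are hypothetical placeholders (the actual list is given in the Appendix), and the function names are introduced here for illustration only.

```python
from datetime import date, timedelta

# Hypothetical placeholder codes; the real list of shifted countries is in the Appendix.
SHIFTED_COUNTRIES = {"XX"}

def align_timestamp(country_code, source_date):
    """Shift a source timestamp one day back for countries in the shifted list."""
    if country_code in SHIFTED_COUNTRIES:
        return source_date - timedelta(days=1)
    return source_date  # no country is shifted forward

def shift_series(country_code, records):
    """Apply the alignment to a whole station series given as {date: precipitation_mm}."""
    return {align_timestamp(country_code, d): v for d, v in records.items()}
```

A regional dataset compared against REGEN would need the same one-day shift applied before the two are differenced, as was done for AWAP.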
Note that around 10% of observations in the US are midnight observations, i.e. observations over the 24 h period from midnight to midnight, which are assigned to the day on which the observing period ends. These observations have not been manually adjusted in this version of REGEN, but they will be addressed in the next iteration.
The merging algorithm used is described below. Two stations were considered identical if either of the following conditions held:
1. The latitudes and longitudes matched to three decimal places, and their elevations (to the nearest integer, if non-missing) and World Meteorological Organisation (WMO) station IDs either matched or were missing. Alternatively, the stations were also considered a match if the WMO IDs were non-missing and matched and the latitudes and longitudes matched to one decimal place.
2. The coordinates were within 1 degree of each other, the WMO IDs either matched or were missing, the correlation between the overlapping parts of the timeseries was greater than 0.99, and the overlapping timeseries had at least 365 daily data records with a minimum of 10 days with precipitation greater than 1 mm.

Note that the above algorithm can result in false matches, as nearby stations can be highly correlated; however, this will mainly be an issue in highly dense networks such as that of the US. For the future version, a more quantitative measure of similarity between station time series will be used. Where the precipitation amount from a station differed between sources, we prioritised data from higher quality sources and accepted values from these sources. The data sources, in descending order of quality and hence priority, are the GPCC data base, GPCC ASCII data files, Other data and GHCN-Daily data.
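The two matching criteria and the source priority can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build REGEN: matching "to N decimal places" is implemented by rounding, "within 1 degree" is applied to latitude and longitude separately, the 10 wet days are required in each series, and all names (`Station`, `metadata_match`, `series_match`, `pick_value`) are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional

# Source priority stated in the text, highest quality first.
PRIORITY = ["GPCC data base", "GPCC ASCII", "Other", "GHCN-Daily"]

@dataclass
class Station:
    lat: float
    lon: float
    elevation: Optional[float] = None  # None if missing
    wmo_id: Optional[str] = None       # None if missing

def same_or_missing(a, b):
    return a is None or b is None or a == b

def coords_match(a, b, decimals):
    # Assumption: agreement to N decimal places is tested by rounding.
    return (round(a.lat, decimals) == round(b.lat, decimals)
            and round(a.lon, decimals) == round(b.lon, decimals))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def metadata_match(a, b):
    """Criterion 1: match on coordinates, elevation and WMO ID."""
    elev_a = None if a.elevation is None else round(a.elevation)
    elev_b = None if b.elevation is None else round(b.elevation)
    strict = (coords_match(a, b, 3) and same_or_missing(elev_a, elev_b)
              and same_or_missing(a.wmo_id, b.wmo_id))
    by_wmo = (a.wmo_id is not None and a.wmo_id == b.wmo_id
              and coords_match(a, b, 1))
    return strict or by_wmo

def series_match(a, b, series_a, series_b):
    """Criterion 2: nearby stations with near-identical overlapping records."""
    if abs(a.lat - b.lat) > 1 or abs(a.lon - b.lon) > 1:
        return False
    if not same_or_missing(a.wmo_id, b.wmo_id):
        return False
    days = sorted(set(series_a) & set(series_b))
    if len(days) < 365:
        return False
    x = [series_a[d] for d in days]
    y = [series_b[d] for d in days]
    # Assumption: at least 10 days above 1 mm are required in each series.
    if sum(v > 1.0 for v in x) < 10 or sum(v > 1.0 for v in y) < 10:
        return False
    return pearson(x, y) > 0.99

def pick_value(values_by_source):
    """Return the value from the highest-priority source that reports one."""
    for source in PRIORITY:
        if values_by_source.get(source) is not None:
            return values_by_source[source]
    return None
```

Series are passed as dictionaries keyed by day so that the overlap is simply the intersection of the keys.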

Interpolation Method
Station data were interpolated using ordinary block Kriging, exactly as in GPCC's Full Data Daily V1 (GPCC-FDD1; Schamm et al., 2015) product. Ordinary block Kriging is a stochastic interpolation method, which means it accounts for the statistical structure of precipitation in terms of the spatial autocorrelation function. The autocorrelation function models the statistical relationship between the Euclidean distances between the observations and their correlation. The interpolation method calculates a weighted average of the nearest station values based on their distance to the grid point and the autocorrelation function. This interpolation method was chosen by Schamm et al. (2014) after a comparison with various other methods. It produces area average precipitation estimates implicitly by estimating the interpolated field at various points inside the grid box and then calculating their weighted sum. This results in estimates directly comparable to other forms of data that produce area average estimates, such as satellite products or climate models. More details of the interpolation method, including the autocorrelation function and its parameters, the equations to calculate Kriging estimates and their numerical implementation, are described in Schamm et al. (2014) and Rubel (1996).
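To make the idea of a weighted average derived from an autocorrelation function concrete, the following Python sketch solves a generic ordinary block Kriging system. It is not the GPCC implementation: the exponential correlation function, its 300 km length scale, the planar coordinates in km and the equal weighting of the points inside the block are all assumptions made for illustration; the fitted function and parameters are those described in Schamm et al. (2014).

```python
from math import exp, hypot

def correlation(distance_km, length_scale_km=300.0):
    # Assumed exponential decay with an illustrative length scale.
    return exp(-distance_km / length_scale_km)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def block_kriging(stations, values, block_points):
    """Estimate the block average from station values.

    stations and block_points are (x_km, y_km) tuples in a planar system.
    Returns (estimate, weights); the weights sum to one.
    """
    n = len(stations)

    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    # Station-to-station correlations, bordered by the unbiasedness constraint.
    a = [[correlation(dist(stations[i], stations[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Station-to-block correlations, averaged over the points inside the block.
    rhs = [sum(correlation(dist(s, p)) for p in block_points) / len(block_points)
           for s in stations]
    rhs.append(1.0)
    weights = solve(a, rhs)[:n]  # last unknown is the Lagrange multiplier
    estimate = sum(w * v for w, v in zip(weights, values))
    return estimate, weights
```

With a single block point the function reduces to ordinary point Kriging, which reproduces the observed value at a station location.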
We interpolated ratios of the daily precipitation to the total monthly precipitation. If both the daily records and monthly totals were zero, the ratio was set to zero as well to ensure consistency with monthly datasets. The absolute values were retrieved post interpolation by superimposing the interpolated ratios on the GPCC Full Data Monthly V7 product (Schneider et al., 2015). This dataset was chosen because it is a well established dataset recommended for historical precipitation, global water cycle and trend analysis (Becker et al., 2013; Schneider et al., 2014, 2017). Furthermore, GPCC-FDD1 also calculates ratios using this dataset, and it was readily available on the GPCC High Performance Computer (HPC) where the interpolation was performed. This approach is commonly known as climatology aided interpolation (CAI) and has two advantages. Firstly, CAI reduces the influence of elevation and other variables (Hofstra et al., 2008), which allows us to interpolate with only latitude and longitude as input variables. Secondly, because monthly gridded datasets are often based on much more reliable and stable station networks, especially in areas with problematic daily station coverage, the final absolute values may be more reliable in these regions. The monthly totals for calculating daily ratios in the station timeseries were obtained by summing the daily station data. A month was considered complete if at least 70% of its days were non-missing. This, however, is a disadvantage of interpolating ratios: even if a daily record existed, it was not used for interpolation if the monthly total was missing because of the completeness criterion. Finally, since we use GPCC Full Data Monthly V7 to retrieve daily absolute precipitation values, our analysis is also limited to the temporal extent of this monthly dataset, which currently ends in 2013. The interpolation parameters and autocorrelation function were also identical to those of the GPCC-FDD1 product and are described in Schamm et al. (2014). The interpolation scheme uses the nearest 4 to 10 stations (the numbers were chosen to be similar to the settings of the modified SPHEREMAP scheme utilised for the monthly analysis), and stations within 1 km of each other are averaged to remove station duplicates and to reduce the impact of such nearby stations on the estimate. For complete coverage, however, the search radius is increased until the minimum station requirement is met. This means that in data sparse regions the search radius can be much bigger than the decorrelation length scale of 347 km, which is reflected in the Kriging error (see below).
The decorrelation length scale is calculated from the autocorrelation function and is indicative of the extent of a station's influence.
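The ratio step of the climatology aided interpolation can be sketched for a single station-month as follows. The function names are hypothetical, and treating "superimposing" as a multiplication of the interpolated ratio by the gridded monthly total is an assumption made here for illustration.

```python
def daily_ratios(daily_values, min_fraction=0.7):
    """Convert one station-month of daily totals (None = missing) into ratios.

    Returns None when the month fails the 70% completeness criterion, in which
    case none of its daily records are used for interpolation.
    """
    valid = [v for v in daily_values if v is not None]
    if len(valid) < min_fraction * len(daily_values):
        return None
    total = sum(valid)
    # A zero monthly total gives zero ratios rather than a division by zero.
    return [None if v is None else (v / total if total > 0 else 0.0)
            for v in daily_values]

def absolute_from_ratio(interpolated_ratio, monthly_total_mm):
    """Recover a daily amount from a gridded ratio and the gridded monthly total."""
    return interpolated_ratio * monthly_total_mm
```

For example, an interpolated ratio of 0.25 in a grid cell whose monthly total is 80 mm corresponds to a daily amount of 20 mm.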
Besides the interpolated fields, three other fields characterising the underlying data or uncertainty are provided with the dataset. These are:
1. Kriging error, which is not an absolute error but rather can be interpreted as a percentage of variance (Rubel, 1996). It is a result of solving the Kriging equations and depends on the density of the observations and the size of the grid (Schamm et al., 2014).
2. Yamamoto standard deviation. This can be interpreted as an absolute error, as it is based on the variance between the estimate and the observations used in interpolation, weighted by the Kriging weights (Yamamoto, 2000).
3. The number of stations inside each grid cell. Note that this is the actual number of observations inside a grid box, not the number of stations used for interpolation of that grid cell estimate.
The 1950-2013 average Kriging error (KE) and coefficient of variation (CoV), and the data quality mask based on KE and CoV, are shown for REGEN and REGEN40YR in figure 5. The CoV, defined as the ratio of the Yamamoto standard deviation to the precipitation estimate, is a normalised measure of the variance at each grid cell. The Kriging error is largest in regions with a low station density such as Greenland, Africa and South America, and is larger for REGEN40YR than for REGEN, as expected (figures 5a and 5b). The coefficient of variation, however, is comparable between REGEN and REGEN40YR. The largest CoV values are once again seen in Africa, South America, Greenland and Southeast Asia (figures 5c and 5d). The resulting data quality mask based on Kriging error and coefficient of variation has a smaller global land coverage for REGEN40YR, and coverage is particularly sparse in Africa, South America and Asia in both versions of the dataset (figures 5e and 5f).
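As an illustration of how the Yamamoto standard deviation and the CoV relate to the Kriging weights, the sketch below follows the verbal description given above: a weighted variance of the observations around the estimate, divided by the estimate. The function names are hypothetical, clamping negative variances to zero is an assumption, and the exact formulation should be taken from Yamamoto (2000).

```python
from math import sqrt

def yamamoto_std(weights, observations):
    """Return (estimate, standard deviation) for one grid cell.

    The variance is the weight-averaged squared difference between the
    observations used in the interpolation and the resulting estimate.
    """
    estimate = sum(w * z for w, z in zip(weights, observations))
    variance = sum(w * (z - estimate) ** 2 for w, z in zip(weights, observations))
    # Assumption: negative weights could make the sum negative, so clamp at zero.
    return estimate, sqrt(max(variance, 0.0))

def coefficient_of_variation(std, estimate):
    """Ratio of standard deviation to estimate; undefined (None) for dry cells."""
    return std / estimate if estimate > 0 else None
```

Two equally weighted observations of 2 mm and 4 mm give an estimate of 3 mm, a standard deviation of 1 mm and a CoV of one third.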
As mentioned earlier, we interpolated two different sets of underlying station data to create two related datasets. The first interpolates all available station data, while the second interpolates only the long term data, defined as stations with at least forty complete years of data, where a year was considered complete if all twelve months were non-missing, i.e. each month had at least 70% non-missing days. The All station dataset (REGEN) is useful for users who do not have access to a regional precipitation product based on a high station density and would like an approximate estimate of precipitation, as well as for users interested in the best estimate (based on as many stations as possible) of precipitation amounts at each time step, accepting that this may result in a decrease in temporal homogeneity. It is also useful for users seeking more complete fields of precipitation over global land areas who are less concerned about the uncertainties introduced by station network variability.
REGEN40YR is useful for users conducting a climate scale analysis of precipitation, such as looking at trends in various precipitation indices over several decades, since the use of long term stations minimises artificial variability of grid cell values due to network variations. We strongly encourage users to choose the dataset (REGEN or REGEN40YR) that suits their needs and to use it in conjunction with a quality mask (described below).
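The long-term station criterion can be written out as a short Python sketch. The input layout (a dictionary from `(year, month)` to the number of non-missing days) and the function names are assumptions made for illustration.

```python
import calendar

def month_complete(year, month, n_valid_days, min_fraction=0.7):
    """A month is complete if at least 70% of its days are non-missing."""
    return n_valid_days >= min_fraction * calendar.monthrange(year, month)[1]

def count_complete_years(valid_days):
    """Count years in which all twelve months are complete.

    valid_days maps (year, month) to the number of non-missing daily records;
    months absent from the mapping count as having no data.
    """
    years = {year for (year, _month) in valid_days}
    return sum(
        all(month_complete(y, m, valid_days.get((y, m), 0)) for m in range(1, 13))
        for y in years
    )

def is_long_term(valid_days, min_years=40):
    """Station qualifies for the long-term network with at least 40 complete years."""
    return count_complete_years(valid_days) >= min_years
```

A single incomplete month removes the whole year from the count, which is why the long-term network is considerably sparser than the full network.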
We provide a quality mask for both datasets, in which the masked grid cells are of lower quality. The masks were prepared based on the Kriging error and coefficient of variation. Figures 5e and 5f show the data quality masks for the two REGEN datasets. [...] percentile of all the grids on the day. A possible use case for the unmasked (high quality) grids of REGEN would be the evaluation of, or comparison with, another dataset (such as a satellite product) or climate model output.

Results and Evaluation
In this section we evaluate REGEN and REGEN40YR against existing monthly and daily precipitation datasets by showing comparisons of maps and timeseries. [...] CRU TS v4.01 (CRU; Mitchell and Jones, 2005) and GHCN Monthly Version 2 dataset (GHCN; Peterson and Vose, 1997). Anomalies were calculated by subtracting the 1950-2010 average of total annual precipitation from the total annual precipitation for each dataset. The variability in annual precipitation totals is very similar between REGEN and the other datasets, especially GPCC-FDD1 and CRU. GHCN has higher variability in many years compared to the other datasets, including REGEN and REGEN40YR.
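The anomaly calculation amounts to subtracting a fixed base-period mean from each annual total, as in this minimal Python sketch. The function name and the dictionary input are hypothetical.

```python
def annual_anomalies(annual_totals, base_start=1950, base_end=2010):
    """Subtract the base-period mean from every annual total.

    annual_totals maps year to total annual precipitation (mm).
    """
    base = [v for year, v in annual_totals.items() if base_start <= year <= base_end]
    base_mean = sum(base) / len(base)
    return {year: v - base_mean for year, v in annual_totals.items()}
```

Years outside the base period, such as 2011-2013, still receive an anomaly but do not contribute to the mean.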
Comparison with regional gridded datasets of daily precipitation

Case study over Sub-Saharan Africa
Based on the maps of Kriging error (figures 5a and 5b), the most data sparse regions of REGEN are Africa, South America, Greenland and northern Russia. Despite the sparsity of data, REGEN can still be useful for estimating daily rainfall in some parts of these regions. We use the country of Benin in sub-Saharan Africa as an example. Benin has a tropical climate [...]
[...] Xie, 2008; Xie et al., 2007; Chen et al., 2008) and GPCC Full Data Daily V1 (GPCC-FDD1). For comparability, CPC-Global, whose native resolution is 0.5 degrees, was regridded to 1 degree to match GPCC-FDD1 and REGEN. The temporal coverage of CPC-Global and GPCC-FDD1 is 1979-2017 and 1988-2013 respectively. The temporal averaging and comparison were therefore done over 1988-2013, which is the longest common period between the three datasets. As expected, REGEN is more similar to GPCC-FDD1 and REGEN40YR than to CPC-Global for both the means and trends of both indices. This is because REGEN and GPCC-FDD1 use the same interpolation method and, for the most part, the same underlying data. The largest differences between the three datasets arise in data sparse regions in the high latitudes, Africa, South East Asia, and the high altitude regions in western South America. The spatial variability of the differences in annual total and annual maxima trends is higher than the spatial variability of the differences in averages of the annual totals and annual maxima. Due to the lack of long term stations in Saharan Africa, differences in all four indices between REGEN and the long term station based REGEN40YR are larger than the differences between REGEN and GPCC-FDD1 in northern Africa. Herold et al. (2016) showed that CPC-Global produces lower annual totals compared to an ensemble of observational datasets including GPCC-FDD1, satellite products and reanalyses. This is consistent with our results, since the differences in annual totals between REGEN and CPC-Global are positive in the majority of global land areas, with the exception of northern North America and northern Africa.
Temporal and spatial correlations between REGEN and GPCC-FDD1 (figures 11a and 11b) are also higher than those between REGEN and CPC-Global (figures 11c and 11d). In fact, the spatial and temporal correlation between REGEN and GPCC-FDD1 is even higher than the correlation between REGEN and REGEN40YR (figures 11e and 11f), because REGEN's station network is more similar to that of GPCC-FDD1 than to that of REGEN40YR. The areas with poor temporal correlation between REGEN and REGEN40YR correspond to areas with low station density such as the high latitudes, Africa and South America. Compared to the field correlation between REGEN and GPCC-FDD1, the correlation between REGEN and REGEN40YR is also more variable. This may be because the lower station density results in an increase in daily variability in the interpolated fields. The drop in field correlation between REGEN and GPCC-FDD1 around 2010 corresponds to the higher percentage of GHCN stations in the last years (figure 1b). There is also a decline in field correlation over time between REGEN and REGEN40YR, which may be related to the decline in the number of long-term stations over time. The temporal correlation between REGEN and CPC-Global is highest in the USA, Australia, East Asia and a small part of Europe. These regions all have good station density throughout the time period.
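The distinction between the two correlation measures can be made explicit with a small Python sketch: the field correlation is computed across grid cells at each time step, while the temporal correlation is computed across time steps at each grid cell. The data layout (a list of time steps, each a flat list of grid cell values with `None` for masked cells) and the absence of any area weighting are assumptions made here for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(p, q) for p, q in zip(x, y) if p is not None and q is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(p for p, _ in pairs) / n
    my = sum(q for _, q in pairs) / n
    sxy = sum((p - mx) * (q - my) for p, q in pairs)
    sxx = sum((p - mx) ** 2 for p, _ in pairs)
    syy = sum((q - my) ** 2 for _, q in pairs)
    if sxx == 0 or syy == 0:
        return None  # undefined when either series is constant
    return sxy / sqrt(sxx * syy)

def field_correlation(a, b):
    """One correlation per time step, computed across all grid cells."""
    return [pearson(fa, fb) for fa, fb in zip(a, b)]

def temporal_correlation(a, b):
    """One correlation per grid cell, computed across all time steps."""
    n_cells = len(a[0])
    return [pearson([f[i] for f in a], [f[i] for f in b]) for i in range(n_cells)]
```

The first function yields a timeseries and the second yields a map, matching the two columns of figure 11.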

Summary, limitations and best practice recommendations for users
We present a new gauge-based dataset of gridded daily precipitation called REGEN, with a grid resolution of 1° × 1°, global land coverage, and temporal coverage from 1950 to 2013. REGEN was produced by interpolating quality controlled in situ daily rainfall timeseries data using ordinary block Kriging. The interpolation method for REGEN is identical to that of GPCC-FDD1 (another gridded dataset of daily precipitation, covering 1988-2013). REGEN also uses all the in situ daily data used by GPCC-FDD1 but expands on this raw data by combining it with GHCN-Daily and raw data from other sources. This resulted in an extended in situ daily precipitation network with coverage back to 1950. The raw data were subjected to comprehensive automated quality control procedures identical to those used by the GHCN-Daily dataset, and all suspicious data were removed so that only the high quality data were interpolated. We used climatologically aided interpolation (CAI), which involved interpolating ratios of daily totals to monthly totals and retrieving absolute values by superimposing gridded monthly precipitation fields on the interpolated ratios. This approach results in more reliable estimates in regions with a sparse daily in situ data network and a comparatively denser monthly in situ data network. CAI also reduces the influence of variables such as elevation and distance to the coast, which allows us to interpolate using only latitude and longitude as input variables. The gridded monthly fields used to retrieve the absolute daily precipitation rates came from the GPCC Full Data Monthly V7 dataset.
REGEN is currently the longest running dataset of daily precipitation based on gauge-only records with global land coverage, making it well suited to global analysis at climatological scales. We therefore hope it will contribute to the advancement of hydrological science and practice by enabling a number of studies aiming to understand changes and variability in several aspects of daily precipitation distributions, including precipitation extremes, and measures of hydrological intensity. So far the only datasets that allowed global climatological scale analyses of precipitation were monthly datasets or gridded ETCCDI indices; however, monthly datasets tend to average out the extremes, which limits their usefulness for high impact phenomena related to intense rainfall at shorter timescales. REGEN, owing to its daily temporal resolution, fills this data gap. Like GPCC-FDD1, REGEN also provides various uncertainty measures related to the daily gridded fields: the Yamamoto standard deviation, which is indicative of the proximity of the estimated fields to the raw station values; the Kriging error, which is indicative of the density of stations inside the grid cell; and the exact number of stations inside each grid cell. Based on these measures, a quality mask for REGEN that combines all three uncertainty measures to indicate the high quality grid cells (with low uncertainties) is also presented. Users of REGEN are encouraged to use the quality mask in all cases except when spatial completeness is of utmost importance. Alongside REGEN (which interpolates all station data), another related dataset is also produced that minimises artefacts due to station network variability by interpolating only the long-term stations (i.e. stations with at least 40 years of complete data). Both datasets include bespoke data quality masks. As a result, although the station density is lower in the long-term version, users can use its quality mask to restrict their analysis to higher quality areas. For analyses sensitive to station network variability, the long term station version with the high quality mask would be the most suitable. Note, however, that due to the lower station density, the long term station version may be less suitable for investigating individual events or short timeseries. The All station version, on the other hand, would be more suitable for analyses where complete global coverage is important but temporal homogeneity is of lower priority. In any analysis it is recommended to use the data quality mask; however, in regions where no other daily datasets are available (such as parts of Africa), REGEN may provide a suitable rough estimate of precipitation even in lower quality grids.

REGEN has been compared with global monthly and daily, and regional daily, gridded datasets of precipitation. The annual precipitation anomalies have been shown to resemble those from the other monthly datasets, and the spatial fields of annual totals and maxima, as well as their trends, more closely resemble GPCC-FDD1 than CPC. Even the daily timeseries of individual events of significant precipitation closely resemble the respective regional datasets in Europe, Australia and the USA. The larger inconsistencies between the long term REGEN data and APHRODITE in Asia are indicative of the lower station densities in REGEN in this region. Also note that there is almost no raw in situ daily data in mainland China in 1950. As such, any analysis focusing on China using this dataset should not go further back than 1951. Finally, note that despite our best efforts to homogenise station data before interpolating, because the raw data are ultimately sourced from various countries with different measurement practices (such as time of measurement, use of units, quality control and homogenisation steps), inhomogeneities across political borders are possible (Trewin, 2010).
The biggest strength of REGEN is the long temporal coverage of quasi-global daily precipitation observations. Regional [...] only used the high quality stations, which accounted for roughly 30% of the total stations available from the Spanish Meteorological Agency (AEMET). Often the respective meteorological organisations also have the resources to quality control the raw data more thoroughly, and in some cases even manually. As a result, regional datasets (where available) may provide more accurate precipitation estimates than REGEN.
At the moment REGEN is not an operational product, meaning the analysis for REGEN was done as a single instance and there are currently no plans to update it regularly, such as on an annual or biennial basis.
[...] The total annual precipitation, annual maxima and respective trends in the two indices based on the long term REGEN data (REGEN40YR) (figures 12e, 12f, 12g and 12h) are also very similar to REGEN, which suggests that the effects of station variations are negligible at this scale (for trends and averages over [...] for the high quality grids. The trend maps shown in figure 12 have been masked based on the quality masks shown in figures 5e and 5f.

Figure 8. Timeseries of daily precipitation from REGEN averaged over Benin in Western Africa. Figure 8a shows the entire timeseries from 1950 to 2013 with the years containing the days with the highest three daily rainfall rates (1957, 1963 and 2008) shown in a darker shade.

Figure 11. Spatial (field) correlation at each daily time-step (first column; figures 11a, 11c and 11e) and temporal correlation between timeseries at each grid cell (second column; figures 11b, 11d and 11f) between REGEN and GPCC (first row), REGEN and CPC (second row) and REGEN and the long term REGEN40YR data (third row).

[...] and 12d) and REGEN40YR data that only interpolates stations with at least forty complete years of data (figures 12e, 12f, 12g and 12h).