Assessing historic water extents in rapidly changing lakes: a hybrid remote sensing classification approach

The empirical attribution of rapid hydrologic change presents a unique data availability challenge in terms of establishing baseline prior conditions. On the one hand, one cannot go back in time to collect the necessary in situ data if it were not serendipitously collected when the change was taking place. On the other hand, modern satellite monitoring missions are often too recent to capture changes that are ancient enough to provide sufficient observations for adequate statistical inference. In that context, the four decades of continuous global high resolution monitoring enabled by the Landsat missions are an unri5 valed source of information to study hydrologic change globally. However, extracting the relevant time series information in a systematic way across Landsat missions remains a monumental challenge. Cloud masking and inconsistent image quality often complicate the automatized interpretation of optical imagery. Focusing on the monitoring of lake water extents, we address this challenge by coupling supervised and unsupervised image classification techniques. Unsupervised classification is first used to detect water on unmasked (cloudless and high quality) 10 pixels. Classification results are then compiled across images to estimate the inundation frequency of each pixel, hinging on the assumption that different pixels will be masked at different times. Inundation frequency is then leveraged to infer the inundation status of masked pixels on individual images through supervised classification. Applied to a representative set of global and rapidly changing lakes, the approach successfully captured water extent fluctuations obtained from in situ gauges (when applicable), or from other Landsat missions during overlapping time periods. 15


Introduction
Large lakes often sustain fragile ecosystems while providing critical ecological services to local communities (i.e. fisheries, irrigation and water supply (Kummu et al., 2010). Despite their importance, many lakes are undergoing rapid change, both in their volume and seasonal fluctuation. Once imposing bodies of water have declined to a small fraction of their historical 20 volume in many parts of the world, with the Aral Sea standing out as an iconic example (Micklin, 2007). In addition to dessication, the seasonality of lakes in humid climates are also rapidly changing. Seasonal fluctuations in the Tonle Sap, which are vital to local fisheries, are rapidly changing as a consequenceof shifting flow regimes in the Mekong River Basin (Kummu fuzzy clustering method and k-means clustering (Lu and Weng, 2007). These approaches are limited in their ability to identify water in situation when surface reflectance is impeded by poor image quality (e.g., Landsat 7 Scan Line Corrector failure on 60 images taken after May 2003) or clouds. Clouds in particular obstruct the electromagnetic signal at the frequencies necessary to identify inundated surfaces, and cloud shadows have a similar spectral signature to that of open water (Zhu and Woodcock, 2012). These limitations are particularly restrictive for large lakes in highly seasonal climates which are likely to be at least partially masked by clouds at any given time. For instance, our analysis suggests that the Tonle Sap Lake in Cambodia did not have a single cloud-free Landsat 7 image during 14 of the last 20 rainy seasons (here cloud-free is interpreted as less then 5% 65 cloud coverage and rainy seasons extend between May 1 and October 31).
A variety of statistical interpolation approaches have been developed to infer the status of masked categorical pixels (see Shen et al., 2015, for an extensive review). For lake area monitoring, these approaches often build on the intuition that a masked pixel within the lake's footprint will likely be inundated if its elevation is lower than that of an unmasked pixel that is inundated. Avisse et al. (2017) used a DEM to infer the inundation status of masked pixels through non-linear regressions 70 that accommodate the typically large vertical uncertainty of DEMs. The approach was predominantly applied to man-made reservoirs that were once empty, which allowed DEMs to be used to estimate bathymetry. More recently, Van Den Hoek et al. (2019) measured lake surface elevation via radar altimetry and utilized a regression framework to infer area-volume relationships and classify masked pixels.
Here we combine supervised and unsupervised classification approaches to infer the inundation status of each masked (i.e. 75 cloudy) pixels based their inundation history. The proposed approach relies on the crucial assumption (further discussed in Sect. 4) that clouds are randomly distributed in time and space and that no pixel is permanently covered by clouds. Unlike previous efforts, the approach does not rely on secondary sources of information, which may be unavailable at the appropriate frequency or study period (radar altimetry as in Van Den Hoek et al. (2019)) or prone to high levels of uncertainty (DEM as in Avisse et al. (2017)). Instead, the proposed method relies exclusively on Landsat imagery, making it scalable and accessible 80 (i.e., through Google Earth Engine (Gorelick et al., 2017)), and a promising tool to study long-term change through the analysis of historic imagery from older Landsat missions.

Water detection algorithm
The proposed algorithm consists of four main steps, depicted on Fig. 1.

85
-Pre-processing. A rudimentary cloud-scoring algorithm available on Google Earth Engine (SimpleCloudScore) is used to detect and mask clouds based on "Top of Atmosphere" Landsat reflectance images (see Supplementary Information (SI) S3). Pixels indicated as faulty (e.g., due to the Landsat 7 Scan Line Corrector failure) are also masked out. Individual images can then optionally be aggregated at the desired observation frequency (here monthly) through Normalized Difference Vegetation Index (NDVI) based maximum value compositing (Chen et al., 2003).

90
-Stage 1: Unsupervised classification. The inundation status ("wet" or "dry") of the unmasked pixels of each monthly image is estimated through unsupervised classification.
-Stage 2: Supervised classification. Classified images are then assembled to produce an Inundation Frequency (IF) image indicating the historic probability of a pixel being classified as "inundated". The IF raster is then used to infer the status of the masked pixels in each image through supervised classification, using classified (unmasked) pixels as 95 training.
-Post-processing. A time series of lake area is generated by counting, on each monthly classified image, the number of inundated pixels within a predetermined polygon encompassing the maximum historical extent of the lake. Outliers are removed from the time series using the automatic detection approach suggested in Chen and Liu (1993).
Implementation details of the two classification steps (Stages 1 and 2) are discussed in the following paragraphs. The algorithm 100 is implemented on Google Earth Engine with operational JavaScript source code openly accessible at the url: "https://code. earthengine.google.com/bed0dacc981bb883533f4958e80b50be".

Unsupervised classification
Cloud-free pixels of each monthly image were classified as wet or dry based on their modified normalized difference water index (MNDWI) value (Xu, 2006): where G and MIR are the green and mid-infrared range of the electromagnetic spectrum, respectively (see Table 1 for corresponding bands in the considered imagery). The MNDWI enhances water/land contrasts by leveraging the ability of open water (compared to dry land) to preferentially absorb and reflect in the MIR and G regions of the electromagnetic spectrum, respectively. A clustering algorithm is applied to each image to identify the MNDWI threshold that partitions its 110 pixels into two sets, so as to minimizes the MNDWI variance within each set. Because it can dynamically separate dry and wet pixels in cloud-free images, unsupervised classification stands as a promising (and somewhat less arbitrary) alternative to the manual determination of classification thresholds implemented in past studies (e.g., Müller et al. (2016) among others).
However, by minimizing within-cluster variance, k-means tends to favor clusters of comparable sizes (Jain, 2010), which is problematic for cloudy images with preferential cloud covers on either land or water. As an extreme example, if all unmasked 115 pixels are covered by water, a two-cluster k-means classification will not be able to distinguish water from land. We address this issue by computing the median value from the set of MNDWI thresholds obtained from the classification of individual images. This single median MNDWI threshold is then used to classify all unmasked pixels from all monthly images. Assuming the unsupervised classification can distinguish water from dry land on most images, the median threshold will allow for the identification of all unmasked pixels from the above extreme example as "wet". Using a single threshold comes at the cost of induced classification errors on individual images, e.g., due to changing land cover or atmospheric conditions. This important trade-off is further investigated in Sect. 4.

Supervised classification
A supervised classification approach was used to infer the wet/dry status of masked (cloudy) pixels based on their inundation history. To do so, space-time information on historic water extents are compiled into a single IF image representing the historic 125 probability of each pixel i being identified as "wet" by the threshold-based unsupervised classification: where N  critical assumptions (further discussed in 4.2): (i) cloud coverage is not affected by the inundation status of a pixel; and (ii) a 130 more often inundated pixel is "lower" topographically, and therefore more likely to be inundated. More specifically, if cloudfree pixels associated with a certain IF value are inundated in a given image, it is very likely that pixels associated with an equal or higher IF (i.e. pixels of equal or lower elevation) are also inundated. We instrumentalize this principle in a supervised classification process (random-forest algorithm Pelletier et al. (2016)), that estimates the statistical association between the (known) classification status of unmasked pixels and their inundation frequency given by the IF image. The algorithm then 135 uses the estimated relationship to infer the status of the masked pixels of that image based on their own inundation frequency.
Classification noise that emerges from the uncertainty of the estimated statistical relationship are then removed through morphological filtering (Schowengerdt, 2006). The supervised classification algorithm is run independently on each individual image using a different set of unmasked classified pixels as training, but using the same IF image as the independent variable. This implies that the relative inundation frequency of each pixel (i.e. the bathymetry of the lake) is assumed to not change over 140 the duration of the study period, which is a corollary of assumption (ii) discussed above.

Data sources and validation
The approach was validated on Lake Buchanan (TX, USA), Choke Canyon Reservoir (TX, USA) and Cannonsville Reservoir  Table 1. Properties of Data Sources (Survey and , 2015) the performance of our approach (see Sect. 4). For each lake, monthly water extents were determined based on daily water levels using the provided elevation-area-capacity tables (see S2 for detailed procedures) and corrected for additive bias. Indeed, visual inspection of satellite classification output during cloudless days when water and land can be clearly distinguished 150 (Fig. 3) reveal a near-perfect detection of water extents, which area does not match the area obtained from in situ water level observation. The source of this bias is unclear (and beyond the scope of this paper) and may be related to temporal changes in the elevation-area-capacity curves, for instance due to sedimentation processes (Raje and Mujumdar, 2010). Bias correction was implemented as follows: where A i indicates a (biased) in situ area estimate, A i its average value across observations, and A (RS) i indicates the average lake area from the remote sensing observation.
Additive biases notwithstanding, the above validation demonstrates the ability of the approach to capture changes in lake water extent with no in situ data requirements. We illustrate this capability by applying the approach to reconstruct the historic water extents of three rapidly changing lakes that are of major regional significance: Lakes Tonle Sap (Cambodia), Urmia (Iran) 160 and Chapala (Mexico). We did not have access to in situ data from these lakes for direct validation. Instead, we compared estimates from the Landsat 7 to estimates from Landsats 5 and 8 during respective overlapping periods. This provides a reasonable estimate of the error in data-scarce regions as error sources across Landsat missions are close to independent (different sensors on different space platforms taking images at different times, see S1).

Validation against in situ observation
Validation results are presented in Fig. 3, where lake area outputs from Landsats 5 (red) and 7 (blue) are compared to biascorrected in situ observations. Outliers removed by the automatic detection procedure (Chen and Liu, 1993) (Koračin et al., 2014). We test this hypothesis by comparing the inundation frequency of pixels during cloudless days (the output of the first stage of the algorithm), with their unconditional inundation frequency (during cloudy and cloudless days). The latter was determined by computing the probability of inundation predicted by the second stage of the algorithm (supervised classification), which includes cloudy 215 days. As before, we sampled 4000 pixels with IF values between (and excluding) 0 and 1 for both images (cloudless and unconditional). We then ranked the pixels according to their IF value for each image. The independence assumption implies that the pixels rank is not affected by its cloud coverage status: a pixel with a higher inundation frequency than another for a subset of observations that had cloudless conditions should also have a higher inundation frequency if the full sample of observations (cloudless and cloudy) is considered. Results, shown in Fig. 5 (bottom), suggests that the ranking of inundation 220 frequency does not depend on cloud coverage. Again, to interpret this, consider that preferential cloud coverage will only affect classification output if it concerns pixels near shores (i.e. where 0 < IF < 1), whereas lake-induced fog tends to occur over permanently inundated pixels (Koračin et al., 2014

Assumption 3: Detection
The procedure used to detect open water under cloudless conditions is widely used and its performance well established 225 (see, e.g., MNDWI (Xu, 2006)). Nonetheless, two important limitations of MNDWI-based water detection may affect the performance of our approach. The first issue concerns the difficulty in distinguishing open water from cloud shadows which have a similar MNDWI reflectance (see Zhang et al., 2015). Cloud shadows are visible on Landsat imagery when cloud cover is fragmented and, if shadows dry land in the vicinity of the lake, may cause an overestimation of the surface area of the lake. This error will be amplified by the supervised classification step of our approach (Stage 2), which will mistakenly associate 230 low IF pixels to inundated conditions. This issue is particularly salient for smaller lakes with surface area on the order of the typical size of cloud fragments. To the extreme, a small lake completely covered by a cloud shadow would lead to a very large overestimation of its surface area. Note that a lake completely covered by clouds would be removed from the analysis in the pre-processing step. While problematic, such large individual deviations in water extent estimations are typically captured by the automatic outlier detection implemented in the post-processing stage.