Interactive comment on “WRF simulation of a precipitation event over the Tibetan Plateau, China – an assessment using remote sensing and ground observations” by F.

Authors’ response to Anonymous Referee 2 reviewer comment (C1739, 09 Aug 2010)
General reply
We are thankful to Referee 2 for reviewing our manuscript. Some questions raised in this comment are similar to the first referee’s remarks, and we kindly ask the reader to refer to our answer 1 for a more comprehensive description of the study’s objectives. We understand from the two reviews that the presentation of


Introduction
The Tibetan Plateau (TiP) is the source of many major rivers in Central Asia, thus affecting hundreds of millions of people in the surrounding regions. Its glaciers are characteristic elements of the natural environment, forming water resources of great importance for both ecosystems and the local population. Yao et al. (2007) underlined the importance of the Tibetan and Himalayan glaciers for the hydrological conditions in Asia. Several studies have shown the strong control of orography and boundary-layer structure of the TiP on the Asian monsoon system (e.g., Gao et al., 1981; Hahn and Manabe, 1975).
F. Maussion et al.: WRF simulation of a precipitation event over the Tibetan Plateau, China
Focusing on the monsoon history over the past, present and future, the Sino-German Priority Programme 1372 (Appel and Mosbrugger, 2006) was initiated by the Deutsche Forschungsgemeinschaft (DFG) to develop a multidisciplinary approach dealing with the complex processes and interactions taking place between the major driving forces on the TiP. This study took place within this framework.
Located in a transition zone between the continental climate of Central Asia and the Indian Monsoon system, the Nam Co drainage basin, including the western Nyainqentanglha Mountains, has been identified as a key research area in Tibet and is also investigated in this study. The recent rise of the lake level of Nam Co, one of the largest and highest lakes on the TiP (year 2000: 1980 km² area, 4724 m a.s.l. lake level altitude), has been attributed to glacier retreat as well as to an increase in precipitation in recent decades (e.g., Wu and Zhu, 2008; Krause et al., 2010). A precipitation increase in the central TiP during this period has also been reported by Liu et al. (2009). However, the TiP remains a sparsely observed region, and there is limited availability of meteorological data. This is especially true for the long-term weather records necessary for reliable climatological studies (Frauenfeld et al., 2005; Kang et al., 2010). In particular, no long-term data from weather stations exist at elevations above 4800 m a.s.l. Generally, the geographical distribution of weather stations is biased towards lower altitudes, flat areas and specific land-cover types, excluding high-mountain regions covered by glaciers. For these reasons, the question of the respective contributions of glacier retreat and precipitation increase to rising lake levels on the TiP has remained unanswered so far.
Besides air temperature, precipitation is considered to be a key variable for understanding recent environmental variability and trends on the TiP. Unfortunately, precipitation is strongly influenced by terrain and can hardly be retrieved from existing gridded precipitation data sets, especially in mountainous regions. This has been discussed, e.g., by Yin et al. (2008) with regard to remotely sensed precipitation data sets derived from the Tropical Rainfall Measuring Mission (TRMM), and also for global atmospheric reanalysis data (e.g., Ma et al., 2009).
The constantly improving capabilities of numerical weather prediction (NWP) models offer the opportunity to reduce this problem by providing precipitation fields and other meteorological variables at high spatial and temporal resolution. Generally, NWP models are suitable not only for weather forecasting but also for dynamical downscaling of large-scale atmospheric processes. NWP models can be initialised and laterally forced by assimilated observational data describing the large-scale atmospheric conditions throughout the simulation period, thus keeping the model results close to observations also at finer spatial scales. This approach enables validation of the model output for single events but does not allow forecasts, since the assimilated observational data have to be available not only for the time of model initialisation but for the whole simulation period.
Longer time periods of years to decades can be simulated by NWP models through successive model runs with shorter time-integration periods of days to weeks. We will subsequently use the term "regional atmospheric reanalysis" for this kind of NWP-model application. A good example of a regional atmospheric reanalysis is given by Box et al. (2004), who used the Atmospheric Research Mesoscale Model MM5 to generate a contiguous multi-year weather data set for Greenland by dynamical downscaling of 2.5° operational analyses from the European Centre for Medium-Range Weather Forecasts (ECMWF) in a sequence of daily model runs. Box et al. (2006) used the MM5 output to drive a surface mass balance model of the Greenland Ice Sheet. Caldwell et al. (2009) simulated the climate of California for a 40 yr period with the Weather Research and Forecasting (WRF) model, which was re-initialised every month. The latter simulation is, however, not driven by assimilated observational data but by the output of a General Circulation Model (GCM), which makes it impossible to validate the model results on an event basis. Dynamical downscaling of GCM simulations required for climate reconstructions or projections could also be done by Regional Climate Models (RCM). Lo et al. (2008), for example, discussed different strategies for time integration. Fowler et al. (2007) and Laprise et al. (2008) presented in-depth discussions of approaches and challenges including, e.g., hydrological applications.
The approach of Box et al. (2004) is of special interest also for the TiP. However, the capacity of the models to retrieve snow- and rainfall in complex terrain is still under discussion. Using a horizontal grid spacing of less than 10 km has been advanced as a substantial improvement, as it allows a more accurate representation of mountain regions. Mountain-valley structures in the Nyainqentanglha Mountains often show elevation differences of 1 to 2 km within short distances of less than 10 km, which is common for high-mountain regions all over the world. Zaengl (2007) shows that increased spatial resolution in areas of complex terrain can be highly beneficial for simulating precipitation fields. However, higher spatial resolution does not automatically improve an NWP model's skill in predicting precipitation in mountainous regions. Studies like the one by Zaengl (2007) have shown that increasing the spatial resolution does not improve the model quality for precipitation caused by embedded convection.
The effect often cited as orographic bias is described as a strong over-prediction of precipitation rates along windward slopes, while predicted snowfall lies below measured values (e.g., Leung and Qian, 2003). This bias stems from a variety of sources, but Caldwell et al. (2009) argue that the models themselves may be the dominant cause, for instance due to inappropriate physical parameterisation schemes. However, since observations of precipitation in high-mountain areas are generally sparse and the accuracy of observations is limited (especially for snowfall), quantitative analyses of orographic bias are rarely carried out and are missing for the TiP. There is, so far, no general answer to the question of how to quantify precipitation in high-mountain drainage basins, which is a prerequisite for subsequent hydrological studies.
Hydrol. Earth Syst. Sci., 15, 1795–1817, 2011, www.hydrol-earth-syst-sci.net/15/1795/2011/
Only a few studies had employed the Weather Research and Forecasting (WRF) model on the TiP at that time. Li et al. (2009) investigated the sensitivity of the WRF model to the surface skin temperature of Nam Co for simulating a precipitation event. Sato et al. (2008) analysed the sensitivity of the WRF model to horizontal grid spacing with respect to simulating the diurnal cycle of precipitation. Their results show that the finest spatial resolution (7 km) is more efficient in representing diurnal cloud formation than coarser grids. So far, no study has used the WRF model for a regional atmospheric reanalysis.

Objectives
The main objective of our study is to optimise the design of a modelling strategy, including a suitable model configuration, for producing a weather data set of high spatio-temporal resolution for the study region by a regional atmospheric reanalysis. This is particularly relevant for subsequent hydrological and glaciological studies, e.g., those requiring detailed precipitation fields as input data for hydrological and glaciological models. In this paper, we therefore concentrate on the capacity of the WRF model to retrieve data on solid and liquid precipitation. For this purpose, we focus on a specific precipitation event over the TiP (see Sect. 1.3), including its contribution to monthly precipitation amounts.
While dynamical downscaling studies employing RCM use integration periods from months to years or even many decades, the objective of our study is to analyse the capability of the WRF model as an NWP model to be employed for a regional atmospheric reanalysis, enabling direct comparison of model results with observations. By using a continuous sequence of short time-integration periods, we prevent the model from deviating too much from large-scale observations. After two days of simulation we expect the model to be less accurate than during the first two days. Therefore, our study also aims at a quantitative analysis of the effects of using different strategies for time integration.
In this study, we address two main research questions:
1. Which validation methods and data sets are suitable for assessing the accuracy of simulated precipitation fields of high spatio-temporal resolution over the TiP?
2. Is a specific set-up of the WRF model able to reanalyse precipitation fields in the mountainous and sparsely observed region of the TiP?
The first question is less frequently addressed in similar model studies since observations are usually taken as an absolute reference to assess the performance of a model. However, the particularities of the TiP do not ensure the applicability of validation approaches that have been proven to be suitable in other regions. In particular, the second question cannot be answered appropriately without knowing the limitations of the validation data sets.
The WRF model, like other NWP models, offers a broad spectrum of options for setting up and forcing simulations, including various parameterisation schemes for sub-scale processes.We present a sensitivity study following the general ideas as discussed e.g. by Rakesh et al. (2007) or Yang and Tung (2003) to quantify the uncertainty in the model output caused by the model itself.
The study region and the simulation period are described in Sects. 1.2 and 1.3, respectively. The design of the reference experiment, as well as the validation data sets and methods, are described in Sect. 2. The results of the study, i.e., the validation of the reference experiment by three observational data sets, and those from the sensitivity study, are presented in Sect. 3. The results are further discussed in Sect. 4 to give answers to the two main research questions. Finally, we draw conclusions from our study.

Study region
The study region and the set-up of the three nested domains used by the WRF model are presented in Fig. 1. The large domain (LD), covering an area of 4500×4500 km², is used to capture large-scale processes and to avoid model artefacts near the lateral boundaries. The studied precipitation event originated from the Bay of Bengal, which is fully covered by the LD.
The medium domain (MD) covers an area of 1500×1500 km², comprising large parts of the TiP including the western and eastern Nyainqentanglha Mountains. The south-eastern part of the TiP is strongly influenced by the summer monsoon and has generally lower altitudes, thus maritime (temperate) glaciers are present, while glaciers in the central, northern and western parts of the TiP are mostly continental (cold or polythermal) (see Shi and Liu, 2000, for a detailed description).
Detailed analyses are carried out for the small domain (SD), covering an area of 300×300 km². The SD is centred on Nam Co and its drainage basin. The highly glaciated western Nyainqentanglha Mountains (see Bolch et al., 2010), reaching elevations of more than 7100 m a.s.l., are fully contained in the SD. Nam Co and the other lakes significantly influence local climates and atmospheric moisture content (Haginoya et al., 2009). The presence of the well-equipped Nam Co Monitoring and Research Station (30°46′ N, 90°59′ E, 4730 m a.s.l., located on the south-eastern shore of the lake; see Fig. 1), operated by the Institute of Tibetan Plateau Research (ITP) of the Chinese Academy of Sciences (CAS), makes it one of the most intensively studied regions on the TiP, and thus an ideal test bed for our study.

Simulation period
Focusing model-validation studies on short simulation periods highlights some particularities and issues that are not visible in long-term validation studies. Short-term validation studies enable evaluation of precipitation rates and cumulated amounts on a process-oriented basis. Individual strong precipitation events are well suited for this kind of analysis, also because relative errors in simulated precipitation are generally larger in these cases, and thus easier to detect and quantify.
The complex terrain of the high-mountain fringe of the TiP and its blocking effect on moisture transfer coming from the Indian and Pacific Oceans have a characteristic impact on the formation of orographically induced storms (Chen et al., 2007) causing strong precipitation events. The tropical cyclone Rashmi formed in the Bay of Bengal on 24 October 2008, reaching the coast in the late evening of 26 October. Strong winds and heavy rainfall occurred over Bangladesh and India, causing substantial damage and fatalities (the India Meteorological Department gave a comprehensive description of the event in IMD, 2008). The system weakened rapidly after landfall, carrying along further precipitation, mainly as snowfall, to the Himalayas and the TiP. This precipitation event happened after the monsoon period and challenges the model by its complexity: cyclone formation over the sea and snowfall over the TiP.
On 27 October 2008 the daily precipitation amount averaged over the 19 operational weather stations used in our study (see Sect. 2.4) is the absolute maximum of the last decade. The event was also one of the strongest snowfall events in the autumn season, affecting large areas on the TiP that had been snow-free prior to the event, which allows a quantitative analysis of the simulated snowfall. Thus, October 2008 was chosen as the simulation period for the study. A one-week period between 22 and 28 October 2008 was used for detailed analysis of the precipitation event.
Using a simulation period of one month offers the opportunity to include weather situations without or with only light precipitation in the validation study. In addition, monthly precipitation is frequently used in climatological studies, and many gridded climatological data sets are available on a monthly basis. Strong precipitation events are often main contributors to monthly or even annual precipitation, as demonstrated in Fig. 2. Thus, it is of high relevance for climatological purposes to understand an NWP model's capability to simulate an individual weather system, especially when the model is to be employed for a regional atmospheric reanalysis. Figure 2 shows that all the stations on the TiP used in this study received half of the annual precipitation in 2008 within 12 to 32 days.
[...] (Skamarock et al., 2008). The three nested domains described in Sect. 1.2 and displayed in Fig. 1 are used in all the experiments of this study. Spatial resolutions of the different model grids covering the three WRF domains are 30 km for the LD grid, 30 and 10 km for the two MD grids, as well as 30, 10 and 2 km for the three SD grids. WRF model output in the three different spatial resolutions (30, 10 and 2 km) will be named WRF30, WRF10 and WRF2 to avoid confusion between model domains and spatial resolutions (e.g., WRF30-SD indicates the WRF results for the 30 km grid of the SD).
We have designed a reference experiment against which a series of further experiments are performed and analysed to understand how and why simulated precipitation fields change with modified nesting and forcing strategies, as well as with modified physical parameterisation schemes. The design of the reference experiment is summarised in Table 1. The design of the reference experiment follows three principles:
1. The reference experiment should incorporate experience gained from comparable numerical modelling studies as far as possible, such that its design is expected to be one of the most suitable candidates for a decadal regional atmospheric reanalysis.
2. The nesting and forcing strategy of the reference experiment shall not only enable reanalysing precipitation fields but also be suitable for further model experiments, e.g. for simulations using modified boundary conditions.
3. The reference experiment shall preserve the predictive skill of the model providing the large-scale forcing fields while concurrently allowing the WRF model to utilise its predictive skills at the resolved scales.
The nesting strategy of the reference experiment is based on a novel method, addressing a scientific conflict currently discussed concerning the advantages and disadvantages of one-way versus two-way nesting. Some authors (Harris and Durran, 2010) argue that two-way nesting increases the predictive skill of an NWP model within the child domain, while others (e.g., Bukovsky and Karoly, 2009) could show that two-way nesting generates artefacts in the parent domain near the borders of the child domain. We are therefore using a cascade of three simulations:
1. Results for the LD are obtained from a simulation without nesting.
2. Results for the MD are obtained from a simulation of the LD as parent domain and the MD as a child domain using the two-way nesting capability of the WRF model.
3. Results for the SD are obtained from a nested simulation of the three domains using the two-way nesting capability of the WRF model.
This approach allows benefiting from the two-way nesting approach in the respective child domain while concurrently avoiding the artefacts in the respective parent domain. The reference experiment is based on a forcing strategy using daily re-initialisation. The simulation comprises 31 consecutive model runs of 36 h time integration. Each run starts at 12:00 UTC (all times are further specified in UTC). The WRF model (as any NWP model) requires some spin-up time to reach a balanced state with the boundary conditions. We have therefore discarded the first twelve hours from each model run, thus the remaining 24 h of model output provide one day of the one-month simulation. This forcing strategy has been analogously used by Box et al. (2004).
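The daily re-initialisation described above can be sketched as a simple run schedule. This is a minimal illustration, not the authors' workflow: the function name `reinit_schedule` is hypothetical, and the start of the first run on 30 September 2008 is an assumption inferred from the 12:00 UTC start time and the 12 h spin-up, so that the retained 24 h coincide with calendar days of October 2008.

```python
from datetime import datetime, timedelta


def reinit_schedule(first_day, n_days, run_hours=36, spinup_hours=12):
    """Return (run_start, keep_from, keep_to) for each simulated day.

    run_start: start of the model run (12:00 UTC on the previous day)
    keep_from: end of the discarded spin-up, start of the retained output
    keep_to:   end of the run, i.e. end of the retained 24 h
    """
    schedule = []
    for offset in range(n_days):
        keep_from = first_day + timedelta(days=offset)
        run_start = keep_from - timedelta(hours=spinup_hours)
        keep_to = run_start + timedelta(hours=run_hours)
        schedule.append((run_start, keep_from, keep_to))
    return schedule


# Assumed target month: 31 days starting 1 October 2008, 00:00 UTC.
october_2008 = reinit_schedule(datetime(2008, 10, 1), n_days=31)
```

Under these assumptions each run contributes exactly one day, and the retained windows of consecutive runs join without gaps or overlaps.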
Meteorological input data sets are the standard final analysis (FNL) data from the Global Forecasting System (GFS) with additional sea surface temperature (SST) input (see Table 1). The employed version of the WRF pre-processing system (WPS) does not properly handle initialisation of lake temperatures. The WPS sets the water temperature either to an arbitrary value or, when SST fields are available, to the SST value. In our case, this leads to drastic errors in simulated precipitation. Water temperatures of the high-altitude lakes on the TiP are simply extrapolated from the SST of the Bay of Bengal without considering the huge elevation difference, resulting in water temperatures of about 30 °C for Nam Co in October 2008. As proposed by Li et al. (2009), we used remotely sensed skin temperatures of Nam Co for retrieving the initial lake temperature. We used the Moderate Resolution Imaging Spectroradiometer (MODIS) eight-day land-surface temperature product (MODA11 8-DAY 1KM L3 LST, version 5, 23-30 October 2008). The mean temperature of the water-covered grid cells was computed for the day- and night-time MODIS scenes to obtain a mean skin temperature of 4.9 °C for the simulation period. This value is consistent with the lake-temperature climatology of Haginoya et al. (2009), and was used to initialise the water temperature of Nam Co and surrounding water bodies for each model run. The positive effect of this correction is clearly seen in daily precipitation amounts at the Nam Co research station on 27 October 2008: the observed precipitation amount is 8 mm, while the model computed 117 mm before the correction of the lake temperature, and 30 mm after the correction.
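The averaging step behind the lake temperature can be sketched as follows. This is only an illustration under stated assumptions: the function name, the nested-list grid layout, the use of `None` for missing pixels, temperatures given in kelvin, and the equal weighting of the day and night scenes per cell are all hypothetical choices that the text does not specify.

```python
def mean_lake_skin_temp_c(day_lst_k, night_lst_k, water_mask):
    """Mean skin temperature (deg C) over water cells of day and night scenes.

    day_lst_k, night_lst_k: 2-D lists of temperatures in kelvin (None = no data)
    water_mask:             2-D list of booleans, True for water-covered cells
    """
    cell_means = []
    for day_row, night_row, mask_row in zip(day_lst_k, night_lst_k, water_mask):
        for day, night, is_water in zip(day_row, night_row, mask_row):
            # Use only water cells with valid data in both scenes (assumption).
            if is_water and day is not None and night is not None:
                cell_means.append(0.5 * (day + night))
    if not cell_means:
        raise ValueError("no valid water cells")
    return sum(cell_means) / len(cell_means) - 273.15


# Toy 2x2 example with made-up values, not MODIS data.
toy_day = [[280.0, 290.0], [279.0, None]]
toy_night = [[276.0, 270.0], [277.0, 275.0]]
toy_water = [[True, False], [True, True]]
toy_temp_c = mean_lake_skin_temp_c(toy_day, toy_night, toy_water)
```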
The parameterisation of convective processes and related formation of cumulus clouds (CU) was only applied to the 30 and 10 km model grids. Precipitation computed by the CU parameterisation scheme is stored separately from the precipitation resolved by the grid, enabling the quantification of the percentage of convective precipitation to total precipitation. The parameterisation of micro-physical processes (MP), the land-surface model (LS) and the parameterisation of processes in the planetary boundary layer (PBL) are forced by WRF to be identical in all nested domains. The parameterisation schemes for short- and long-wave radiative fluxes were kept constant in all model experiments (Table 1).
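Because the two precipitation components are stored separately, the convective share follows from a simple ratio. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the mapping to the accumulated WRF output fields commonly named RAINC (cumulus scheme) and RAINNC (grid scale) is not stated in the text.

```python
def convective_percentage(convective_mm, grid_scale_mm):
    """Percentage of convective precipitation relative to total precipitation.

    convective_mm: precipitation from the CU scheme (e.g. WRF field RAINC, assumed)
    grid_scale_mm: precipitation resolved by the grid (e.g. WRF field RAINNC, assumed)
    Returns None where no precipitation fell, since the ratio is undefined there.
    """
    total_mm = convective_mm + grid_scale_mm
    if total_mm <= 0.0:
        return None
    return 100.0 * convective_mm / total_mm


example_share = convective_percentage(5.0, 15.0)  # made-up amounts in mm
```

On the 2 km grid, where no CU scheme is applied, the convective component is zero and the function returns 0 % wherever precipitation occurred.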

TRMM precipitation
The WRF model output is compared to the precipitation data set of the TRMM, providing precipitation estimates derived from a combination of remote sensing observations calibrated against a large number of rain gauges on a monthly basis. In this study the 3B42 version 6 product is used (Huffman et al., 2007). The data set covers the region between 50° N and 50° S with a spatial resolution of 0.25°, with outputs at 3 h intervals. The three-hourly data are aggregated to daily, one-week and one-month values for the validation.
The TRMM data sets were projected to the map projection used by WRF (see Table 1) and resampled by nearest-neighbour interpolation to grids for each WRF domain of 30 km spatial resolution.
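The temporal aggregation can be illustrated for a single grid cell. This sketch assumes that each three-hourly value is a mean precipitation rate in mm h⁻¹ valid for its 3 h window, which is not stated in the text; the function name `daily_totals` is hypothetical.

```python
def daily_totals(rates_mm_per_h, step_hours=3):
    """Aggregate a series of precipitation rates (mm/h) to daily totals (mm)."""
    steps_per_day = 24 // step_hours
    if len(rates_mm_per_h) % steps_per_day != 0:
        raise ValueError("series must cover whole days")
    totals = []
    for start in range(0, len(rates_mm_per_h), steps_per_day):
        day_rates = rates_mm_per_h[start:start + steps_per_day]
        # Each rate is assumed to hold for the full step length.
        totals.append(sum(rate * step_hours for rate in day_rates))
    return totals


# Two made-up days: steady 1 mm/h on the first day, dry on the second.
toy_daily = daily_totals([1.0] * 8 + [0.0] * 8)
toy_two_day_sum = sum(toy_daily)
```

One-week and one-month values then follow by summing the daily totals over the respective period.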
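Nearest-neighbour resampling assigns to each target cell the value of the closest source cell, without averaging. The sketch below is a simplification: it assumes that source and target grids are already expressed in the same projected coordinates and that both are rectilinear, so the nearest cell can be found separately along each axis. All names are hypothetical.

```python
def nearest_index(coords, value):
    """Index of the coordinate in coords that is closest to value."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


def resample_nearest(src_x, src_y, src_grid, dst_x, dst_y):
    """Resample src_grid[row][col] onto the target axes by nearest neighbour."""
    cols = [nearest_index(src_x, x) for x in dst_x]
    rows = [nearest_index(src_y, y) for y in dst_y]
    return [[src_grid[r][c] for c in cols] for r in rows]


# Toy example: 25 km source spacing resampled to 30 km target spacing.
toy_src = [[1, 2, 3, 4],
           [5, 6, 7, 8]]
toy_dst = resample_nearest([0, 25, 50, 75], [0, 25], toy_src,
                           [0, 30, 60], [0, 30])
```

Because values are copied rather than interpolated, precipitation maxima keep their magnitude, but a source cell may be used more than once or not at all.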

MODIS snow extent
MODIS refers to two instruments currently collecting data as part of NASA's Earth Observing System (EOS) program. The MODIS/Terra Snow Cover Daily L3 Global 500 m Grid (MOD10A1) contains data on snow extent, snow albedo, fractional snow cover, and Quality Assessment (QA). The MOD10A1 data set consists of 600×600 km² granules of 500 m spatial resolution gridded using a sinusoidal map projection. MODIS snow cover data are based on a snow-mapping algorithm that employs a Normalised Difference Snow Index (NDSI) and other criteria tests (Hall et al., 2006).
The MODIS data sets used in this study (Fig. 3) are mosaics of four adjacent granules acquired around 05:00 UTC on 22 and 29 October 2008, corresponding to mid-morning local solar times. We selected MODIS data for a day prior to and a second one after the precipitation event to compare the observed changes in snow extent with WRF snowfall predictions. Because cloud coverage does not allow retrieval of snow data from optical sensors, only MODIS data of sparse cloud coverage are suitable for validation, preventing more detailed analyses for regions that have never been cloud-free during the one-week simulation.
The MODIS data sets were reprojected to the map projection used by WRF (see

Weather stations
Data from weather stations used in this study are from the "Global Summary of the Day" (http://www.ncdc.noaa.gov/oa/ncdc.html) provided by the National Climatic Data Center (NCDC) for download free of charge. The stations included in this data set follow the recommendations of the World Meteorological Organization (WMO), and data undergo quality control before being published. The weather stations selected for this study meet two criteria: they must be located within the MD, and must be situated above 3000 m a.s.l. Data from just 19 operational weather stations fulfilling these criteria are available for the simulation period, showing how sparsely observed the TiP still is. The weather stations are not homogeneously distributed over the study region, since they are concentrated in more densely populated regions in the southern and eastern parts of the TiP. In addition, precipitation data measured at the Nam Co Monitoring and Research Station were used to validate the WRF model (see Fig. 1).

Scores for statistical evaluation
Scores are commonly used for validation purposes to statistically assess the performance of a model simulation relative to observations (validation) or to results of other model simulations (inter-comparison). Some of them are derived from a 2×2 matrix called "contingency table" (e.g., Wilks, 1995), where each of the elements (A, B, C, D) holds the number of combinations of model prediction and observation in a given statistical population (see Table 2). In this study, five different scores are used.
In the following, A denotes the number of cases where the event was both predicted and observed, B the number of cases where it was predicted but not observed, C the number of cases where it was observed but not predicted, and D the number of cases where it was neither predicted nor observed.
The bias score (BIAS) is defined as:
$$\mathrm{BIAS} = \frac{F}{O} = \frac{A+B}{A+C}$$
where F is the number of cases where the event was predicted, and O is the number of cases where the event was observed. This score is an indicator of how well the model recovers the number of occurrences of an event, regardless of the spatio-temporal distribution.
The False Alarm Rate (FAR) computes the fraction of predicted events that were not observed:
$$\mathrm{FAR} = \frac{B}{A+B}$$
The Probability Of False Detection (POFD) is the fraction of predicted events that have not been observed relative to the total number of unobserved events:
$$\mathrm{POFD} = \frac{B}{B+D}$$
Like the FAR, the POFD is not a perfect indicator for validation since it depends on the number of unobserved events, but it is convenient for inter-comparison since it does not depend on the number of unpredicted events. Similarly to the POFD, the Probability Of Detection (POD) is the fraction of predicted events relative to the number of observed events:
$$\mathrm{POD} = \frac{A}{A+C}$$
Finally, the frequently used Heidke Skill Score (HSS) is defined as:
$$\mathrm{HSS} = \frac{H - E}{N - E}, \qquad E = \frac{F\,O + (N-F)(N-O)}{N}$$
where H = A + D is the number of hits, i.e., the number of cases where prediction and observation are in accordance, N = A + B + C + D is the size of the statistical population, and E is the number of hits expected from a random prediction with the same F and O. The HSS indicates the capability of a simulation to be better or worse than a random simulation, and ranges from −1 to 1 (1 for a perfect and 0 for a random case).
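In addition to the scores based on the contingency table, the standard Mean Bias (MB) and the Root Mean Square Deviation (RMSD) are defined as:
$$\mathrm{MB} = \frac{1}{N}\sum_{i=1}^{N}\left(P_{p,i} - P_{o,i}\right), \qquad \mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_{p,i} - P_{o,i}\right)^{2}}$$
where $P_p$ and $P_o$ are the predicted and observed precipitation values, and the sums run over the N compared values.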
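The scores described in this section can be computed from paired predictions and observations as in the following sketch. It assumes the conventional contingency-table layout (a: predicted and observed, b: predicted only, c: observed only, d: neither), which may differ from the labelling in Table 2, and it writes the HSS in terms of the hits expected by chance. All function names are hypothetical.

```python
import math


def contingency(predicted, observed):
    """Count the four contingency-table elements from paired booleans."""
    a = b = c = d = 0
    for p, o in zip(predicted, observed):
        if p and o:
            a += 1      # predicted and observed
        elif p:
            b += 1      # predicted but not observed
        elif o:
            c += 1      # observed but not predicted
        else:
            d += 1      # neither predicted nor observed
    return a, b, c, d


def scores(a, b, c, d):
    """BIAS, FAR, POFD, POD and HSS; assumes no denominator is zero."""
    n = a + b + c + d
    f, o, h = a + b, a + c, a + d
    expected = (f * o + (n - f) * (n - o)) / n  # hits expected by chance
    return {
        "BIAS": f / o,
        "FAR": b / f,
        "POFD": b / (b + d),
        "POD": a / o,
        "HSS": (h - expected) / (n - expected),
    }


def mean_bias(predicted, observed):
    """Mean of predicted minus observed values."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(predicted)


def rmsd(predicted, observed):
    """Root mean square deviation between predicted and observed values."""
    squares = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squares) / len(squares))


example = scores(30, 10, 20, 40)  # made-up counts
```

For precipitation fields, the boolean inputs would come from comparing each value against a threshold, so the scores depend on the threshold chosen.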

Validation of predicted precipitation by TRMM observations
WRF30 and TRMM daily precipitation fields during the lifetime of the cyclone are presented for a subset of the LD in Fig. 4. The cyclonic precipitation patterns can be recognised in both data sets. The cyclone is traceable by the high daily precipitation following its movement. In the WRF30 output, the centre of the cyclone shows a local precipitation minimum on 24 October, indicating that an eye has formed, which is, however, not present in the TRMM observations. On 25 October, the maximum of daily precipitation observed by TRMM follows the centre of the cyclone. The eye is no longer visible in the WRF30 output of this day. Both data sets consistently show that strong precipitation caused by the cyclone occurs in the eastern and southern parts of Bangladesh. On 26 October the cyclone reaches the coast in the late evening hours. The precipitation maximum of this day is still over the sea for TRMM, whereas WRF30 already predicts maximum precipitation over Bangladesh. This discrepancy is explainable by the difficulties in allocating the strong precipitation to the correct day. TRMM observes less precipitation over Bangladesh on 27 October than WRF30 predicts for that day. Over the Himalayas and the TiP, precipitation patterns are generally comparable. Two precipitation maxima are seen in both data sets, which were induced by the blocking effect of the Himalayas over the slopes of Bhutan and by the eastern Nyainqentanglha Mountains. Also, both data sets show the precipitation front propagating over the TiP in a similar way. Generally, WRF30 daily precipitation is larger than observed by TRMM, as indicated by the BIAS and FAR scores in Fig. 4. The two different threshold values used for computation of the scores in Fig. 4 show that a higher threshold lowers both the POD and the POFD. The HSS indicates that predictions based on small threshold values are generally in better accordance with TRMM than those based on high threshold values.
The spatial patterns presented in Fig. 5 reveal that WRF30-MD generally predicts more events, especially in the northern part of the TiP. Nevertheless, weather-station measurements suggest that this northern limit is properly predicted. The two data sets are still in good agreement at 20 mm week⁻¹, but the HSS constantly decreases as the FAR increases.
Both TRMM and WRF30-MD show precipitation maxima of more than 60 mm week⁻¹ in the eastern Nyainqentanglha Mountains and in north-eastern India, but more than 70 % of the 60 mm week⁻¹ events predicted by WRF30-MD were not observed by TRMM. However, the maximum in weather-station measurements in the eastern Nyainqentanglha Mountains (138 mm week⁻¹) suggests that the actual precipitation pattern may extend further into the western TiP than observed by TRMM. The reason behind this finding is probably the insufficient capability of TRMM to detect snowfall and light rain.
WRF30-MD RMSD (54.3 mm week⁻¹) and MB (24.3 mm week⁻¹) with respect to TRMM indicate generally higher values predicted by WRF30 than observed by TRMM. The one-week HSS are higher than the daily scores presented in Fig. 4, most probably for two reasons: (1) discrepancies due to timing shifts are removed when considering the one-week precipitation amounts, and (2) the considered spatial subset is different with respect to precipitation patterns.

Validation of predicted snow depth by MODIS observations
The goal of this test is to analyse where snowfall has been simulated by the WRF model in comparison to observational data.
The two MODIS data sets do not contain data on snowfall directly. Thus, we selected areas from the MODIS data set that were snow-free prior to and snow-covered after the event. Other areas that have been snow-covered prior to the event or snow-free after the event (e.g., lakes) are excluded from the test, although snowfall might also have been occurring there. Unfortunately, large areas have been covered by clouds either in one or both of the two MODIS data sets, and thus have also to be marked as areas of no data.
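The pixel selection can be sketched as a small classification rule. This is one possible reading of the procedure, not the authors' implementation: the class labels ("land", "snow", "cloud", "lake") are hypothetical stand-ins for the MODIS classes, and treating land pixels that are snow-free in both scenes as observed non-events is an assumption made here so that the contingency table has observed no-event cases.

```python
def observed_snowfall(before, after):
    """Classify one pixel from its class before and after the event.

    Returns True (snowfall observed), False (no snowfall observed, assumed
    for land that stays snow-free) or None (no data or excluded).
    """
    if before == "cloud" or after == "cloud":
        return None          # cloud in either scene: no data
    if before != "land":
        return None          # already snow-covered, lake, etc.: excluded
    if after == "snow":
        return True          # snow-free before, snow-covered after
    if after == "land":
        return False         # snow-free in both scenes (assumption)
    return None              # any other class after the event: excluded


toy_pairs = [("land", "snow"), ("land", "land"), ("snow", "snow"),
             ("cloud", "snow"), ("lake", "lake"), ("land", "cloud")]
toy_classes = [observed_snowfall(b, a) for b, a in toy_pairs]
```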
We concentrated the validation on snow depth derived by the model from predicted snowfall. A grid point is considered to be covered by snow when the computed snow depth after one week exceeds a certain threshold. Spatial distributions of predicted snow extent were computed for threshold values between 0.2 and 20 cm week⁻¹. Five grid points at the border of the domains are removed to avoid artefacts, and predicted snow extent was resampled to the respective 500 m [...] 2008) evaluated MODIS snow extent in northern Xinjiang, China, and they found that MODIS has high accuracies (93 %) when mapping snow at snow depth ≥4 cm but does not have a proper accuracy for thinner layers. This threshold can be considered physically reasonable: at the station Baingoin (north-west of Nam Co), 6 mm precipitation was recorded (6 cm of snow assuming a standard snow-to-liquid-equivalent ratio of 10), and the pixel was classified as snow by MODIS.
The evolution of the HSS with the threshold applied to the WRF simulated snow depth with respect to MODIS is shown in Fig. 6 (left) along with the spatial patterns of the contingency tables for two optimal thresholds for WRF10-MD and WRF-SD in the three resolutions. The HSS curves for the MD simulations are very similar, especially for lower thresholds, while there are more differences between the HSS curves for the SD. This is interpreted as an effect stemming from the higher spatial variability of snowfall in the MD due to large altitudinal variations compared to the less heterogeneous situation in the SD. In the Himalayas, the snow-to-rain limit is captured accurately by both WRF10 and WRF30 (not shown). Higher HSS for the simulations on model grids of higher spatial resolution, especially for higher thresholds, indicate the advantage of improved spatial resolution for predicting snowfall, particularly in the SD. The maximum HSS for the SD is reached at a smaller threshold than for the MD. This difference is attributed to the fact that altitudes in the SD are generally higher than in the MD; thus the percentage of areas affected by snowfall, as observed by MODIS, is higher due to generally lower air temperatures. Snow melt occurs less frequently in the SD than in the MD, and therefore even small amounts of snowfall will increase the snow extent. This argument is also supported by the contingency maps displayed in Fig. 6, which reveal that most of the WRF2-SD snow extent not observed by MODIS is located in the lower-altitude valleys south of the western Nyainqentanglha Mountains.
The higher HSS obtained by the simulations for the MD compared to the SD result from the higher percentage of unobserved snowfall in the MD, which increases the number of hits of the WRF simulations in the MD due to the high number of correctly predicted no-events. Figure 6 shows that the transitional zone between areas affected by snowfall and snow-free areas is situated in the north-western part of the SD, and is well detected by the WRF2-SD for the threshold of 2 cm. Increasing the threshold shifts the transitional zone to the south-west in all WRF simulations for the MD and SD, such that the contingency maps show a switch from false alarms to unpredicted events for all areas where prediction and observation are not concordant.
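The HSS curves discussed above can be reproduced in principle with the standard Heidke Skill Score for a 2x2 contingency table. The sketch below is an illustration under assumptions: the cell names (hits, false alarms, misses, correct negatives) follow common usage and may differ from the labels in Table 2, and inputs are assumed to be already restricted to pixels with valid observations.

```python
import numpy as np

def contingency_counts(predicted, observed):
    """Counts of hits (a), false alarms (b), misses (c), correct negatives (d)."""
    predicted = np.asarray(predicted, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    a = int(np.sum(predicted & observed))
    b = int(np.sum(predicted & ~observed))
    c = int(np.sum(~predicted & observed))
    d = int(np.sum(~predicted & ~observed))
    return a, b, c, d

def heidke_skill_score(a, b, c, d):
    """HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)]."""
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    if denom == 0:
        return float("nan")
    return 2.0 * (a * d - b * c) / denom

def hss_curve(predicted_depth_cm, observed_snow, thresholds_cm):
    """HSS of the thresholded snow depth for each threshold in turn."""
    depth = np.asarray(predicted_depth_cm, dtype=float)
    return [heidke_skill_score(*contingency_counts(depth > t, observed_snow))
            for t in thresholds_cm]
```

With a very low threshold almost every pixel is predicted as snow (many false alarms), and with a very high threshold almost none is (many unpredicted events), which is why the curves peak at an intermediate value.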

Validation of predicted precipitation by observations at weather stations
Table 3 shows observed precipitation at each weather station in comparison to precipitation observed by TRMM and predicted by WRF30, WRF10 and WRF2. Results from the grid points nearest to the weather stations are used in the comparison. Differences between the WRF simulations are generally small except for two stations, Nyingchi and Deqen, the latter showing a large improvement from WRF30 to WRF10, illustrating the strong terrain dependency of precipitation in mountainous terrain. The RMSD and MB scores show a significant improvement from WRF30 to WRF10. Some discrepancies between station observations and TRMM could be explained by errors in one or the other data set (at Lhasa, TRMM and WRF are in good agreement, and at Deqen, station observations and WRF are concordant), and for some stations the three data sets differ substantially. Generally, WRF10 has a lower RMSD than TRMM, but a higher MB.
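Selecting the grid point nearest to a station can be sketched as follows. This is a generic great-circle nearest-neighbour search on 2-D latitude and longitude arrays; the function name and the example coordinates are hypothetical, and the authors' actual selection procedure is not described in more detail in this section.

```python
import numpy as np

def nearest_grid_index(grid_lat, grid_lon, station_lat, station_lon):
    """Index (row, col) of the grid point closest to the station.

    Uses the haversine term h; the great-circle distance increases
    monotonically with h, so minimising h is sufficient.
    """
    lat1 = np.radians(np.asarray(grid_lat, dtype=float))
    lon1 = np.radians(np.asarray(grid_lon, dtype=float))
    lat2 = np.radians(station_lat)
    lon2 = np.radians(station_lon)
    h = (np.sin((lat1 - lat2) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon1 - lon2) / 2.0) ** 2)
    row, col = np.unravel_index(np.argmin(h), h.shape)
    return int(row), int(col)

# Hypothetical 3 x 3 grid and station position
lon2d, lat2d = np.meshgrid([90.0, 91.0, 92.0], [29.0, 30.0, 31.0])
idx = nearest_grid_index(lat2d, lon2d, 30.2, 90.9)
```

In steep terrain the nearest grid point may still differ strongly in altitude from the station, which is one reason why the comparison improves from WRF30 to WRF10.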
Unfortunately, no in-depth evaluation for WRF2 could be made, because only four stations are situated in the SD.The results for these stations suggest a small but overall improvement to WRF10, again indicating the advantage of using grids of higher spatial resolution.

Sensitivity study
In this section, we investigate the sensitivity of the WRF model to the nesting strategy, the forcing strategy, and various physical parameterisation schemes (PPS). Table 4 provides an overview of the sensitivity experiments carried out in this study, while Tables 5 and 6 present the results.

Sensitivity to the nesting strategy
Two experiments have been carried out for analysing the effect of the nesting strategy on simulated precipitation.In contrast to the nesting strategy in the reference experiment, the results of the simple two-way nesting (TW) experiment are not only used for the SD (as in the RE, i.e., the WRF2 SD results of the RE and TW experiments are identical) but also for the MD and LD, such that these results also contain some artefacts at the borders of the respective child domains.
A second nesting experiment was performed using one-way nesting (OW) for a nested simulation of the three domains.Thus, the results of the RE and OW experiments for the LD are identical, but differ for the MD and SD.
Hydrol. Earth Syst. Sci., 15, 1795-1817, 2011 www.hydrol-earth-syst-sci.net/15/1795/2011/
The results of all validation analyses applied to the TW and OW sensitivity experiments are presented in Table 5 together with those of the reference experiment. Generally, the differences in the overall scores between the sensitivity experiments are small, indicating that this element of the experimental design is not highly sensitive for reanalysed precipitation fields.
The most suitable analysis for assessing the performance of the nesting experiments is presented in Fig. 7, where detection of snowfall in the SD is validated by the MODIS data on snow extent. Only in the SD can the three different spatial resolutions be compared to each other. A threshold value of 4 cm is used for comparing predicted snow depth with MODIS observations since this value is considered to be physically reasonable regarding the capability of MODIS for detecting snow (see Sect. 3.1.2). HSS curves of the TW experiments for detecting snowfall in the SD are generally higher than those of the respective OW experiments. Figure 7 shows two major features of the two-way nested approach: the skill of the coarser resolutions is improved, and the higher-resolution results are also slightly improved by the step-wise feedback mechanism. Thus, the use of the two-way nesting option is recommended, although it is not decisive for the overall performance.

Sensitivity to the forcing strategy
Two experiments were carried out to analyse this element of the experimental design. In contrast to the other experiments, only the one-week period of the precipitation experiment was covered by the simulations. In contrast to the reference experiment, the model runs were only initialised once (incl. the 12 h spin-up). The weekly initialisation (WI) experiment is only forced at the lateral boundaries during integration time, while in the second experiment (WIN) weekly initialisation is combined with the analysis nudging option of the WRF, i.e., the WRF30-LD simulation is nudged towards GFS input data both horizontally and vertically using a point-by-point relaxation term for temperature, pressure and specific humidity.
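The idea of a point-by-point relaxation term can be illustrated with a generic Newtonian relaxation step. This is a conceptual sketch only, not the WRF implementation: the explicit Euler step, the function names and the nudging coefficient of 3e-4 per second are illustrative assumptions.

```python
def nudge_step(value, analysis, model_tendency, dt_s, g_per_s):
    """One explicit Euler step of d(value)/dt = tendency + G * (analysis - value)."""
    relaxation = g_per_s * (analysis - value)
    return value + dt_s * (model_tendency + relaxation)

def relax(value, analysis, n_steps, dt_s=60.0, g_per_s=3.0e-4):
    """Apply the relaxation term alone (zero model tendency) for n_steps."""
    for _ in range(n_steps):
        value = nudge_step(value, analysis, 0.0, dt_s, g_per_s)
    return value

# A grid-point temperature of 280 K drifts towards an analysis value of 285 K
after_one_step = nudge_step(280.0, 285.0, 0.0, 60.0, 3.0e-4)
after_many_steps = relax(280.0, 285.0, 1000)
```

The sketch shows why nudging keeps the model close to the large-scale forcing: any departure from the analysis decays with an e-folding time of 1/G, regardless of what the model physics would otherwise produce.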
The results of all validation analyses applied to the WI and WIN sensitivity experiments are presented in Table 5 together with those of the reference experiment. The scores indicate the lower performance of the WI forcing strategy. This is also seen in Fig. 8 (right), where the daily scores strongly decrease over the time of the simulation, and the region on the TiP affected by rain- and snowfall is not captured accurately when compared to the TRMM observations. The WRF as a limited area model is not capable of making accurate predictions when the large-scale forcing is missing; thus WI without analysis nudging is not a suitable forcing strategy.
In contrast, the WIN experiment performs as well as the reference experiment, as the scores in Table 5 and the results displayed in Fig. 8 indicate. Although there are some minor deficiencies at 2 km spatial resolution, these are compensated by some minor advantages at the lower resolution of 30 km (and partly also of 10 km).
The applicability of the WIN forcing strategy thus depends on the specific purpose of a model experiment. If precipitation fields of high spatial resolution are a major objective, then daily re-initialisation remains the better choice. Also, the flexibility of the daily re-initialisation with respect to parallelisation of model runs is better than in the WIN strategy. In the RE design, daily model runs are completely independent from each other, and there are only minor jumps between the results of two consecutive days, which may be more problematic in the WIN strategy (although this aspect was not analysed in this study). A further argument for the forcing strategy followed in the RE design is the fact that the WRF model is able to better utilise its predictive skills within the resolved scales, while analysis nudging strongly dampens the mesoscale processes resolved by the WRF model.
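Because the daily runs of the RE design are independent, their time windows can be listed up front and dispatched in parallel. The sketch below assumes that each run starts 12 h before the day it contributes and that exactly 24 h of output are kept; the run length is an assumption, since the design table (Table 1) is not reproduced in this section, and the function and key names are hypothetical.

```python
from datetime import datetime, timedelta

def daily_run_windows(first_day, n_days, spin_up_hours=12):
    """Initialisation time and retained output window for each daily run."""
    windows = []
    for k in range(n_days):
        day_start = first_day + timedelta(days=k)
        windows.append({
            "init": day_start - timedelta(hours=spin_up_hours),  # spin-up starts here
            "keep_from": day_start,                               # spin-up discarded
            "keep_to": day_start + timedelta(days=1),
        })
    return windows

# One-week focus period, 22 to 28 October 2008
windows = daily_run_windows(datetime(2008, 10, 22), 7)
```

Each entry could be handed to a separate job, since no run needs the restart state of the previous day.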

Sensitivity to the PPS
In this part of the sensitivity study eight experiments were carried out, applying two different schemes for each of the CU, MP, LS and PBL parameterisation schemes (Table 4).
The results of all validation analyses applied to the PPS sensitivity experiments are presented in Table 6 together with those of the reference experiment.While a comparison of the scores mainly serves to show the effects of the different PPS and to assess the overall performance, the spatial distributions of the differences between the PPS experiments and the reference experiment displayed in Fig. 9 provide insight into the mechanisms responsible for the differences.
Comparing the CU1 and CU2 experiments, the latter has a lower performance, not only with respect to the CU1 experiment but also to the RE. The CU1 experiment improves the predictions with respect to the observations at weather stations while concurrently degrading them with respect to the TRMM observations. Figure 9 shows that the sensitivity shown by the CU1 and CU2 experiments is very strong in the Bay of Bengal and the high-mountain fringes of the TiP, but is low on the TiP. Generally, the CU1 and CU2 experiments are wetter than the RE, especially in these regions. The reason for this finding is the convective nature of precipitation in the sensitive regions (as will be discussed in Sect. 4), while advection dominates large parts of the TiP. The three schemes work within different closure frameworks: for example, the PPS of the RE is a cloud ensemble scheme that uses 16 ensemble members to obtain an ensemble-mean realisation at a given time and location, while the other two schemes are triggered by various conditions on the vertical uplift within an atmospheric column. Our results are in accordance with Mukhopadhyay et al. (2009), who compared convective parameterisations for RCM simulations during the monsoon season and found that the PPS used in the CU1 and CU2 experiments underestimate the observations for lighter rain rates and overestimate them for higher rain rates, while the PPS used in the RE overestimates lighter rain rates. The predictive skill of the CU1 and CU2 PPS in the LD with respect to TRMM observations is lower than that of the RE (not shown in Table 6); thus the New Grell-Devenyi 3 scheme used in the RE can be recommended.
The scores of the MP1 and MP2 experiments presented in Table 6 reveal that the PPS of the MP2 experiment is not suitable for the TiP.The MP1 results show similar effects in Fig. 9 as discussed for the CU1 experiment: high sensitivity of this PPS in the regions of convective precipitation and low sensitivity on most parts of the TiP where advection dominates.We argue that the choices of the CU and MP PPS should also consider combinatory effects, but are less influential for simulations in regions where advection is prevailing.These findings are interesting since the MP schemes are rather new and thus not extensively discussed in the scientific literature, so far.
The two experiments regarding the LS model underlying the WRF simulations show less pronounced effects when compared with the CU and MP PPS.The scores of the LS1 experiment reveal that larger differences to the RE are negative, thus this PPS would have to be rejected from these findings.However, there are also minor improvements in the simulation of snow processes, which may give reason for using this PPS when snow-hydrological investigations are in the focus of a model study.The LS2 experiment, in contrast, shows that no stronger negative effects are arising from this PPS, but unfortunately, it does not include an explicit parameterisation of snow processes, and thus makes it inappropriate for snow-hydrological investigations.Figure 9 shows that the effects of the two LS models are weak not only on the TiP but also in the regions where the CU and MP PPS show strong sensitivities.This can be attributed to the fact that convective processes and related MP processes are only weakly influenced by the underlying land surface, but mainly depend on the atmospheric dynamics and physics themselves.Over longer time periods, the choice of the LS model is expected to become more influential, but this aspect was not investigated in our study.
Finally, two different PPS for processes in the PBL have been analysed. The scores of the PBL1 and PBL2 experiments presented in Table 6 reveal that the PPS of the PBL2 experiment is not suitable for the TiP. The PBL1 experiment shows some improvement over the RE, but the skill for predicting snowfall seems to be lower than the skill of the RE. As highlighted by Braun and Tao (2000), the sensitivity of NWP models to the PBL parameterisation is also in this case stronger than to microphysics, at least for the regions south of the TiP. Figure 9 shows an interesting effect: the spatial pattern of the differences between the PBL2 experiment and the RE is strongly coupled to both CU PPS, since it also considers convective processes in the PBL.
In conclusion of the sensitivity studies for the PPS, we could show that there is nothing like a perfect combination of PPS, since any model-based investigation of precipitation on the TiP has to consider the oceanic regions where much of the water vapour and the convective systems influencing the southern, central and eastern parts of the TiP are formed.Yang and Tung (2003) also concluded that it was not possible to define a best performing cumulus parameterisation since each of the investigated cumulus schemes performed very differently for precipitation prediction under different synoptic forcing.Depending on the focus of an investigation, slightly modified PPS combinations may be used, but generally, the PPS combination used in the RE seems suitable for reanalysing precipitation on the TiP.

Discussion
In this section we will give answers to the two main research questions formulated in Sect. 1.1. The first question is discussed using the results presented in Sect. 3.1, while the discussion of the second question is based on the results shown in Sect. 3.2 and additional analyses of the simulations of precipitation fields in October 2008.

Validation methods and data sets
In this study, we proposed three data sets (TRMM, MODIS, NCDC weather stations) and several statistical methods to assess the WRF model simulations. Several climatological studies used the TRMM products, and it has been shown that the TRMM 3B42 surface-rainfall rate is comparable to other surface observations (Koo et al., 2009), although the spatial scale of the rainfall data makes direct comparison to gauge data difficult. Our results also highlight this issue: both the WRF model and the TRMM precipitation data showed considerable discrepancies when compared to point measurements at weather stations. TRMM showed a higher RMSD than WRF10. Estimation errors due to spatial resolution may be reduced by statistical correction methods as described by Yin et al. (2008), but the authors also note that TRMM performs poorly during the winter months because of the presence of snow and ice over the TiP, since snow and ice on the ground scatter microwave energy in a similar fashion to ice crystals and raindrops in the atmosphere.
At the same time, TRMM offers a way to assess the mesoscale WRF30 output on a gridded spatial basis, and the two data sets are in good accordance for the spatial delineation of the 10 mm week⁻¹ precipitation event. The comparison also showed differences in the occurrence of extreme events, for which WRF30 predicts higher amounts than TRMM observes. India's weather stations in the north-eastern provinces recorded precipitation amounts up to 150 mm day⁻¹ during the event (IMD, 2008), which is lower than the highest values predicted by WRF (up to 500 mm week⁻¹ for a few points at the high-mountain fringe of the TiP) but higher than TRMM estimations (maximum values of 130 mm week⁻¹).
One of the strengths of the WRF model is its ability to separate snow from rain. Since we aim to use the regional atmospheric reanalysis for hydrological and glaciological applications, snow data are of great importance, and the MODIS test that we developed proved to give valuable information on the model's capacity to retrieve snowfall at high spatial resolution, as indicated, for example, by the positive effect of increasing spatial resolution and the use of two-way nesting. However, snow extent could be detected only where neither snow prior to the event nor clouds were present, preventing us from drawing conclusions on high-mountain snowfall. Furthermore, it is rather difficult to find a suitable event or to run this test on a regular and automatic basis.
So far, no robust assessment of the WRF2 output is possible with weather stations either, as they are too scarce. Moreover, rain gauges are highly sensitive to wind and to the size of snow and rain particles: the snow under-catch of the four most widely used gauges can vary up to 80 % (Goodison et al., 1998). In response to this problem, we installed a laser precipitation monitor (disdrometer) at the Nam Co station for future validation studies.
We applied a combination of different validation methods since none of the observational data sets could be unambiguously identified to serve as an absolute reference for model validation.Despite the discussed differences between the model experiments and the observational data sets (including differences between the observational data sets themselves) the model predictions and the observational data are generally concordant.Thus, we consider the validation methods and data sets principally suitable for this kind of studies.

Reanalysis of precipitation fields over the TiP
The WRF model offers a countless number of configurations, and in this study there was no intent to realise a complete review of the various possibilities.We assessed the sensitivity of the WRF model to different PPS.In general, the influence on predicted precipitation was rather small on the TiP in comparison to the forcing strategies.This can be due to frequent re-initialisation constraining the model to stay close to the large scale observations given as input data.
The comparison with observational data sets showed that the WRF model had a good accuracy in predicting snow- and rainfall of a single precipitation event. Tables 5 and 6 also present the scores for the one-month simulations carried out for October 2008, which show that the WRF model is generally able to reanalyse precipitation not only for a single event but also over longer time periods that include times of no precipitation.
Figure 10 presents two results illustrating both the reason for the applicability of our approach and the hydrological relevance of individual precipitation events. The upper two maps display the contribution of convective precipitation to the total precipitation for the one-week simulation period of the Rashmi event. Except for the central and north-western part of the TiP, advection prevails and enables the retrieval of accurate precipitation values within the error limits of the observations. In regions of high contribution of convective precipitation, the sensitivity to the two CU and MP PPS, as well as to the PBL2 PPS, is much higher than in the other regions, which explains why the choice of PPS is not as important on the TiP. The dominance of advection on the TiP also allows reanalysing precipitation fields at a high spatial resolution of 2 km without using a CU parameterisation scheme, which is required at the lower resolutions of 10 and 30 km.
Figure 10 also shows the hydrological relevance of the Rashmi event, and reveals the complex spatial pattern of the precipitation fields on the TiP. The contribution of precipitation caused by the Rashmi cyclone to monthly precipitation in October 2008 reaches almost 100 % in some areas in the south and south-east of the TiP, while the areas of highest precipitation amounts during the event show contributions to monthly precipitation far beyond one third. This is in line with the findings shown in Fig. 2, where the importance of individual precipitation events was shown for annual precipitation in 2008 as observed at the 19 weather stations used in this study.
[...] resolution. Bookhagen and Strecker (2008) analysed orographic precipitation along the eastern Andes, showing that only TRMM 2B31 data, which have a high spatial resolution of about 5 km, are able to depict the small-scale orographic influence on precipitation. However, these data are not applicable to single events due to low temporal sampling rates (about one acquisition per day). Our findings are generally consistent with the results from other studies (e.g., Caldwell et al., 2009; Bromwich et al., 2005).
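The two measures of hydrological relevance used in this section (the share of an event in the monthly total, and the number of wettest days needed to reach half of the annual total as in Fig. 2) can be sketched as follows. Function names and the example values are hypothetical and not taken from the study.

```python
import numpy as np

def event_contribution_percent(event_precip, monthly_precip):
    """Share (%) of the event in the monthly total; NaN where the month is dry."""
    event = np.asarray(event_precip, dtype=float)
    month = np.asarray(monthly_precip, dtype=float)
    out = np.full(month.shape, np.nan)
    np.divide(100.0 * event, month, out=out, where=month > 0)
    return out

def days_to_fraction(daily_precip, fraction=0.5):
    """Number of wettest days needed to reach the given fraction of the total."""
    daily = np.sort(np.asarray(daily_precip, dtype=float))[::-1]
    total = daily.sum()
    if total <= 0:
        return 0
    cumulative = np.cumsum(daily) / total
    return int(np.searchsorted(cumulative, fraction) + 1)

# Hypothetical values: event and monthly totals (mm) at three grid points
share = event_contribution_percent([10.0, 0.0, 5.0], [10.0, 0.0, 20.0])
n_days = days_to_fraction([4.0, 3.0, 2.0, 1.0])
```

A small number of days in the second measure indicates that a few events dominate the annual total, which is the situation in which reanalysing individual events such as Rashmi matters most.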

Conclusion
Our study reveals that there is no single optimal model strategy applicable to the high-altitude TiP, its fringing high-mountain areas of extremely complex topography and the low-altitude land and sea regions from which much of the precipitation on the TiP originates. The choice of the physical parameterisation scheme will thus always be a compromise depending on the specific purpose of a model simulation. Many other model configurations could have been proposed and tested, but the comparatively small sensitivity to the physical parameterisation schemes on the TiP suggests that further investigation should also focus on the model's response to the forcing data sets providing the initial and boundary conditions.
In this study we do not apply bias correction for precipitation, thus keeping the model results physically consistent, and the errors caused by the different elements of the experimental design traceable. Bias correction is only valid if observational data are accurate, which is not the case for current observation methods available on the TiP. New remote sensing approaches will probably improve the situation, as Bookhagen and Strecker (2008), for example, demonstrated for orographic precipitation detected by the TRMM 2B31 data set. However, the same authors also discussed the limitations of this data set, in particular the low sampling frequency of about one overpass per day.
Our study demonstrates the high importance of orographic precipitation, which is well captured as assessed on a qualitative basis.However, the problem of the orographic bias remains unsolved since reliable observational data are still missing.
The results presented in this paper are relevant for anyone interested in carrying out a regional atmospheric reanalysis employing an NWP model. The benefits of NWP models over usual gridded precipitation products include high resolution in time and space, flexibility and reproducibility.
The WRF model showed good accuracy in simulating snow- and rainfall on the TiP for a one-month simulation period. Our results encourage further investigations over longer simulation periods, which should also include further atmospheric variables.
Many hydrological analyses and applications like rainfall-runoff modelling or the analysis of flood events require not only precipitation accumulated over weeks and months but also precipitation rates at daily or even hourly intervals. Thus, our study offers a process-oriented alternative for retrieving precipitation fields of high spatio-temporal resolution in regions like the TiP, where other data sources are limited.
With the exception of the ground observations at the Nam Co research station, all data sets and the WRF model used in this study are available free of charge to the scientific community. This recent increase in data accessibility represents a great improvement and will offer many possibilities for future atmospheric, hydrological and glaciological research.

Fig. 1. WRF domains and model topography. Left: large domain (LD); the medium (MD) and small (SD) domains are indicated by black frames. Center: medium domain. Right: small domain. Black dots represent the locations of the available weather stations.

Fig. 2. Contribution (%) of accumulated daily precipitation to the annual precipitation in the year 2008 for the 19 available weather stations. The daily precipitation values are sorted in decreasing order. The number of days in which 50 % of the annual precipitation is reached is indicated for the two extreme stations.

Fig. 4. Daily precipitation fields (mm/day) from (T) TRMM and (W) WRF30 over a subset of the LD for the period 22 to 28 October 2008.

Fig. 5. Left: Spatial patterns of the contingency tables for three different thresholds (10, 20 and 60 mm/week) applied to the one-week precipitation of WRF30-MD and TRMM, together with values from ground observations at the weather stations. Right: WRF30-MD scores with respect to TRMM for the threshold range from 1 to 100 mm/week.

Fig. 6. Left: HSS curves for snow depth predicted by WRF simulations for thresholds between 0.2 and 20 cm with respect to snow extent observed by MODIS. Right: spatial patterns of the contingency tables for the snow depth threshold of 7.2 cm for WRF10-MD and 2 cm for WRF10-SD, WRF30-SD and WRF2-SD.

Fig. 7. Sensitivity to the nesting strategy: HSS curves for snow depth predicted by WRF simulations (TW, OW and RE) at all resolutions for thresholds between 0.2 and 20 cm, with respect to snow extent observed by MODIS on the SD.

Fig. 8. Sensitivity to the forcing strategy. Left: one-week precipitation (mm/week) for TRMM, WRF30-RE, WRF30-WI and WRF30-WIN over a subset of the LD. Right: daily HSS curves from WRF30 with respect to TRMM for the 1 mm/day precipitation threshold over the same subset.

Fig. 9. Sensitivity to the physical parameterisation schemes. Difference between the one-week precipitation (mm/week) of the sensitivity experiments and the RE for WRF30 over a subset of the LD.

Fig. 11. Time series of daily precipitation amounts accumulated during October 2008 observed at weather stations, by TRMM and predicted by the one-month WRF10 simulation. Explained variances (r²) are given for each pair of the three data sets and each station, as well as the mean r² for all stations. X-axis values are the days in October; the grey band marks the one-week focus period 22-28 October 2008.

Fig. 12. Spatial distributions of precipitation amounts in October 2008 observed by TRMM and predicted by the one-month WRF30, WRF10 and WRF2 simulations on the SD.

Table 1. Design of the reference experiment (RE).

Table 2. Contingency table used in the validation and sensitivity studies.

Table 3. Observed precipitation (mm week⁻¹) at each weather station (19 NCDC stations and the Nam Co weather station) in comparison to the TRMM, WRF30, WRF10 and WRF2 results. The observed type of precipitation is indicated by R (rain) or S (snow). Root Mean Square Deviation (RMSD) and Mean Bias (MB) are indicated for each grid with respect to the NCDC data.

Table 4. Design of the sensitivity experiments.

Table 5. Scores used for statistical evaluation of the reference, nesting and forcing experiments. The scores that differ by more than 10 % from the RE are marked in bold, and put between parentheses if they have a lower skill than the RE.

Table 6. Scores used for statistical evaluation of the reference and the eight PPS experiments. The scores that differ by more than 10 % from the RE are marked in bold, and put between parentheses if they have a lower skill than the RE.