Rainfall and temperature estimation for a data sparse region



Introduction
Humanitarian and development agencies face difficult decisions about where and how to prioritise climate risk reduction measures. For example, the Pilot Program for Climate Resilience (PPCR) of the Strategic Climate Fund (SCF) aims to demonstrate ways in which climate risk and resilience may be integrated into core development planning. This raises practical questions about where infrastructure should be upgraded to better cope with extreme weather events. Which areas should be prioritised for soil and water conservation measures? Which agricultural regions would benefit most from improved water access or from rehabilitation of traditional rain-water harvesting systems? Where should training of agricultural extension workers be concentrated? What is the future climate suitability of lands for different crops? High-quality, spatially distributed hydro-meteorological information (as well as regional climate projections) can help appraise such options.
Many of the regions that are highly vulnerable to climate risks are also amongst the most data sparse. Yemen is one of the least developed countries in the world and one of the poorest in the Middle East and northern Africa. Agriculture employs more than half of the labour force and accounts for over 90 % of water use. However, long-term, homogeneous meteorological records are scarce. Although monthly data are available for Aden since the 1880s, daily records typically begin in the 1970s or later, and many were tied to short-term projects, or are of suspect quality. This lack of data severely hampers efforts to evaluate short-term meteorological hazards and long-term climate risks. Furthermore, without high-resolution precipitation and temperature records, it is hard to benchmark future climate variability and change, associated impacts, and adaptation outcomes.
Several ways have been devised for addressing this information deficit. For instance, it has long been recognised that the distribution and amount of precipitation in Yemen varies with altitude, latitude and distance from the coast (Beskok, 1971). Early isohyetal (precipitation) maps made extensive use of natural vegetation surveys as a proxy for precipitation gradients (e.g. Atkins et al., 1984). With the advent of remote sensing in the 1980s came prospects for gathering hydrological indices in data sparse regions (Grolier et al., 1984) and prediction for ungauged basins (Lakshmi, 2004). Others developed geostatistical methods (Goovaerts, 2000), spatial interpolation techniques (Guenni and Hutchinson, 1998), global gridded climate data sets (Harris et al., 2013), and stochastic weather models for semi-arid and arid environments (Hutchinson, 1995). Both sets of approaches ultimately depend on the physical attributes of the landscape and climate, as well as access to in situ measurements for model calibration and ground truthing (Almazroui, 2011). One global assessment further acknowledged that large uncertainties in satellite-based precipitation estimates can occur over complex terrain and near coastlines (Tian and Peters-Lidard, 2010). Other work shows higher than expected rainfall estimates over sand desert areas (Habib and Narsollahi, 2009).
Published by Copernicus Publications on behalf of the European Geosciences Union.
In the present study, we blend surface meteorological observations, remotely sensed (precipitation and vegetation) indices, topographic information, and regression techniques to produce gridded maps of annual mean precipitation and temperature, as well as parameters for local daily weather simulation in Yemen.These tools were developed during the course of a national assessment of flash flood risk, soil erosion, water harvesting, and cropping potential (Wilby and Yu, 2013).Our approach is based on the fundamental premise that publicly available land-cover and landscape attributes can be used to estimate local mean climate conditions for data sparse regions.Yemen provides a particularly challenging case study for testing these ideas.
Section 2 describes the study area and data sets used for statistical model calibration and validation. Section 3 explains the methods of parameter estimation and Sect. 4 presents validation results and maps generated by statistical models. The application of inferred weather generator parameters for daily precipitation and temperature simulation is also demonstrated for a site in the Yemen Highlands. Section 5 discusses possible method refinement and extension to climate risk assessment, and Sect. 6 outlines future research opportunities, including extension of the approach to other data sparse regions.

Study area and data
With a human development index (HDI) of 0.462, Yemen is ranked 154th out of 169 countries (HDR, 2011). Over 60 % of the rural population currently live beneath the national poverty line of USD 2 day⁻¹ and Yemen's renewable freshwater resource (135 m³ per capita per year in 2009) is one of the lowest in the world. Over 50 % of children are chronically malnourished, and agriculture is a key source of income for nearly three quarters of the population. There are concerns that climate variability and change could further exacerbate poverty and food insecurity, particularly for the rural population (World Bank, 2010). Hence, development of climate information systems is seen as a priority for evaluating and managing climate risks to natural resources and livelihoods (RMSI, 2013).
Yemen is characterised by five major eco-climatic zones (EPA, 2004:7): (1) the hot and humid coastal Tihama plain, 30-60 km wide, along the Red Sea and the Gulf of Aden, (2) the Yemen Highlands, a volcanic region with elevations between 1000 and 3600 m parallel to the Red Sea coast, with temperate climate and monsoon rains, (3) the dissected region of Yemen's high plateaus and the Hadramawt-Mahra Uplands, with altitudes up to 1000 m, (4) the Rub al-Khali desert interior, with a hot and dry climate, and (5) the islands, including Socotra in the Arabian Sea and more than 112 islands in the Red Sea.
Rainfall varies from less than 50 mm along the coastal plains and desert plateau regions to more than 1000 mm in the western mountainous highland region. Two main rainfall patterns are evident: (1) a southwest, summer monsoon regime with rainfall occurring in two periods, March-May and July-September, as a result of the Red Sea Convergence and the Inter Tropical Convergence Zone (ITCZ) (EPA, 2004:8); (2) a winter maximum, cyclonic regime driven by frontal rainfall from the north accounting for up to 80 % of annual totals on the coast. Altitude exerts a very strong control over the pattern and quantity of rainfall. Likewise, temperature depends on elevation and distance from the coast. Mean annual temperatures range from less than 12 °C in the highlands (with occasional frost) to 30 °C in the coastal plains (EPA, 2004:8).
Meteorological data are not held by a central authority in Yemen, but dispersed across several agencies including the Yemen Meteorological Service (YMS) within the Civil Aviation and Meteorological Authority (CAMA), the Ministry of Agriculture and Irrigation (MAI), Agriculture Research and Extension Authority (AREA), National Water Resource Authority (NWRA), and the Tihama Development Authority (TDA).Long-term systematic observations of precipitation and temperature are very rare indeed.Early, short-lived records are available for Aden, Sana'a and Taiz, but there are only a handful of series that extend from the 1970s to present.Furthermore, quality concerns surround many daily and monthly meteorological records; some data have yet to be digitised, or have been lost; untold records have been broken by episodes of civil unrest and conflict (Fig. 1).There are also inconsistencies in station details (such as naming convention, latitude, longitude and elevation) between digital archives and annual reporting sheets.

Precipitation
With the above points in mind, daily meteorological data were compiled from several sources during World Bank missions to Yemen in 2008 and 2009 (Wilby, 2008, 2009a, b). Two sets of precipitation data were formed: Network A contains 62 stations from the NWRA archive for 1998-2006 (Fig. 2a), and Network B comes from the NWRA archive for 2007 (Fig. 2b). Spatial coverage was maximised by using any record with at least three years of data within the period 1998-2006, or 330 [...] (Table 1), including: mean annual fraction of days that are wet (defined as any day with non-zero precipitation amount) (PWET); mean annual rainfall total (RTOT); annual maximum daily total with return period of 10 yr (RMAX10); mean and standard deviation of daily rainfall totals after the fourth root transformation (R4MEAN and R4SD respectively); alpha and beta parameters of the gamma distribution for daily wet-day amounts (ALPHA and BETA respectively). In addition, the monthly unconditional wet-day probability (PWETm) was derived for all sites in Network A. This parameter captures local variations in the onset and strength of the summer monsoon.
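As a rough illustration of how most of the site parameters listed above could be computed from a single daily rainfall series, the sketch below uses only the Python standard library. The function name `rainfall_parameters`, the method-of-moments gamma fit and the ten-day sample record are assumptions made for illustration; the text does not state which estimator was used, and RMAX10 is left out.

```python
from statistics import mean, pstdev, pvariance

def rainfall_parameters(daily_mm, years):
    """Summarise a daily rainfall series (mm) that spans `years` years."""
    wet = [x for x in daily_mm if x > 0.0]      # wet day: any non-zero amount
    r4 = [x ** 0.25 for x in wet]               # fourth-root transform
    m, v = mean(wet), pvariance(wet)            # wet-day mean and variance
    return {
        "PWET": len(wet) / len(daily_mm),       # fraction of days that are wet
        "RTOT": sum(daily_mm) / years,          # mean annual total (mm)
        "R4MEAN": mean(r4),
        "R4SD": pstdev(r4),
        "ALPHA": m * m / v,                     # gamma shape (assumed: method of moments)
        "BETA": v / m,                          # gamma scale (assumed: method of moments)
    }

# Hypothetical ten-day record, treated as one "year" purely for illustration.
sample = [0.0, 0.0, 16.0, 0.0, 1.0, 0.0, 0.0, 81.0, 0.0, 0.0]
params = rainfall_parameters(sample, years=1)
```

With a method-of-moments fit the product ALPHA × BETA equals the mean wet-day amount, which gives a quick consistency check on any fitted pair.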

Temperature
Temperature data were compiled from daily (NWRA) and monthly (AREA) records for various periods in the 1970s, 1980s and 1990s (Wilby, 2009a). These data sets were merged to maximise spatial and temporal coverage in Network C which contains 60 stations in total (Fig. 2c). Again, note the very sparse network east of 45° E, and complete lack of sites beyond 50° E. Only 10 sites have daily temperature data and all are located in the southwest.
Four temperature parameters were derived for all stations: the mean annual temperature (ATBAR); the mean annual maximum (ATMAX) and minimum (ATMIN) temperatures; and mean annual temperature range (ATRNG).Five other parameters were derived from the daily temperature records: mean Julian day with highest annual temperature (ADMAX); standard deviation of the date recording the highest annual temperature (ADMAXSD); mean diurnal temperature range (DTRNG); standard deviation of daily mean temperature (DTSD); and lag-1 autocorrelation coefficient of daily mean temperatures (DTLAG1).The purpose of the ADMAX and ADMAXSD parameters is to stochastically shift the day of peak temperature (and thereby the annual sine-curve) within a window of time.

Remotely sensed and terrain data
Gridded precipitation estimates were obtained from the Tropical Rainfall Measuring Mission (TRMM version 7) multisatellite precipitation analysis which archives 3-hourly, daily and monthly precipitation totals at 0.25° × 0.25° latitude-longitude resolution, in near real time since 1998 (Huffman et al., 2007). Monthly and annual precipitation totals were extracted for all grid points in a domain covering the whole of Yemen for two periods: 1998-2006 and 2007 (Fig. 3). The Normalized Difference Vegetation Index (NDVI) provides a record of the spatial and temporal dynamics of green plant canopies using remotely sensed near-infrared and red colour bands on a pixel by pixel basis (Fig. 4a). Monthly NDVI was derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Terra via the USGS Earth Resource Observation and Science Center (EROS).
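For reference, the standard definition of NDVI from the two bands mentioned above is given below. The symbols $\rho_{\mathrm{NIR}}$ and $\rho_{\mathrm{red}}$ (near-infrared and red reflectance for a pixel) are notation introduced here, not taken from the text.

```latex
\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{red}}},
\qquad -1 \le \mathrm{NDVI} \le 1
```

Dense green canopies reflect strongly in the near-infrared and absorb red light, so they give values towards +1, whereas bare soil and sand give values close to zero.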

Methodology
The methodology is described in three stages: (i) estimation of weather generator parameters for precipitation using remotely sensed and terrain attributes; (ii) the same process for air temperature, and; (iii) use of these empirical relationships to obtain parameters for a single-site daily weather generator (mean temperature and precipitation totals), for an illustrative settlement (Taiz) in the Yemen Highlands.

Rainfall parameter estimation
Rainfall parameters for Yemen were derived via multiple linear-regression relationships with satellite-derived precipitation (TRMM_RTOT), terrain elevation (DEM_ELEV), and monthly Normalized Difference Vegetation Index (NDVI_month) (Table 1).For example, Fig. 5 shows the relationship between observed mean annual rainfall totals for sites in Network A and the nearest grid-point estimate from TRMM(V7).The model for PWETm was built using dummy variables (Dm) for each month, with TRMM(V7)_RTOT and ELEV as predictors.All regression models were constructed by pooling data in order to maximise the information for parameter estimation.Sub-annual modelling is an appealing idea but in practice is constrained by the very small number of rain days in the training set.This is particularly challenging when estimating wet-day amount distributions.
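One way to picture the pooled PWETm model is as a single design matrix in which every site contributes twelve rows, one per month. The sketch below builds such rows in plain Python. The function names, the choice of January as the baseline month (so that only eleven dummies sit alongside the intercept) and the site values are assumptions for illustration, not details given in the text.

```python
def pwetm_design_row(month, trmm_rtot, elev):
    """Row layout: intercept, 11 monthly dummies (Feb..Dec), TRMM total, elevation.

    January is taken as the baseline month (an assumption), which avoids
    perfect collinearity between the dummies and the intercept.
    """
    dummies = [1.0 if month == m else 0.0 for m in range(2, 13)]
    return [1.0] + dummies + [trmm_rtot, elev]

def pooled_design_matrix(sites):
    """Stack 12 rows per site; `sites` is a list of (trmm_rtot, elev) pairs."""
    return [pwetm_design_row(month, trmm, elev)
            for trmm, elev in sites
            for month in range(1, 13)]

# Two hypothetical sites: (TRMM annual total in mm, elevation in m).
design = pooled_design_matrix([(450.0, 1400.0), (80.0, 20.0)])
```

Pooling in this way lets every site-month inform the shared TRMM and elevation coefficients, while the dummies carry the seasonal cycle.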
Three simplifying assumptions were made when building the regression models: (i) relationships between the dependent (point rainfall parameters) and independent (gridded) variables are linear; (ii) statistical relationships are stationary (i.e. do not vary in space or time); and (iii) local rainfall parameters behave independently of each other. Assumption (i) was tested through visual inspection of scatterplots showing pairs of dependent-independent variables. Multiple linear regression models were calibrated using rainfall parameters from Network A (1998-2006) and validated against Network B (2007). This was considered a stringent test of model stationarity (assumption ii) since the test data contain periods of record and sites not used in training. Any covariance between the parameters (assumption iii) was represented implicitly through use of common independent variables.
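The calibrate-on-one-network, validate-on-another procedure can be sketched with ordinary least squares in NumPy. Everything numerical below is synthetic: the helper names, the coefficients used to generate the data, the noise level and the size of the validation network are assumptions chosen only to make the example run, and they do not reproduce the study's data or results.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def explained_variance(y, yhat):
    """Fraction of variance explained on any data set (not sample-size adjusted)."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def synthetic_network(n, rng):
    """Hypothetical stations: predictors are TRMM total (mm) and elevation (m)."""
    trmm = rng.uniform(0.0, 600.0, n)
    elev = rng.uniform(0.0, 3000.0, n)
    rtot = 20.0 + 0.8 * trmm + 0.05 * elev + rng.normal(0.0, 20.0, n)
    return np.column_stack([trmm, elev]), rtot

rng = np.random.default_rng(0)
X_a, y_a = synthetic_network(62, rng)   # stands in for Network A (calibration)
X_b, y_b = synthetic_network(30, rng)   # stands in for Network B (validation)

coef = fit_linear(X_a, y_a)
r2_validation = explained_variance(y_b, predict(coef, X_b))
```

Scoring the fitted coefficients on stations that played no part in calibration is what makes the validation a test of stationarity rather than of goodness of fit.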
Exploratory analyses were performed to assess predictability of daily rainfall parameters from terrain and remotely sensed metrics using dependent and independent variables for the period 1998-2006 (Fig. 6). The strongest associations were found between the annual unconditional probability of rainfall (PWET) and annual total rainfall (RTOT) inferred from TRMM(V7)_RTOT and summer NDVI. Previous studies also report positive correlations between the observed seasonal cycle of rainfall totals and TRMM estimates for gauges in neighbouring Saudi Arabia (Almazroui, 2011). Moreover, the strong correlation with vegetation greenness (NDVI) is physically meaningful since August has on average the most frequent rain-days and is the height of the summer monsoon (kharif rains) in the Yemen Highlands and along the Red Sea coast (Tihama Plain).
Figure caption: Comparison of TRMM(V7) with observed mean annual rainfall at Network A sites.
Much weaker correlations were found for parameters of daily precipitation such as PWETRNG and R4MEAN.Again, this is consistent with previous work showing weak negative correlations between R4MEAN versus longitude and distance from the coast (Wilby, 2009b).This reflects general declines in precipitation eastwards of the Yemen Highlands, and northwards in the Rub al-Khali (empty quarter).
Annual maximum daily rainfall totals with return periods of 10 yr (RMAX10) were estimated from the local parameters describing the logarithmic (RTOT/PWET), gamma (ALPHA, BETA), and fourth root (R4MEAN, R4SD) distributions. Daily rainfall totals were not derived for events rarer than once per 10 yr because of the limited amount of data (only 9 yr) available for parameter estimation, and low confidence in some of the reported daily totals. (Note that site-specific, extreme values derived from gamma or exponential models can be compared with values from a weather generator run for N yr (see below)).
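To make the link between distribution parameters and a 10 yr daily total concrete, the sketch below works through the fourth-root case only. It rests on assumptions that are not stated in the text: wet-day amounts are treated as independent, the fourth-root amounts are taken to be normally distributed, and the T-year event is taken as the wet-day quantile exceeded on average once in T × 365.25 × PWET wet days. The function name and the parameter values are hypothetical.

```python
from statistics import NormalDist

def rmax_fourth_root(pwet, r4mean, r4sd, return_period_yr=10.0):
    """Daily total (mm) exceeded on average once per `return_period_yr` years."""
    wet_days = return_period_yr * 365.25 * pwet   # expected wet days in T years
    if wet_days <= 1.0:
        raise ValueError("too few wet days to define this return level")
    p = 1.0 - 1.0 / wet_days                      # non-exceedance probability
    z = NormalDist().inv_cdf(p)                   # standard normal quantile
    return (r4mean + z * r4sd) ** 4               # back-transform to mm

# Hypothetical site: 10 % of days wet, fourth-root mean 1.5 and s.d. 0.4.
rmax10 = rmax_fourth_root(0.10, 1.5, 0.4)
rmax50 = rmax_fourth_root(0.10, 1.5, 0.4, return_period_yr=50.0)
```

Because the quantile grows with the number of wet days considered, estimates for return periods beyond the record length lean increasingly on the assumed distribution, which is consistent with the decision not to go beyond 10 yr.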

Temperature parameter estimation
Annual and daily temperature parameters for Yemen were estimated from multiple linear regression relationships with elevation (DEM_ELEV), latitude (LAT) and longitude (LON) (Table 2). For example, Fig. 7 shows a very strong negative correlation between mean annual temperature (ATBAR) and site elevation. The next strongest relationship is between variability in the date of the warmest day each year (ADMAXSD) and longitude, whereas variance in daily temperatures (DTSD) depends more on latitude (not shown).
In comparison, the north-south gradient in mean annual temperature is much weaker.These analyses suggest that some important features of the annual and daily temperature regime can be inferred from simple landscape indices.

Mean annual climate and daily weather generation
Estimated rainfall and temperature parameters were applied in two ways.First, the parameters were used to construct high-resolution (1 km) maps of mean annual rainfall totals and air temperature for the whole of Yemen.Surfaces were produced with and without interpolation and, where feasible, compared to data not used for model calibration.Second, the parameters were used in a conditional weather generator to simulate daily rainfall and temperature at Taiz in the Yemen Highlands.This site was chosen because the amount and quality of data available are relatively good.Furthermore, the Taiz meteorological record has been used before to construct a simple weather generator for evaluating rain-fed agriculture and water harvesting potential (Rappold, 2005).
In view of the data constraints only the most straightforward of weather generator algorithms is warranted (Wilks and Wilby, 1999).The Yemen weather generator produces daily rainfall and mean temperature series via the following algorithm.

1. Determine the local unconditional wet-day probability for the current month using the regression equation for PWETm (Table 3).
2. Generate a uniformly distributed random number (R1) to determine whether the day is wet or dry (i.e. a wet day when R1 ≤ PWETm).
3. If dry proceed to step 6, otherwise use the regression equations to estimate local R4MEAN and R4SD (Table 3).
4. Generate a normally distributed random number (R2) to determine the transformed wet-day amount (RWET) given R4MEAN and R4SD.
6. Determine the Julian day and if greater than the year length, reset the day, month and year counter. Then stochastically generate the date of the warmest day in the new year using a normally distributed random number (R3) with mean ADMAX and standard deviation ADMAXSD (Table 3).
7. Determine the local daily mean temperature (TBAR) given the Julian day by assuming that the annual temperature regime follows a sine curve with mean (ATBAR), amplitude (ATRNG), and phase (ADMAX ± ADMAXSD) (Table 3).
8. Inflate the variance of TBAR by adding a stochastically generated anomaly given a normally distributed random number (R4) and the NOISE parameter (using the regression equation for DTLAG1 in Table 3).
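The steps listed above can be sketched as a short Python routine for a single year. This is a minimal illustration under stated assumptions rather than the authors' implementation: the back-transformation of RWET by raising to the fourth power, the use of half of ATRNG as the sine-curve amplitude, the treatment of NOISE as the standard deviation of an uncorrelated daily anomaly, the equal-length months and the parameter values are all assumptions. In the study the parameters would come from the regression equations in Table 3.

```python
import math
import random

def simulate_year(params, year_length=365, seed=None):
    """Return a list of (julian_day, rain_mm, tbar_degC) tuples for one year."""
    rng = random.Random(seed)
    # Step 6: draw the date of the warmest day for this year (R3).
    peak_day = rng.gauss(params["ADMAX"], params["ADMAXSD"])
    series = []
    for day in range(1, year_length + 1):
        # Crude month index 0..11, assuming twelve equal-length months.
        month = min(11, (day - 1) * 12 // year_length)
        # Steps 1-2: wet when the uniform draw R1 <= PWETm for this month.
        rain = 0.0
        if rng.random() <= params["PWETm"][month]:
            # Steps 3-4: transformed wet-day amount RWET from a normal draw (R2).
            rwet = rng.gauss(params["R4MEAN"], params["R4SD"])
            # Assumed back-transform: fourth power, truncated at zero.
            rain = max(rwet, 0.0) ** 4
        # Step 7: annual cycle peaking on peak_day (amplitude assumed = ATRNG / 2).
        angle = 2.0 * math.pi * (day - peak_day) / year_length
        tbar = params["ATBAR"] + 0.5 * params["ATRNG"] * math.cos(angle)
        # Step 8: add a random anomaly (R4) scaled by NOISE.
        tbar += rng.gauss(0.0, params["NOISE"])
        series.append((day, rain, tbar))
    return series

# Hypothetical parameter set, for illustration only.
example_params = {
    "PWETm": [0.02, 0.03, 0.08, 0.12, 0.15, 0.10,
              0.18, 0.22, 0.15, 0.06, 0.02, 0.01],
    "R4MEAN": 1.5, "R4SD": 0.4,
    "ADMAX": 170.0, "ADMAXSD": 15.0,
    "ATBAR": 22.0, "ATRNG": 8.0, "NOISE": 1.2,
}
year = simulate_year(example_params, seed=42)
```

Calling `simulate_year` repeatedly with different seeds gives an ensemble of synthetic years from which diagnostics such as annual totals or extreme daily amounts can be tabulated.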
In order to assess the transferability of the precipitation generator, parameters were estimated using regression equations that had been calibrated with and without data for Taiz, and by using models built on different sub-sets of Network A data to explore the consequences of stratifying by climatic zone.Accordingly, weather generator results for Taiz were produced from observed precipitation parameters (WGOBS), parameters regressed from all sites (WGALL), and parameters regressed only from sites on the western escarpment of the Yemen Highlands (WGWH).

Mean annual precipitation
Table 3 provides a summary of regression model coefficients, independent variables used, and the amount of explained variance (adjusted for sample size, R²adj). Overall, parameters of daily rainfall occurrence and amount (PWET, PWETRNG and RTOT) are fit better than parameters describing the distribution of daily rainfall amounts (R4MEAN, ALPHA and BETA). In the case of the former, over 85 % of the spatial variation is explained by landscape and land-cover properties.
Regression models were validated using rainfall parameters from Network B (Fig. 8). As with calibration there is higher skill for daily rainfall occurrence (PWET) than amounts (R4MEAN), yielding 45 % explained variance for their product, the annual rainfall total (RTOT). Given the severity of the validation test and low confidence in the data quality this modest level of skill was encouraging. Figures 9-12 show maps of observed and modelled PWET and RTOT for the calibration and validation networks. Interpolated point estimates (panel b in each case) tend to reduce both the detail and variance compared with raw model estimates (panel c). Overall, the models underestimated rainfall occurrence but overestimated totals in the validation year 2007. We recognise that validation against a single year is less than ideal but, as we note above, the use of a different network was a stringent test. Furthermore, the relative aridity of 2007 when compared with 1998-2006 makes the test even more demanding (Fig. 3).
Models based on TRMM and NDVI fit the calibration data better than models based on TRMM alone (not shown). Interpolation reduces model errors relative to observations in both the calibration and validation periods. This is not surprising given that the observations are themselves interpolated to enable grid-to-grid comparison. However, the gain from interpolation is less for the validation period which was drier than the period used for calibration (see Fig. 3). Overall, the model based on TRMM and NDVI is preferred because the detailed spatial structure of rainfall is better preserved; the model performance is stable between calibration and validation periods; and the marginal benefit of simpler model variants can be explained by the interpolated (rather than gridded) observations used for error analysis. Furthermore, the range of local values is compressed by interpolation, potentially understating extreme daily rainfall amounts (see below). Root mean squared errors (RMSEs) for annual mean precipitation without interpolation are 106 mm for Network A and 104 mm for Network B. With interpolation, RMSEs for Network A and B are 72 mm and 91 mm respectively. These errors are equivalent to the annual mean rainfall total estimated for the arid interior (Figs. 11, 12). Nonetheless, the characteristic "heel" pattern of annual precipitation emerges, which is qualitatively consistent with earlier rainfall maps for Yemen (e.g. Farquharson et al., 1996). A strong precipitation gradient from the highlands to the Red Sea coast and local maxima along the line of east Taiz, Ibb, west Dhamar and Raymah governorates is also evident.
Observed and modelled extreme rainfall totals were evaluated using the daily total with 10 yr return period as a diagnostic (Fig. 13).Given limited data for model calibration, and estimation of an extreme statistic of greater return period than the length of record, this is a severe test.The three models (gamma, fourth root and logarithmic) show comparable skill, but there is uncertainty in the quality of the observations.Each model produces a hypothetical distribution based on their respective parameters with varying degrees of match to the empirical distribution.The gamma model is preferred because it generates marginally larger values than observations.This means that if the output is used to assess risks associated with heavy rainfall (such as soil erosion or flash flooding) the magnitude of the impact will be similarly uplifted.Any adaptations to these design events would, therefore, be inherently precautionary.
A north-south zone of high maximum daily rainfall totals is predicted along the escarpment above the Tihama coastal plain (Fig. 14). This finding is consistent with anecdotal evidence suggesting that "these wadis can carry enormous floods from the western mountain slopes to the Red Sea" (Shahin, 2007: p. 280). Further circumstantial evidence is provided by the inventory of notable floods held by the [...]
Table 3 provides a summary of independent variables, regression model coefficients, and the amount of explained variance. Overall, the best model fits were obtained for ATBAR (R²adj = 89 %) and ADMAXSD (R²adj = 50 %). Conversely, ADMAX had no predictive skill based on the chosen terrain indices and is best described by the long-term mean.
Due to data constraints ATBAR was assessed using a cross-validation technique in which 90 % of data are used to predict the remaining 10 %. The data held for model testing were swapped ten times until a complete set of independent predictions had been made. Figure 15 shows observed and predicted ATBAR based on this cross-validation; the regression model based on ELEV explains 89 % of the variance. The modelled lapse rate is 0.53 °C per 100 m and the spatial pattern of annual mean temperatures is comparable with that reported elsewhere (Bruggeman, 1997). The highest mean annual temperatures (∼ 30 °C) are predicted at sea level along the coastal plains beside the Red Sea and Gulf of Aden (Fig. 16). The average daily temperature range (DTRNG) is about 15 °C in coastal areas but more than 20 °C at some high elevation sites in the arid interior.

Daily weather generation
The weather generator was first run using observed (WGOBS) and estimated (WGALL) temperature parameters for Taiz.WGALL parameters were derived from the regression equations in Table 3 given site ELEV, LAT and LON.Both parameter sets were representative of years 1998 to 2005.
As would be expected, WGOBS parameters reproduce the mean and distribution of daily temperatures (Table 4, Fig. 17). Overall, the annual range, standard deviation and serial correlation (Lag1) of daily temperatures are slightly underestimated (although the observed maximum daily temperature of 32.7 °C on 24 September 2001 is regarded as suspect given the timing as well as much lower values on days before and after). The standard deviation of annual means is also slightly lower than observed but low-frequency variability is not explicitly represented by the model.
WGALL parameters yield poorer overall performance than WGOBS (Table 4). However, this reflects the combined effect of model and parameter uncertainty for the site. This run has a cool bias of 1.0 °C in mean daily temperature (Fig. 17), and weaker serial correlation. On the other hand, the standard deviation of annual means matches observations and the minimum temperature is better approximated. The cool bias translates into lower than expected annual degree day totals with the timing of the annual maximum delayed relative to observations (Fig. 18). This is because the mean Julian date of the warmest day is determined by the network average rather than for the site (Table 3). Although it is known that the warmest spell at Taiz typically occurs earlier than all other sites in Network C, this parameter cannot be predicted with any skill from available terrain indices.
The weather generator was also run for Taiz using precipitation parameters derived directly from observations (WGOBS), estimation via regression models calibrated with all sites in Network A (WGALL), and those calibrated using data only from the western highlands of Yemen (WGWH) for the years 1998-2005. Overall, the WGOBS runs reproduce most diagnostics except for the annual precipitation total which is overestimated by ∼ 10 % and the serial correlation (Lag1) of wet spells which is too low (Table 4). The WGALL parameters underestimate mean wet-day amount, frequency of wet days, serial correlation and thereby mean annual total. WGALL standard deviations of daily and annual precipitation totals are less than observations; the overall distribution is skewed towards lighter rainfall events when compared with OBS and [...] (Fig. 17).
The weather generator based on regionally specific data (WGWH) also underestimates precipitation amounts and variability at daily and annual scales, although the seasonal timing of rainfall occurrence and wet season (May-October) accumulations are reproduced with greater skill than by WGALL (Fig. 19).This suggests that the WGWH model may have some utility for evaluating crop potential and water harvesting schemes (as in Rappold, 2005).Annual totals are underestimated because WGWH parameters yield too few winter (November-April) wet days compared with OBS.

Discussion
We have explained the context, modelling techniques and indicative skill for local rainfall and temperature estimation in Yemen. Although the idea of interpolating meteorological data is not new, by definition it is seldom attempted for extensive, data sparse regions with complex terrain. Previous studies have interpolated climate records (Grant et al., 2004), daily meteorological data (Hutchinson et al., 2009), or extreme values (Pereira et al., 2010) but these methods are potentially susceptible to patchy or discontinuous data affecting aggregate statistics. Others simulate multi-site rainfall and temperature using resampling (Buishand and Brandsma, 2001) or stochastic weather generator techniques (Wilks, 1998; Burton et al., 2010) but these are typically employed at river basin scales. Some have applied weather generator techniques in complex, mountainous terrain by conditioning the model parameters on landscape features such as site elevation (Daly et al., 1994, 2008; Johnson et al., 2000; Wilks, 1998, 1999). Others interpolate weather generator parameters from sites to grids and synchronise with observed weather to create spatially and temporally consistent multi-site behaviour (Wilks, 2009). Parameters may also be conditioned in time using slowly varying climate indices to replicate low-frequency persistence (Wilby et al., 2002), or in space using predefined climatic zones to condition spatial variability across large (tropical) river basins (e.g. Kigobe et al., 2011). We extended these approaches by interpolating weather generator parameters conditional on point terrain and remotely sensed indices (precipitation and vegetation). The sophistication and realism of our statistical models were commensurate with the quality of available data. Indeed, even the production of mean annual rainfall and temperature isopleth maps was non-trivial and potentially an important legacy of the research. These surfaces require further validation but may provide a valuable benchmark for subsequent climate risk assessments and options appraisal.
Our methodology could be refined in several other ways. First, a wider range of topographic indices could be extracted from the DEM for use as independent variables in the regression modelling. For example, other landscape indices employed within PRISM include distance from coast, topographic facet orientation, vertical atmospheric layer, topographic position, and orographic effectiveness of the terrain (Daly et al., 2008). A finer resolution DEM would also reduce the misrepresentation of site elevation at/close to sea level or in very steep terrain. Alternatively, meteorological data could be stratified according to the five recognised eco-climatic zones of Yemen (EPA, 2004): hot, humid coastal plains; temperate monsoonal highlands; the high plateau; hot, dry desert interior; and islands. However, there are very few surface weather observations for the last three landscape units.
Second, more sophisticated statistical models could be developed. For instance, non-linear techniques such as logistic regression might be applied to better represent abrupt changes in precipitation gradients, as between the coastal plains and highlands. The number of multiple independent variables could be reduced via factor, discriminant or principal component analysis. Possible conditioning of temperatures by precipitation occurrence could also be explored using the handful of sites where both data are available. Serial persistence could be explicitly represented using conditional wet-day probabilities or strengthened in temperature series by using longer lag-intervals. More exhaustive cross-validation could be performed to produce distributions and confidence intervals for local regression parameters. This would have the added advantage of highlighting those areas with greatest model uncertainty, and hence potential foci for network expansion.
Third, related to the above point, model parameter and predictive uncertainties could be explored more thoroughly.All variables listed in Tables 1 and 2 are uncertain to varying degrees at the site scale.This uncertainty could be explored through jackknife estimation of the sampling variance of parameter estimates based on available observations (as in Jones and Kay, 2007).Through Monte Carlo techniques it would then be possible to run the weather generator with multiple sets of parameters and thereby produce uncertainty bounds for simulated temperature and precipitation series at individual sites.There is also generalisation uncertainty in each of the regression equations reported in Table 3. Overall, this is reflected by the standard error of the model estimates.However, as suggested above, this aspect of uncertainty could be explored further by pooling calibration data using site-or climate-similarity indices.Results shown in Table 4 suggest that regionalisation may reduce errors for some (but not all) model diagnostics.Another benefit of regionalization is that parameters from sites with meteorological data can be applied to climatologically similar locations where there are no data.This may be preferable to using uncertain parameter estimates inferred from regression relationships (Kokkonen et al., 2003).
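The jackknife idea mentioned above can be illustrated in a few lines of standard-library Python: each observation is left out in turn, the statistic is recomputed, and the spread of those leave-one-out estimates gives the sampling variance. The function name and the five-value sample are hypothetical, and the estimator defaults to the mean purely for illustration; any site parameter computed from a list of observations could be passed in instead.

```python
from statistics import mean

def jackknife_variance(values, estimator=mean):
    """Jackknife estimate of the sampling variance of `estimator(values)`."""
    n = len(values)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)

# Hypothetical annual values of some site parameter.
sample_values = [1.0, 2.0, 3.0, 4.0, 5.0]
var_of_mean = jackknife_variance(sample_values)
```

For the mean, the jackknife result coincides with the familiar sample variance divided by n, which is a convenient check before applying the routine to less tractable statistics.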
Fourth, data for model calibration and validation could be enhanced. Recent assessments have highlighted the potential for data recovery and digitisation of meteorological data from paper records, or assembly of a national archive from the various agency holdings and networks (RMSI, 2013). This is most likely a long-term endeavour. In the meantime, meteorological data from sites in Oman and Saudi Arabia could be incorporated in the regression modelling and thereby reduce uncertainty in interpolated weather generator parameters for adjacent areas.
Many of the suggested predictors can be derived from a DEM, so this represents the minimum data requirement. None of the proposed statistical developments are more data intensive than the existing model. Nevertheless, there is clearly a trade-off between the amount/quality of available meteorological data for calibration and the parsimony/confidence placed in model parameters. The approach is still applicable even if predictability varies amongst parameters (as is the case, see  (RTOT) and mean temperature (ATBAR) should be regarded as a minimum performance threshold.
The weather generator demonstrated for Taiz was applied to point data, but there is no reason why it might not be calibrated using daily gridded meteorological data. In fact, given the growing availability of gridded rainfall products (e.g. Yatagai et al., 2009), this might enhance transferability of our approach. Calibrating with gridded data also raises the prospect of generating basin-scale rainfall (noting that in semi-arid environments, area-average rainfall is less meaningful than distributed rainfall for hydrological simulation).

Conclusions
Yemen faces many complex economic, political and social challenges. Some commentators have suggested that climate change is best regarded as a threat multiplier that may exacerbate existing natural resource constraints (Johnstone and Mazo, 2011). Hence, climate information is seen as a fundamental tool by development agencies for benchmarking resource scarcity, for evaluating "hot spots" of climate risk, and for appraising adaptation options. However, Yemen is hampered by a very low density of meteorological observations and a climate regime characterised by complex topographic gradients and extreme weather events. This paper described rudimentary climate mapping and local weather simulation based on modelling techniques that are forgiving of these very real data constraints.
A prototype weather generator for Yemen faithfully reproduced daily and annual diagnostics when run with parameters derived from observed temperature and precipitation series. Even when temperature parameters were interpolated from regression equations, only a modest reduction in skill for persistence and a cool bias (1 °C) emerged for the test site. Precipitation simulation was more problematic. In this case, parameters obtained from observations yielded realistic daily wet-day occurrence, wet-day amount distributions, maxima, annual totals and variability. However, when run with interpolated parameters, the frequency of wet days, mean wet-day amount, annual totals and variability were all underestimated at the test site. Stratification of the sites used for calibration improved the representation of growing season totals but did not produce more realistic annual totals. RMSEs for annual precipitation totals were of the order of 100 mm, which is more than the average annual rainfall of the Rub al-Khali desert in the central northern region of Yemen.
From this pilot study we conclude that local terrain and remotely sensed variables can be used to infer annual mean temperature and precipitation across the most populous, south-west area of Yemen. Important features of the daily and seasonal weather can also be simulated at the site scale, but more rigorous validation is ultimately constrained by lack of data. International support for expanding the observing network, and for consolidating and recovering data, will serve to strengthen future analyses. For the time being, there is scope to broaden the range of model inputs to better discriminate different types of landscape unit.
Until tested elsewhere we can only speculate about the transferability of our approach. However, the experimental design was intended to explore this important aspect within the confines of our two rainfall networks for Yemen (by calibrating and validating the model using different data sets). Satisfactory performance for the test site at Taiz suggests that the model is transferable even when skill is assessed against a diverse set of metrics. Moreover, we deliberately built the model using information that is in the public domain, and intuitively related to local weather (e.g. elevation, latitude). In due course, our approach could be tested in other data scarce and climate vulnerable regions such as central Asia or eastern Africa.

Fig. 1. Station years of precipitation data held in the NWRA archive for the period 1969-2008.

Fig. 5. Comparison of TRMM(V7) with observed mean annual rainfall at Network A sites.

Fig. 8. Model validation using Network B parameters estimated from ELEV and TRMM(V7). Note that the observed RTOT outlier (748 mm) was for an un-named high

Fig. 9. Observed and modelled wet-day probabilities (PWET) based on Network A.

Fig. 10. As in Fig. 9 but for Network B.

Fig. 11. Observed and estimated mean annual precipitation (RTOT) based on Network A.

Fig. 12. As in Fig. 11 but for Network B.

Fig. 13. Observed and modelled 10-year return period daily rainfall totals for Network A. Note that comparable statistics were not derived for Network B because of insufficient data.

Fig. 14. Estimated 10 yr return period daily rainfall totals based on gamma distribution (ALPHA and BETA) parameters.

Fig. 15. Cross-validation results for annual mean temper

Fig. 16. Observed and modelled annual mean temperature (ATBAR) based on Network C.

Fig. 19. Observed and generated cumulative precipitation totals for the growing seasons of 2001 and 2002 at Taiz.

Table 1. Dependent and independent variables used to estimate local precipitation.

Table 2. Dependent and independent variables used to estimate local temperature.

Table 3. Regression models for temperature and precipitation parameters derived from DEM_ELEV, NDVI, and TRMM(V7) (Network A). Parameters shown in italics are derived from others (shown in bold). Where no adjusted R² values are cited the sample mean has been applied.

Table 4. Temperature and precipitation diagnostics for Taiz for 1998-2005 derived from observations (OBS) and weather generator parameters calibrated against observations (WGOBS), or estimated from regression models fit to all sites (WGALL), or just those sites on the western escarpment of the Yemen Highlands (WGWH).