On the reproducibility and repeatability of laser absorption spectroscopy measurements for δ2H and δ18O isotopic analysis

Abstract. The aim of this study was to analyse the reproducibility of off-axis integrated cavity output spectroscopy (OA-ICOS)-derived δ2H and δ18O measurements on a set of 35 water samples by comparing the performance of four laser spectroscopes with the performance of a conventional mass spectrometer under typical laboratory conditions. All samples were analysed using three different schemes of standard/sample combinations and related data processing to assess the improvement of results compared with mass spectrometry. The repeatability of the four OA-ICOS instruments was further investigated by multiple analyses of a sample subset to evaluate the stability of δ2H and δ18O measurements. Results demonstrated an overall agreement between OA-ICOS-based and mass spectrometry-based measurements for the entire dataset. However, a certain degree of variability existed in precision and accuracy between the four instruments. There was no evident bias or systematic deviation from the mass spectrometer values, but random errors, which were apparently not related to external factors, significantly affected the final results. Our investigation revealed that analytical precision ranged from ±0.56‰ to ±1.80‰ for δ2H and from ±0.10‰ to ±0.27‰ for δ18O measurements, with a marked variability among the four instruments. The overall capability of laser instruments to reproduce stable results with repeated measurements of the same sample was acceptable, and differences were generally within the range of the analytical precision for each spectroscope. Hence, averaging the measurements of three identical samples led to a higher degree of accuracy and eliminated the potential for random deviations.


Introduction
In the past few decades, hydrogen and/or oxygen isotopes have been utilized in studies of different environments to address several areas of research in catchment hydrology, which include runoff generation processes (Brown et al., 1999; Weiler et al., 2003; Tetzlaff et al., 2007), preferential flow paths (Rodgers et al., 2005a, b; Lee et al., 2007; La Bolle et al., 2008), catchment and hillslope residence and transit time (McGuire and McDonnell, 2006; Lyon et al., 2008; Stewart et al., 2010), the contribution of pre-event and event water to the total stormflow (Uhlenbrook and Hoeg, 2003; Huth et al., 2004; Lyon et al., 2009), and the contribution of snowmelt in hydrograph separation applications (Taylor et al., 2002; Koeniger et al., 2008; Zhou et al., 2008).
The conventional method used to determine δ18O and δ2H (VSMOW-SLAP scale) in water samples is isotope-ratio mass spectrometry (IRMS). The disadvantages of this methodology are the time- and labour-intensive measurements coupled with the high equipment and operational costs. Recently, alternative instruments for isotopic analyses have been developed to offer more cost-effective opportunities for the determination of stable isotope ratios in the vapour or liquid water phase. Off-axis integrated cavity output spectroscopy (OA-ICOS) exploits Beer-Lambert's law (Ricci et al., 1994) to relate the absorption of a laser light passing through a vaporized water sample to the isotopic composition of the sample. OA-ICOS instruments thus allow for the simultaneous analysis of δ2H and δ18O for each injection of water, reducing time and operational expenses per measured sample. In addition, simultaneous measurements exclude the potential relative error of two separate measurements of hydrogen and oxygen isotopes at different times. Further advantages include the reduced sample size (1-1.5 ml), easier maintenance requirements without extensive sample pre-processing, shorter time to produce reportable data, and the opportunity for in situ measurements in the field (Berman et al., 2009).
Published by Copernicus Publications on behalf of the European Geosciences Union.
Recent studies (Aggarwal et al., 2006; Lis et al., 2008; Wassenaar et al., 2008; IAEA, 2009b; Singleton et al., 2009) have investigated the accuracy and reliability of laser absorption spectroscopy measurements of δ2H and δ18O from water samples, and have underlined the main advantages of this technology. For instance, Lis et al. (2008) conducted a detailed investigation on the performance of the OA-ICOS analyzer and assessed the instrument precision, estimates of inter-sample memory and sample mass effect, and instrumental drift by comparing OA-ICOS-derived isotopic values with known standards. However, these studies only focused on the overall performance of a single machine or, in the case of measurements carried out by multiple analyzers (IAEA, 2009b), the consistency of measurements among different instruments was not investigated. Moreover, shortcomings remained in the comparison of standardized schemes and analysis procedures in relation to traditional IRMS. Despite the breadth of available literature on the reliability and efficiency of laser spectroscopy, an inter-comparison test among various OA-ICOS analyzers over a significant number of water samples and under typical laboratory conditions was still absent. Therefore, the present work aimed to assess the following: (i) the reproducibility of measurements for four liquid water isotope analyzers over a 35-sample dataset; (ii) the overall performance of the four machines compared with a traditional mass spectrometer; (iii) the repeatability of each analyzer, i.e., the ability to consistently reproduce the same isotopic values; and (iv) the potential improvement in accuracy derived from the application of different analytic schemes and data-processing methods.

OA-ICOS isotope analyzer and IRMS
All isotopic analyses were conducted using the off-axis integrated cavity output spectroscopy method with four liquid water isotope analyzers (LWIA), model DLT-100, which included three units of version 908-0008 and one unit of the upgraded version 908-0008-2000, manufactured by Los Gatos Research Inc. (LGR, Mountain View, California, USA). Isotopic analyses were performed at the Department of Land and Agro-Forest Environments at the University of Padova in Italy; the Faculty of Civil Engineering at the Czech Technical University in Prague; the Department of Environment and Agro-Biotechnologies, Centre de Recherche Public - Gabriel Lippmann in Luxembourg; and the Faculty of Civil Engineering and Geosciences at the Delft University of Technology in the Netherlands. Each of these four analyzers was connected to a LC PAL liquid auto-injector (908-0008-9001, CTC Analytics AG, Zwingen, Switzerland) for the automatic and simultaneous measurement of 2H/1H and 18O/16O ratios in water samples. The auto-injector was provided with a 1.2 µl syringe (model 26P/−mm/AS, 7701.2 N CTC) manufactured by Hamilton Company (Reno, Nevada, USA) for the injection of water samples into a heated port. All water samples and working standards were transferred into ND8 32 × 11.6 mm screw neck 1.5 ml vials with PTFE/silicone/PTFE septa. The vials were filled with 1 ml of water and placed into 54-position trays on the auto-injector tray holder.
According to the manufacturer's specifications (Los Gatos Research Inc., 2008), the DLT-100 908-0008 LWIA provides isotopic measurements with a 1-σ precision below 0.6‰ for δ2H and 0.2‰ for δ18O. The four analyzers in this comparative test were named I, II, III, and IV. Instrument IV refers to the upgraded model. Further information regarding the OA-ICOS theory of operation is reported in Paul et al. (2001), Baer et al. (2002), Sayres et al. (2009), and Wang et al. (2009).
Mass spectrometry analysis of the water samples was performed at the Isotope Geochemistry Laboratory of the Department of Geosciences, University of Trieste in Italy. Oxygen and hydrogen isotope measurements were performed using the CO2/H2 water equilibration technique (Epstein and Mayeda, 1953; Horita et al., 1989). The equilibration device used for these analyses was a GFL 1086 connected to a Thermo Fisher Delta Plus Advantage mass spectrometer (Thermo Fisher Scientific Inc., Massachusetts, USA). The precision of the δ18O and δ2H measurements achieved with the IRMS technique was ±0.05‰ and ±0.7‰, respectively. Further information regarding the IRMS theory of operation and method is available in Roether (1970), Rolston et al. (1976), Hut (1987), and Horita and Kendall (2004).

Samples
Comparative analyses were performed on a dataset of 35 water samples (Table 1). The five reference standards used for calibration were provided by the manufacturer (LGR) and were intended for initial testing purposes during instrument installation. The origin, stability and characterization of their isotopic composition were not specified. However, as these standards were used for all analyses in this study, a potential systematic deviation from the declared isotopic composition would not affect the comparative analyses among the four LWIAs. Issues could arise when comparing laser-derived with mass spectrometer-derived measurements, but the excellent agreement between OA-ICOS-based and IRMS-based delta values (see Sect. 3.1) confirmed the declared isotopic composition of the LGR reference standards. Therefore, the five standards provided by the manufacturer were adopted in order to test the performance and the consistency of the different analyzers in their standard setting, as an average end user would operate them. For the extremely light Antarctic samples, three very negative standards were provided by the Isotope Geochemistry Laboratory of the University of Trieste, Italy. All standards were calibrated against IAEA (International Atomic Energy Agency) water standards (Gonfiantini, 1978) in relation to the VSMOW-SLAP scale and normalized adopting the procedure described in IAEA (2009a). The analysis required three reference standards that bracketed (i.e., spanned a slightly wider range than) the isotopic composition of the unknown samples to determine their isotopic values. Therefore, samples were grouped according to their estimated isotopic ratio, and three appropriate standards were used (Table 1). Sample preparation, vial filling with disposable pipette tips, and labelling operations were executed in the laboratory of Padova to ensure consistency and homogeneity throughout the comparison.

OA-ICOS analysis schemes
To assess potential differences, all samples were evaluated with the following three analytic schemes (Fig. 1):
(i) Scheme (A) was proposed by the Isotope Hydrology Laboratory at IAEA (IAEA, 2009b). This procedure adopted two calibration standards and a control standard with an intermediate isotopic composition. Measurements and known δ values for the calibration standards were interpolated by means of a linear regression to convert measured absolute 2H/1H and 18O/16O ratios to delta values. The control standard was not included in the calibration and could therefore be used as an indicator of the analysis accuracy by comparing its known value to the value measured by the laser spectroscope during the analysis. According to this scheme, each vial was sampled six times, and the first two measurements were discarded to reduce the memory effect (i.e., the influence of the previously injected sample on the isotopic content). Therefore, the reportable value was based on the average of the last four injections. Every run began with a dummy sample to prime the flow line and stabilize the system, and the last vial was filled with deionized water to clean the syringe (IAEA, 2009b). Standards were grouped in triplets, and samples formed sets of five unknowns.
(ii) Scheme (B) involved a calibration equation based on the interpolation of three standards. The scheme began with 36 injections of deionized water to clean the syringe and allow for the machine to warm up to operational temperature. Afterwards, each sample was injected six times, the first measurement was rejected, and a preliminary mean was computed among the remaining five values. To avoid the influence of potential outliers, two measurements with the highest deviation from the preliminary mean were discarded, and the remaining three injections were averaged to obtain the reportable isotopic ratio. The run ended with 12 injections of deionized water from the first vial.
(iii) Scheme (C) was a modification of scheme (B) in which each vial was injected eight times instead of six. The first three measurements were discarded and a preliminary mean was computed among the remaining five values. The two measurements with the highest deviations from the preliminary mean were then discarded, and the reportable delta value was obtained by averaging the three remaining injections.
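The calibration step shared by the three schemes (a linear regression through two standards in scheme (A), three in schemes (B) and (C)) can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the function names `fit_calibration` and `calibrate` are hypothetical, and all numerical values are invented for the example.

```python
def fit_calibration(measured, known):
    """Least-squares line known = slope * measured + intercept.

    measured: raw instrument readings of the calibration standards
    known:    their declared delta values (per mil)
    Works for two standards (scheme A) or three (schemes B and C).
    """
    n = len(measured)
    if n < 2 or n != len(known):
        raise ValueError("need at least two (measured, known) pairs")
    mean_x = sum(measured) / n
    mean_y = sum(known) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, known))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(value, slope, intercept):
    """Convert a raw reading to a calibrated delta value."""
    return slope * value + intercept


# Hypothetical readings for two calibration standards (not data from the study).
slope, intercept = fit_calibration([-100.0, -10.0], [-98.0, -8.9])

# Control standard of intermediate composition, excluded from the fit:
# its calibrated reading minus its declared value indicates the run accuracy.
control_error = calibrate(-50.0, slope, intercept) - (-48.3)
print(f"control standard error: {control_error:+.2f} per mil")
```

Keeping the control standard out of the fit is what makes it usable as an independent accuracy check, as described for scheme (A).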
To determine the potential influence of different methods for averaging injections on the final isotopic values, all raw data were processed using the following three approaches: (i) the mean among the last four measurements of the six injections (or eight injections in scheme (C)) was referred to as version 1; (ii) the mean among the "best" three injections out of the last five was referred to as version 2; and (iii) the mean among six measurements after discarding the first two measurements in the case of (C) was referred to as version 3. The transfer line and syringe were cleaned at the start of each run to ensure that the inter-laboratory experimental conditions were as homogeneous as possible. A new heater septum, clean and dry vials with new cap septa, a new pipette tip for each sample or standard, and new or regenerated desiccants were used for every run. All samples and standards, which were usually stored at 4 °C, were kept at laboratory temperature for a minimum of 12 h and shaken to re-equilibrate the original isotopic composition prior to any analysis. On average, the cavity operational temperature of the four analyzers for the comparative runs ranged between 26 and 29 °C.
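The three averaging versions can be written compactly as below. This is a sketch only: the function names are hypothetical, the "best" three injections are assumed to be those closest to the preliminary mean of the last five (following the description of scheme (B)), and the injection values are invented.

```python
from statistics import mean


def version1(injections):
    """Version 1: mean of the last four injections."""
    return mean(injections[-4:])


def version2(injections):
    """Version 2: mean of the 'best' three of the last five injections.

    Assumption: 'best' means closest to the preliminary mean of the
    last five values, so the two largest deviations are dropped.
    """
    last_five = list(injections[-5:])
    preliminary = mean(last_five)
    best = sorted(last_five, key=lambda v: abs(v - preliminary))[:3]
    return mean(best)


def version3(injections):
    """Version 3 (scheme C only): mean of the six injections left
    after discarding the first two of eight."""
    if len(injections) != 8:
        raise ValueError("version 3 applies to eight-injection vials")
    return mean(injections[2:])


# Hypothetical delta-2H injections (per mil) for one six-injection vial.
vial = [-60.9, -62.1, -62.0, -61.8, -63.5, -62.2]
print(version1(vial), version2(vial))
```

In this invented vial the fifth injection is an outlier, so version 2 excludes it while version 1 does not, which is the kind of difference the comparison of versions was designed to expose.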

Statistical tests
To assess the performance of the LGR laser analyzers compared with a mass spectrometer, the deviations between the OA-ICOS-derived and IRMS-derived measurements were computed for the whole dataset:
\[ \Delta_{\mathrm{H}} = \delta^{2}\mathrm{H}_{\mathrm{OA\text{-}ICOS}} - \delta^{2}\mathrm{H}_{\mathrm{IRMS}} \]
\[ \Delta_{\mathrm{O}} = \delta^{18}\mathrm{O}_{\mathrm{OA\text{-}ICOS}} - \delta^{18}\mathrm{O}_{\mathrm{IRMS}} \]
where $\delta^{2}\mathrm{H}_{\mathrm{OA\text{-}ICOS}}$ and $\delta^{18}\mathrm{O}_{\mathrm{OA\text{-}ICOS}}$ were the isotopic delta values determined by the laser spectroscope for the hydrogen and oxygen isotopes, respectively, while $\delta^{2}\mathrm{H}_{\mathrm{IRMS}}$ and $\delta^{18}\mathrm{O}_{\mathrm{IRMS}}$ were the isotopic delta values determined by the mass spectrometer for the hydrogen and oxygen isotopes, respectively. Therefore, a perfect agreement between the laser spectroscope and the mass spectrometer measurements was achieved when $\Delta_{\mathrm{H,O}} = 0$, and the laser spectroscope overestimated or underestimated the mass spectrometer values for $\Delta_{\mathrm{H,O}} > 0$ and for $\Delta_{\mathrm{H,O}} < 0$, respectively. To assess the statistical significance of deviations between the OA-ICOS and the IRMS measurements, a one-sample t-test was performed to compare the mean of each deviation series to a hypothesized value equal to zero (i.e., no deviation present between spectroscopy and spectrometry measurements). A multifold approach to test the normality of each deviation series was followed. First, frequency histograms and normal probability plots (not reported herein) were utilized to visually assess the potential deviation of each distribution from the theoretical Gaussian curve. Second, Shapiro-Wilk, Kolmogorov-Smirnov, and Anderson-Darling normality tests were performed at a significance level of 0.05. The combined application of these three tests reduced the possibility of rejecting a normally distributed series as non-normal or vice versa. Since this preliminary analysis demonstrated the departure of several OA-ICOS-IRMS deviation series from the Gaussian distribution, a non-parametric approach was followed for those series to statistically evaluate the differences between OA-ICOS and mass spectrometry-derived isotopic measurements.
Thus, the one-sample t-test was performed for the normally distributed deviation series, whereas the one-sample sign test was applied to non-normal error distributions. Under the null hypothesis that no difference existed between the observed and assumed median of zero, the one-sample sign test considered that the probability of finding observations above the assumed median should be equal to the probability of finding observations below the assumed median. The one-sample t-test involved the formulation of the following null and alternative hypotheses:
\[ H_{0}: \mu = \mu_{0}, \qquad H_{1}: \mu \neq \mu_{0} \]
where $\mu$ was the mean of the laser-IRMS deviation series and $\mu_{0}$ was the hypothesized mean (set equal to zero).
For the one-sample sign test, the following null and alternative hypotheses were formulated:
\[ H_{0}: \eta = \eta_{0}, \qquad H_{1}: \eta \neq \eta_{0} \]
where $\eta$ was the median of the OA-ICOS-IRMS deviation series and $\eta_{0}$ was the hypothesised median (i.e., equal to zero).
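As an illustration of the non-parametric branch, the deviation series and an exact two-sided one-sample sign test can be sketched as follows. The function names are hypothetical, the two-sided alternative and the dropping of values equal to the hypothesized median are assumptions (a common convention), and the delta values are invented. The t-test branch is not shown because it needs the t distribution, which is not in the Python standard library.

```python
from math import comb


def deviations(oa_icos, irms):
    """Deviation series: laser value minus mass spectrometer value."""
    return [a - b for a, b in zip(oa_icos, irms)]


def sign_test_p(devs, median0=0.0):
    """Exact two-sided one-sample sign test p-value.

    Under H0 the counts above and below median0 follow a
    Binomial(n, 0.5) distribution; values equal to median0 are dropped.
    """
    above = sum(d > median0 for d in devs)
    below = sum(d < median0 for d in devs)
    n = above + below
    if n == 0:
        return 1.0
    k = min(above, below)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical delta-2H values (per mil); not data from the study.
laser = [-61.2, -75.0, -48.9, -90.3, -55.1, -66.0, -70.2, -82.4, -59.9, -63.3]
irms = [-61.9, -75.6, -49.5, -90.1, -55.8, -66.7, -70.9, -83.0, -60.4, -63.8]
p = sign_test_p(deviations(laser, irms))
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

With nine positive deviations out of ten, the null hypothesis of a zero median would be rejected at the 0.05 level in this invented case, which corresponds to a systematic overestimation by the laser instrument.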

OA-ICOS-IRMS correlations and deviations
Scatterplots of δ2H and δ18O allowed for a first comparison of laser- and mass spectrometry-based measurements for each of the four analyzers in this study. The plots (reported on http://www.isotope-hydrology.net/) exhibited excellent agreement between IRMS and OA-ICOS for both isotope species and for all analyzers, with determination coefficients (R2) ranging from 0.99988 to 0.99996 for hydrogen and from 0.99929 to 0.99982 for oxygen (n = 49). This observation confirmed previous results (Aggarwal et al., 2006; IAEA, 2009b; Singleton et al., 2009) obtained on different natural water samples. Despite the high values of the determination coefficients, there were slight variations between the four machines and the two water isotopes, which indicated potential differences in instrumental behaviour. To assess the performance of laser spectroscopes with respect to the mass spectrometer, the distributions of the OA-ICOS-IRMS deviations were compared by the box-plots depicted in Figs. 2 and 3 for hydrogen and oxygen, respectively. For each of the four laser spectroscopes, three analytic schemes and averaging methods were applied. The plots suggested two main observations. Firstly, a certain degree of variability existed among the four machines for the two isotopic ratios, both in terms of accuracy (distance of the mean from the zero line) and precision (amplitude of the boxes, i.e., standard deviation). In particular, the distributions of errors for hydrogen analyses using machine II (Fig. 2, panel II) displayed lower standard deviations compared with other machines. However, the lines representing the mean always plotted above the zero line, which revealed a constant overestimate with respect to the IRMS. In contrast, all deviation series exhibited a relatively high accuracy for hydrogen isotopic measurements on instrument IV (Fig. 2, panel IV). For oxygen, analyzers I and IV (Fig. 3, panels I and IV, respectively) exhibited measurements that were slightly underestimated with respect to the mass spectrometer. Contrary to its performance for hydrogen, analyzer II generated relatively accurate measurements of oxygen (Fig. 3, panel II). Analyzer III exhibited the maximum span of deviation and the highest degree of variability (Fig. 3, panel III), which indicated a lack of precision compared with the other spectroscopes. Secondly, the use of different analytic schemes did not seem to be a determining factor in the improvement of isotopic measurements by laser spectroscopy. Indeed, no scheme consistently provided either the most accurate and precise or the worst measurements for all instruments and both isotopes. Moreover, no systematic behaviour (i.e., constant overestimation or underestimation, constant low or high standard deviation) was observed among the schemes, and the inter-machine variability seemed to exceed the scheme variability. Furthermore, almost no difference existed between the two (or three for scheme (C)) versions of the averaging methods, which always yielded similar deviation distributions for all machines and both isotopic ratios. These results suggested that the influence of including or excluding possible outliers (i.e., injections that deviate the most from the preliminary mean) was minimal. Generally, the measurement was not improved by discarding such outliers, and less robust results would be generated due to the lower number of measurements considered. All of these observations were confirmed by the data in Table 2, which reports mean and standard deviation values for the δ2H and δ18O OA-ICOS-IRMS error distributions from the different schemes and four analyzers.
Hydrol. Earth Syst. Sci., 14, 1551-1566, 2010 www.hydrol-earth-syst-sci.net/14/1551/2010/

Statistical significance of OA-ICOS-IRMS deviations and analytic schemes
The significance of the deviations between laser spectroscopy and mass spectrometry was assessed by a t-test and a sign test (Table 3). The difference between the OA-ICOS- and IRMS-derived measurements was not statistically significant at the 95% confidence level if the p-value was greater than the significance level of 0.05 for both tests. In such cases, the spectroscope deviations from the mass spectrometer measurements were negligible. This condition was met for hydrogen in a few instances for three of the four spectroscopes, reflecting the variability among the different machines. Analyzers I and III only provided accurate measurements in comparison to mass spectrometry when scheme (A) was applied, whereas instrument II yielded measurements that were always significantly different from IRMS, which confirmed the overestimation (see also Fig. 2). In contrast, spectroscope IV almost always produced results with insignificant deviations compared with mass spectrometry. The performance of the four analyzers was different when the isotopic measurements for oxygen were considered. Analyzers I and III did not yield significantly different values from IRMS for any scheme (with relatively high p-values), except for (B). Analyzer II always provided accurate results that were independent of the applied scheme, which contrasted with the hydrogen measurements. Spectroscope IV exhibited reliable results only with scheme (B2). In general, there was no absolute best scheme for accurate and precise results. However, scheme (A1), which was the original approach first described by IAEA (2009b) and generated data most rapidly, was the scheme that most often resulted in values that were not significantly different compared with IRMS. For these reasons, scheme (A1) was considered the most representative.

Inter-machine variability and relation to sample isotopic composition
Despite the use of the same dataset, analytic scheme, averaging method, and instrumentation (merely located in different laboratories), significant inter-machine variability among the four laser analyzers and IRMS was observed. The source of this variability remains unknown: the potential effect on the results of variable water molecule density per injection and the water vapour temperature in the cavity was assessed, but no clear relationship could be identified between these variables and measurement errors (results not presented herein). Furthermore, no causal relations of OA-ICOS-IRMS deviations with external factors could be determined to explain these occasional deviations, which agreed with the empirical observation that "each analyser has its own idiosyncrasies" (Newman, B. D., personal communication, 2009). Nevertheless, a certain degree of error was related to the extreme isotopic content of the analyzed samples. Figure 4 shows the deviations between OA-ICOS- and IRMS-derived measurements for the whole dataset (49 samples with seven repetitions included) for δ2H (panel a) and δ18O (panel b). Results plotted above or below the zero line (indicating the perfect agreement between the two measurement approaches) and exhibited no regular structure for any instrument or isotopic content. The only exception, especially for hydrogen, was the clear underestimation by all laser spectroscopes for very light samples (more negative than −300‰ δ2H). This behaviour could be attributed to the memory effect, which can have an important influence when analyzing extreme isotopic values (IAEA, 2009b). In such cases, discarding the first two injections and averaging the remaining four measurements might not be sufficient to overcome the problem. This effect was observed in approximately 50% of the light sample measurements that were affected by an unknown source of error.
Moreover, this behaviour was clearly marked for δ2H readings, which revealed that the difference in accuracy could potentially affect these devices. Therefore, further investigations on this issue with special testing procedures are advised. Since no enriched samples above −10‰ δ2H and −0.71‰ δ18O were included in the dataset, predictions about potential similar behaviour in these data ranges were not possible. However, analyses executed by IAEA (2009b) on a set of artificial water samples up to approximately +1670‰ δ2H and +14‰ δ18O demonstrated comparable results to mass spectrometry, revealing a satisfactory throughput of laser analyzers at least on the positive side of the scale. In general, the analysis of samples with extremely positive or negative isotopic compositions by means of LGR laser spectroscopes should be performed carefully due to potential over- or underestimation errors.

Performance examples
A few examples of the overall performance of the laser spectroscopes for both δ2H and δ18O measurements are reported in Fig. 5. The four panels (a-d) present comparisons between OA-ICOS- and IRMS-derived measurements for four samples featuring a different range of isotopic composition, and allow for the assessment of both accuracy (vicinity to the origin, where the mass spectrometry-derived value was placed) and precision (width of error bars, which reproduced the standard deviation of each measurement) of OA-ICOS measurements for the two isotopic ratios and the four analyzers. The following conclusions could be drawn from Fig. 5. First, the inter-machine variability was apparent in the different degrees of deviation from the mass spectrometer values across the four samples. However, no instrument exhibited the overall best or worst performance in terms of accuracy and precision. Second, biases or systematic deviations were not evident for any particular instrument with respect to the isotopic content of samples, except for the marked underestimation by all analyzers for measurements of very light samples (Fig. 5d). The values determined by analyzers I, III, and IV grouped closely, while machine II exhibited isotopic measurements that were less underestimated because of a compensation effect from its usual positive deviation from the actual δ2H value. Despite the lack of accuracy for hydrogen measurements of light samples, standard deviation values in the −300‰ to −400‰ δ2H range were comparable to or lower than those obtained by measurements of other samples, which suggested that instrumental precision was independent of the sample isotopic content. In general, no apparent relationship was identified between accuracy and precision of the two water isotopes. Thus, a good spectroscope performance in measuring hydrogen isotopic content did not guarantee a similar performance for oxygen or vice versa.

Precision
Previous investigations have revealed different estimates for spectroscope precision. Aggarwal et al. (2006) reported a degree of precision of ±1‰ for hydrogen and ±0.3‰ for oxygen, using a prototype version of the LWIA featuring a different configuration than the commercial version. Lyon et al. (2009) obtained precision values of ±0.37‰ for hydrogen and ±0.12‰ for oxygen for the DLT-100, 908-0008 after measuring a reference standard with known isotopic content for more than six months. In our comparative analysis, we observed a marked difference in precision among the four spectroscopes. Table 4 presents the basic statistics of standard deviation values obtained in δ2H and δ18O measurements for the dataset of 49 samples. The variable behaviour of the four spectroscopes was evident when the statistical properties of the standard deviations were considered. For hydrogen, machine I performed the best in terms of precision, with 75% of the measurements yielding a standard deviation less than ±0.72‰, which satisfies even complex hydrological applications of δ2H. In contrast, machines IV and III exhibited standard deviations in the hydrogen measurement that were noticeably different from the values reported in the literature, with means greater than ±1‰. The 25th and 75th percentiles suggest a lack of precision that, independently of machine accuracy, can affect the ability to analyze physical processes in the field, especially when differences in the water isotopic content are below 2‰ δ2H. A performance contrary to hydrogen was observed for oxygen (Table 4), with spectroscope IV offering good precision. Except for instrument III, the analysers were characterised by a precision comparable to or better than that reported by the manufacturer (Los Gatos Research, Inc., 2008) or by the aforementioned studies. Spectroscope III provided the highest standard deviation, and lacked precision (but not accuracy) for both water isotopes.

Precision and accuracy improvement
Our analyses did not suggest any evidence of factors that explained the variable behaviour of the four laser analyzers. Therefore, such differences were attributed to white noise, which can be difficult to remove or reduce. During the post-processing phase, no "data cleaning" was performed and all raw data provided by the four instruments were reported to offer an undisturbed comparative view of the spectroscope performance. However, the results of each run must be observed carefully to detect possible "bad injections", i.e., spikes or large dips in the amount of water sampled by the syringe. A number of water molecules in the range of 2-4 × 10^16 per cm3 should be maintained during the run. If the number of molecules introduced into the laser cell cannot be stabilized in this expected range, the absorption peaks can be significantly influenced and higher uncertainties in the isotope ratios are likely to occur (Aggarwal et al., 2006; IAEA, 2009b). A dramatic case of this behaviour is exemplified by Fig. 6, which reports the variations in water molecules injected into the cavity during a run performed by analyzer IV (scheme (A1)). The number of molecules per cm3 was within the expected range and no noticeable trend or drift could be observed. Nevertheless, a few injections were outside of the average pattern. The most prominent was a marked dip that occurred during injection number 257, which corresponded to the fifth of six for the determination of standard LGR3. Injection 257 yielded a water amount (3.13 × 10^16 molecules/cm3) that was significantly less than the mean (3.48 × 10^16 molecules/cm3) for the entire 270 injections of the run. This inconsistent water amount corresponded to delta values of hydrogen and oxygen that were significantly different from the other three values used for the final determination of the sample isotopic composition by the average of the last four injections (Table 5).
The known isotopic content of standard LGR3 was −79.00‰ and −11.54‰ for δ2H and δ18O, respectively. The average of the last four injections, which included number 257, provided reportable delta values of −0.55‰ and −11.16‰ for δ2H and δ18O, respectively. The deletion of injection 257 greatly reduced the standard deviation (from 5.43‰ to 1.69‰ for hydrogen and from 0.70‰ to 0.37‰ for oxygen) and improved the accuracy for δ2H, providing a final value of −77.93‰, which was closer to the known reference than the value obtained with injection number 257 included. Unfortunately, this operation did not improve the accuracy of the δ18O measurement: it yielded a final value of −10.85‰, which was further from the actual value. This different behaviour can most likely be attributed to the general deviations of oxygen measurements from IRMS, which characterized spectroscope IV. These results clearly demonstrated the potential influence of inconsistent injections. No evident factor was responsible for such peculiar behaviour other than the intrinsic variability of the instrument. Moreover, not all injections that deviated from the average amount of water corresponded to inconsistent isotopic values. Nevertheless, a close inspection of the raw data is always recommended (IAEA, 2009b) because deleting values that correspond to inconsistent injections can improve both the precision and accuracy of LGR laser spectroscopes.
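The kind of raw-data inspection recommended here could be supported by a simple screening step such as the one sketched below. It is not the procedure used in the study: the function names, the use of the run median as reference, and the 5% relative tolerance are all assumptions chosen for illustration, and the numbers are invented (only the 3.13 versus roughly 3.48 × 10^16 molecules/cm3 contrast mirrors the case discussed above).

```python
from statistics import mean, median, stdev


def flag_injections(densities, rel_tol=0.05):
    """Indices of injections whose water density departs from the
    median by more than rel_tol (hypothetical 5% threshold)."""
    ref = median(densities)
    return [i for i, d in enumerate(densities) if abs(d - ref) > rel_tol * ref]


def summarize(deltas, exclude=()):
    """Mean and standard deviation of the delta values, optionally
    leaving out the injections listed in exclude."""
    kept = [d for i, d in enumerate(deltas) if i not in exclude]
    return mean(kept), stdev(kept)


# Hypothetical six injections of one vial:
# water density in 1e16 molecules/cm3 and delta-2H in per mil.
density = [3.47, 3.49, 3.48, 3.50, 3.13, 3.49]
delta_2h = [-78.0, -77.5, -78.4, -77.9, -70.0, -78.1]

bad = flag_injections(density)
print("flagged injections:", bad)
print("all injections:    mean %.2f, sd %.2f" % summarize(delta_2h))
print("flagged excluded:  mean %.2f, sd %.2f" % summarize(delta_2h, exclude=bad))
```

A flagged injection is only a candidate for deletion: as noted above, not every injection with an unusual water amount yields an inconsistent isotopic value, so the delta values should still be checked before anything is removed.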

Repeatability
Seven samples were selected to assess the repeatability of δ2H and δ18O measurements provided by the OA-ICOS instruments. These samples were analyzed three times (in three different vials) during the same run. Results for four representative samples with different isotopic composition are displayed in Figs. 7 and 8, with three repetitions for each instrument presented (in gray) along with the mean (in black). Error bars refer to the standard deviation computed for each measurement. IRMS δ2H or δ18O values and standard deviations are represented as horizontal solid and dashed lines, respectively. A visual inspection revealed inconsistent behaviour among the four instruments. The repeated measurements were very similar and within the instrumental precision in some cases (e.g., analyzer I in Fig. 7d and analyzer II in Fig. 8c) and appeared unsteady in other instances (e.g., analyzer I in Fig. 7b and analyzer IV in Fig. 8d). In particular, repeated δ2H measurements of sample 14 (Fig. 7, panel a) by spectroscopes I, II, and III fell within the analytical uncertainty of the IRMS, with differences between the lowest and the highest measurement of 0.71‰, 0.57‰, and 0.26‰, respectively. These values were comparable to or lower than the instrumental precision and revealed a satisfactory repeatability of the instruments. In contrast, analyzer IV produced more unstable results with a greater difference between the lowest and highest δ2H measurements (1.63‰).  for δ2H measurements, and analyzer IV exhibited the highest fluctuations in the repeated measurements. Analyzer IV behaved almost analogously to analyzer I for oxygen quantification, whereas analyzer III presented the most marked variations.
Overall, the capability to reproduce comparable results from the analysis of repeated samples was acceptable, with differences between the maximum and the minimum values that were generally within the range of the standard deviation yielded by the single measurements. This result agreed with previous studies of the LGR analyzers (IAEA, 2009b). Nevertheless, in some instances, the repeated measurements of the same sample differed noticeably, with marked unsteadiness and randomly distributed inconsistencies.
Repeatability plots displayed in Figs. 7 and 8 also present the mean computed for three samples analysed over time (darker symbol). Averaging three repeated measurements may overcome the random deviations from the real isotopic ratios that are occasionally generated. The same statistical procedure was followed as in the analysis of δ2H and δ18O deviations (Sect. 2.4) to determine whether this approach might lead to a significant improvement of results. The dataset for this analysis comprised the first, second, and third repetitions, together with the mean of the three repetitions, for the seven samples. The distribution of each series was assessed by the Shapiro-Wilk, Kolmogorov-Smirnov, and Anderson-Darling normality tests at a significance level of 0.05. According to the type of distribution, a one-sample t-test (for normal distributions) or a non-parametric one-sample sign test (for the few non-normal distributions) was performed to assess whether the deviations between the OA-ICOS and the IRMS measurements for the seven repeated samples were statistically significant. The t-test and sign test results are presented in Table 7 for hydrogen and oxygen measurements. In many cases, the laser spectroscopes yielded accurate measurements for a single vial. However, averaging the values obtained from three identical samples almost always yielded reportable delta values that were not statistically different from IRMS (α = 0.05).
For hydrogen, the deviation from the mass spectrometer values was significant in three of 12 cases, whereas the mean of the three samples always produced more consistent results that were not significantly different from the reference value. For oxygen, the results deviated from the IRMS output in three of 12 cases, while the average almost always (three times out of four) yielded accurate results compared with mass spectrometry. According to these results, a higher degree of accuracy can be obtained with LGR analysers by averaging the measurements of three samples. In these cases, the increased analysis speed of spectroscope IV (new version) allowed more consistent results to be achieved by averaging values in a considerably shorter time frame than with spectroscopes I, II, and III.
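The comparison between single repetitions and the mean of three repetitions can be illustrated with a short Python sketch. It covers only the non-parametric branch of the procedure, an exact two-sided one-sample sign test on the deviations from IRMS. The normality tests and the one-sample t-test are omitted because they would require a statistics library. All δ18O values, function names, and variable names below are hypothetical and serve only to show the mechanics.

```python
from math import comb
from statistics import mean


def sign_test_p(deviations):
    """Two-sided exact one-sample sign test of H0: median deviation = 0.

    Zero deviations are dropped, as is conventional for the sign test.
    """
    nonzero = [d for d in deviations if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def deviations_from_reference(laser, irms):
    """Laser minus IRMS delta values, sample by sample."""
    return [l - r for l, r in zip(laser, irms)]


# Hypothetical d18O values (permil) for seven samples: three repetitions
# from one laser analyzer per sample, plus the IRMS reference values.
reps = [
    [-8.25, -8.50, -8.48],
    [-10.05, -10.30, -10.22],
    [-12.00, -12.25, -12.12],
    [-6.38, -6.55, -6.49],
    [-9.18, -9.41, -9.35],
    [-10.91, -11.08, -10.95],
    [-7.66, -7.88, -7.83],
]
irms = [-8.40, -10.20, -12.10, -6.50, -9.30, -11.00, -7.80]

first_rep = [r[0] for r in reps]
rep_means = [mean(r) for r in reps]

p_first = sign_test_p(deviations_from_reference(first_rep, irms))
p_mean = sign_test_p(deviations_from_reference(rep_means, irms))

alpha = 0.05
print("first repetition significant:", p_first < alpha)
print("mean of three significant:", p_mean < alpha)
```

With only seven samples, the sign test can reach significance at α = 0.05 only when all seven deviations share the same sign, so it is a conservative check. The example data were constructed so that the first repetition is biased in one direction while the three-vial mean scatters around the reference.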

Conclusions
Because of their many advantages and great research potential, off-axis integrated cavity output spectroscopy instruments are rapidly being adopted by institutions that deal with hydrological and natural resources research. Despite the number of previous investigations on the performance of such analyzers, no study to date has inspected the consistency of results obtained using different units of the same model. The present study focused on an inter-comparison of four liquid water isotope analyzers manufactured by Los Gatos Research Inc. (Mountain View, California, USA), which were versions 908-0008 and 908-0008-2000 of model DLT-100. This investigation aimed to assess the performance of the four spectroscopes in terms of measurement reproducibility and repeatability in comparison with the performance of a traditional isotope-ratio mass spectrometer, on a wide range of isotopic ratios of natural water samples (ranging from −425‰ to −11‰ for δ2H and from −55‰ to −1‰ for δ18O). The laser units were operated using three different analytic schemes for the isotopic determination of water samples.
Scatterplots of laser-based versus IRMS-based measurements over the whole dataset demonstrated an excellent agreement between the two methods for both water isotopes and for all analyzers, which confirmed the overall good performance of OA-ICOS instruments, as had been indicated by previous studies. However, statistical analysis of deviations from the mass spectrometer measurements revealed a certain degree of variability in accuracy and precision among the four instruments and the two isotopic ratios. No bias or systematic deviations were evident for any particular machine and none was indicated as the overall best or worst performer. Nevertheless, one spectroscope exhibited a marked positive deviation from zero for hydrogen measurements, whereas another analyzer consistently underestimated oxygen measurements. A third instrument lacked precision, especially for oxygen, compared with the other instruments. Interestingly, there was no causal relation between external factors and the deviations between OA-ICOS and IRMS measurements, and the intrinsic variability of the analyzers was the only determined cause for such differences. Errors appeared to be randomly distributed within the same instrument and among the four machines. Therefore, the source of this variability remains unknown. The only identifiable pattern of error was related to the extremely light isotopic content of samples with δ2H values more negative than −300‰, which resulted in a clear underestimation by all instruments. Such behaviour was much less marked for oxygen measurements and could be partly related to the substantial influence of memory effects when analyzing very light samples.
The use of different analytic schemes did not seem to significantly improve the isotopic measurements by laser spectroscopy. For all machines and both water isotopes, no scheme consistently provided either the most or the least accurate and precise measurements. Accuracy and precision seemed to be more related to the spectroscope than to the scheme. However, the analysis scheme first described by the International Atomic Energy Agency ranked among the best compared with IRMS.
Results also showed that the 1-σ precision ranged between ±0.56‰ and ±1.80‰ for δ2H and between ±0.10‰ and ±0.27‰ for δ18O measurements for the various instruments. Overall, these values were comparable to or better than those reported by the manufacturer and in previous studies. One of the four analyzers yielded slightly more precise results for both isotopic ratios, whereas another instrument lacked precision for reasons unrelated to any evident factor.
Analyses conducted on a subset of samples revealed an acceptable capability of laser instruments to reproduce comparable results on repeated samples. The differences between maximum and minimum measurements generally fell within the range of the standard deviation of a single measurement. Averaging the delta values of three identical samples almost always led to a higher degree of accuracy and avoided potential random deviations. However, this approach was very time-consuming, and therefore might be applicable only for the analysis of a few samples and/or when more robust results are necessary.
In conclusion, OA-ICOS laser analyzers appeared to be cost-effective and not particularly difficult to operate compared with conventional mass spectrometry. Despite a certain degree of inter-machine variability and some randomly distributed errors, these instruments offer a powerful approach for determining hydrogen and oxygen isotopic compositions in water samples for hydrological and environmental applications.