Preprints
https://doi.org/10.5194/hess-2021-609
 
17 Dec 2021
Status: a revised version of this preprint is currently under review for the journal HESS.

Technical note: A procedure to clean, decompose and aggregate time series

François Ritter
  • Finres, 59 Boulevard Exelmans, 75016 Paris, France

Abstract. Errors, gaps and outliers complicate and sometimes invalidate the analysis of time series. While most fields have developed their own strategies to clean the raw data, no generic procedure has been promoted to standardize the pre-processing. This lack of harmonization makes the inter-comparison of studies difficult, and leads to screening methods that are usually ambiguous or case-specific. This study provides a generic pre-processing procedure (called past, implemented in R) dedicated to any univariate time series. Past is based on data binning and decomposes the time series into a long-term trend and a cyclic component (quantified by a new metric, the Stacked Cycles Index) to finally aggregate the data. Outliers are flagged with an enhanced Boxplot rule called Logbox. Three different Earth Science datasets (contaminated with gaps and outliers) are successfully cleaned and aggregated with past. This illustrates the robustness of the procedure, which can be valuable to any discipline.


Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2021-609', Anonymous Referee #1, 29 Dec 2021
    • AC1: 'Reply on RC1', Francois Ritter, 03 Jan 2022
  • RC2: 'Submit to statistical journal', Thomas Wutzler, 08 Jan 2022
    • AC2: 'Reply on RC2', Francois Ritter, 12 Jan 2022


Viewed

Total article views: 795 (including HTML, PDF, and XML)
  • HTML: 598
  • PDF: 184
  • XML: 13
  • Total: 795
  • Supplement: 45
  • BibTeX: 6
  • EndNote: 7
Views and downloads (calculated since 17 Dec 2021)

Viewed (geographical distribution)

Total article views: 745 (including HTML, PDF, and XML) Thereof 745 with geography defined and 0 with unknown origin.
Latest update: 26 May 2022
Short summary
This study offers a method to clean "time series" – data recorded at specific time intervals (hours, months...). It cuts time series into small pieces (called bins), and rejects bins without enough data. Errors in each bin are then flagged with a popular method called the "Boxplot rule" that has been improved in this study. Finally, each bin can be averaged to produce a new time series with less noise and fewer gaps and errors. This procedure can be generalized to any discipline, such as economics.
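
The steps in this summary (bin, reject sparse bins, flag outliers, average) can be illustrated with a minimal Python sketch. It is not the author's R implementation of past: the function name `clean_and_aggregate` and the parameters `bin_width`, `min_count` and `k` are hypothetical, gaps are assumed to be encoded as `None`, and outliers are flagged with the classic Boxplot rule (values beyond 1.5 interquartile ranges from the quartiles) rather than the improved Logbox rule, whose details are not given on this page.

```python
from statistics import mean, quantiles


def clean_and_aggregate(times, values, bin_width, min_count=3, k=1.5):
    """Bin a univariate series, drop sparse bins, flag outliers, average.

    Illustrative only: uses the classic Boxplot rule, not Logbox.
    Returns a list of (bin_start, bin_mean) pairs for the retained bins.
    """
    # 1. Cut the series into bins of equal width; None marks a gap.
    bins = {}
    for t, v in zip(times, values):
        if v is None:
            continue
        start = (t // bin_width) * bin_width
        bins.setdefault(start, []).append(v)

    aggregated = []
    for start in sorted(bins):
        vals = bins[start]
        # 2. Reject bins that do not hold enough data.
        if len(vals) < max(min_count, 2):
            continue
        # 3. Classic Boxplot rule: keep values inside [Q1 - k*IQR, Q3 + k*IQR].
        q1, _, q3 = quantiles(vals, n=4, method="inclusive")
        iqr = q3 - q1
        kept = [v for v in vals if q1 - k * iqr <= v <= q3 + k * iqr]
        # 4. Average the remaining values to form the new time series.
        if kept:
            aggregated.append((start, mean(kept)))
    return aggregated


# Hypothetical example: 15 time steps, one spike and one gap-ridden bin.
times = list(range(15))
values = [10, 11, 9, 10, 100, None, None, None, 12, 13, 20, 21, 19, 20, 22]
result = clean_and_aggregate(times, values, bin_width=5, min_count=3)
```

In this example the spike of 100 in the first bin is excluded before averaging, and the second bin is rejected because only two of its five values are present.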