Preprints
https://doi.org/10.5194/hess-2021-609

  17 Dec 2021

Review status: this preprint is currently under review for the journal HESS.

Technical note: A procedure to clean, decompose and aggregate time series

François Ritter
  • Finres, 59 Boulevard Exelmans, 75016 Paris, France

Abstract. Errors, gaps and outliers complicate and sometimes invalidate the analysis of time series. While most fields have developed their own strategy to clean the raw data, no generic procedure has been promoted to standardize the pre-processing. This lack of harmonization makes the inter-comparison of studies difficult, and leads to screening methods that are usually ambiguous or case-specific. This study provides a generic pre-processing procedure (called past, implemented in R) dedicated to any univariate time series. Past is based on data binning and decomposes the time series into a long-term trend and a cyclic component (quantified by a new metric, the Stacked Cycles Index) to finally aggregate the data. Outliers are flagged with an enhanced Boxplot rule called Logbox. Three different Earth Science datasets (contaminated with gaps and outliers) are successfully cleaned and aggregated with past. This illustrates the robustness of this procedure that can be valuable to any discipline.
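The decomposition idea in the abstract (binning a series into a long-term trend and a cyclic component) can be illustrated with a minimal Python sketch. This is not the author's R implementation of past: the function name `decompose` and its `period` parameter are hypothetical, the series is assumed to be regularly sampled with no gaps, and each bin is assumed to span exactly one cycle. The Stacked Cycles Index is not computed here because its definition is not given on this page.

```python
from statistics import mean


def decompose(values, period):
    """Split a regularly sampled series into a bin-mean trend and a stacked cycle.

    Illustrative only: assumes no gaps and that `period` (samples per cycle)
    is known in advance. Trailing samples that do not fill a bin are ignored.
    """
    n_bins = len(values) // period

    # Long-term trend: one mean per bin, where each bin spans one full cycle.
    trend = [mean(values[b * period:(b + 1) * period]) for b in range(n_bins)]

    # Cyclic component: subtract each bin's mean, then stack the cycles and
    # average the values that share the same position within the cycle.
    cycle = [
        mean(values[b * period + p] - trend[b] for b in range(n_bins))
        for p in range(period)
    ]
    return trend, cycle


# Two cycles of three samples each, with a rising baseline.
trend, cycle = decompose([1, 2, 3, 11, 12, 13], period=3)
```

In this toy series the trend captures the jump between the two bins, while the stacked cycle captures the repeating shape inside each bin.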

Status: open (until 11 Feb 2022)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2021-609', Anonymous Referee #1, 29 Dec 2021
    • AC1: 'Reply on RC1', Francois Ritter, 03 Jan 2022
  • RC2: 'Submit to statistical journal', Thomas Wutzler, 08 Jan 2022
    • AC2: 'Reply on RC2', Francois Ritter, 12 Jan 2022

Viewed

Total article views: 483 (including HTML, PDF, and XML)
  • HTML: 358
  • PDF: 116
  • XML: 9
  • Total: 483
  • Supplement: 28
  • BibTeX: 1
  • EndNote: 2
Views and downloads (calculated since 17 Dec 2021)

Viewed (geographical distribution)

Total article views: 461 (including HTML, PDF, and XML) Thereof 461 with geography defined and 0 with unknown origin.
Latest update: 20 Jan 2022
Short summary
This study offers a method to clean "time series", that is, data recorded at regular time intervals (hours, months...). It cuts a time series into small pieces (called bins) and rejects bins without enough data. Errors in each bin are then flagged with a popular method called the "Boxplot rule", which has been improved in this study. Finally, each bin can be averaged to produce a new time series with less noise and fewer gaps and errors. This procedure can be applied to any discipline, such as economics.
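The steps in this summary can be sketched in a few lines of Python. This is a hedged illustration, not the past package itself: the function `clean_and_aggregate` and its parameters (`bin_width`, `min_count`, `k`) are hypothetical names, gaps are assumed to be stored as `None`, and outliers are flagged with the classic Boxplot rule (fences at 1.5 times the interquartile range), because the improved Logbox rule is not specified on this page.

```python
from statistics import mean, quantiles


def clean_and_aggregate(times, values, bin_width, min_count=4, k=1.5):
    """Bin a series, drop sparse bins, remove boxplot outliers, then average.

    Illustrative only. `min_count` must be at least 2, because quartiles
    cannot be computed from a single value.
    """
    # Step 1: cut the series into bins of equal width, skipping gaps (None).
    bins = {}
    for t, v in zip(times, values):
        if v is None:
            continue
        bins.setdefault(int(t // bin_width), []).append(v)

    aggregated = {}
    for index in sorted(bins):
        data = bins[index]

        # Step 2: reject bins without enough data.
        if len(data) < min_count:
            continue

        # Step 3: classic boxplot rule, values outside the fences are outliers.
        q1, _, q3 = quantiles(data, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        kept = [v for v in data if low <= v <= high]

        # Step 4: average what remains to get one value per bin.
        aggregated[index] = mean(kept)
    return aggregated


# Twelve time steps in two bins of width 6: the first bin holds one outlier
# (100.0), the second bin holds only two valid values and is rejected.
times = list(range(12))
values = [1.0, 2.0, 3.0, 2.0, 100.0, 2.0, 5.0, None, None, None, None, 6.0]
result = clean_and_aggregate(times, values, bin_width=6)
```

Returning a dictionary keyed by bin index makes rejected bins visible as missing keys rather than hiding them behind placeholder values.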