https://doi.org/10.5194/hess-2021-609
17 Dec 2021

Technical note: A procedure to clean, decompose and aggregate time series

François Ritter
  • Finres, 59 Boulevard Exelmans, 75016 Paris, France

Abstract. Errors, gaps and outliers complicate and sometimes invalidate the analysis of time series. While most fields have developed their own strategies to clean raw data, no generic procedure has been promoted to standardize pre-processing. This lack of harmonization makes the inter-comparison of studies difficult and leads to screening methods that are usually ambiguous or case-specific. This study provides a generic pre-processing procedure (called past, implemented in R) applicable to any univariate time series. The past procedure is based on data binning: it decomposes the time series into a long-term trend and a cyclic component (quantified by a new metric, the Stacked Cycles Index) and then aggregates the data. Outliers are flagged with an enhanced Boxplot rule called Logbox. Three different Earth Science datasets (contaminated with gaps and outliers) are successfully cleaned and aggregated with past. This illustrates the robustness of the procedure, which can be valuable to any discipline.

Journal article(s) based on this preprint

François Ritter

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2021-609', Anonymous Referee #1, 29 Dec 2021
    • AC1: 'Reply on RC1', Francois Ritter, 03 Jan 2022
  • RC2: 'Submit to statistical journal', Thomas Wutzler, 08 Jan 2022
    • AC2: 'Reply on RC2', Francois Ritter, 12 Jan 2022
  • AC3: 'Comment on hess-2021-609', Francois Ritter, 21 Dec 2022

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision
ED: Reconsider after major revisions (further review by editor and referees) (03 Mar 2022) by Anke Hildebrandt
AR by Francois Ritter on behalf of the Authors (04 Mar 2022)  Author's response    Manuscript
ED: Referee Nomination & Report Request started (15 Mar 2022) by Anke Hildebrandt
RR by Jens Schumacher (18 May 2022)
ED: Reconsider after major revisions (further review by editor and referees) (02 Jun 2022) by Anke Hildebrandt
AR by Francois Ritter on behalf of the Authors (01 Aug 2022)  Author's response    Author's tracked changes    Manuscript
ED: Referee Nomination & Report Request started (03 Aug 2022) by Anke Hildebrandt
RR by Jens Schumacher (12 Dec 2022)
ED: Publish subject to minor revisions (review by editor) (13 Dec 2022) by Anke Hildebrandt
AR by Francois Ritter on behalf of the Authors (21 Dec 2022)  Author's response    Author's tracked changes    Manuscript
ED: Publish as is (03 Jan 2023) by Anke Hildebrandt

Viewed

Total article views: 1,124 (including HTML, PDF, and XML)
  • HTML: 865
  • PDF: 238
  • XML: 21
  • Total: 1,124
  • Supplement: 65
  • BibTeX: 11
  • EndNote: 10

Viewed (geographical distribution)

Total article views: 1,047 (including HTML, PDF, and XML) Thereof 1,047 with geography defined and 0 with unknown origin.
Latest update: 18 Jan 2023

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
This study offers a method to clean "time series", that is, data recorded at specific time intervals (hours, months, ...). It cuts a time series into small pieces (called bins) and rejects bins without enough data. Errors in each bin are then flagged with a popular method called the "Boxplot rule", which has been improved in this study. Finally, each bin can be averaged to produce a new time series with less noise and fewer gaps and errors. This procedure can be generalized to any discipline, such as economics.
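
The steps in this summary (binning, rejecting sparse bins, flagging errors, averaging) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the past package itself, which is implemented in R. It uses the classic Boxplot rule (values beyond 1.5 times the interquartile range from the quartiles) rather than the improved Logbox rule, whose details are not given on this page. The function name `clean_and_aggregate` and the parameters `bin_width`, `min_count` and `k` are hypothetical choices made for the example.

```python
import math
import statistics


def clean_and_aggregate(times, values, bin_width, min_count=4, k=1.5):
    """Bin a univariate series, drop sparse bins, flag outliers, average.

    Hypothetical helper: uses the classic Boxplot rule, not Logbox.
    Returns (aggregated, flagged), where aggregated maps each bin start
    to the mean of its retained values and flagged lists the outliers.
    """
    # Group the non-missing values by bin index.
    bins = {}
    for t, v in zip(times, values):
        if v is None or math.isnan(v):
            continue  # a gap contributes nothing to its bin
        bins.setdefault(int(t // bin_width), []).append(v)

    aggregated = {}
    flagged = []
    for index, vals in sorted(bins.items()):
        if len(vals) < min_count:
            continue  # reject bins without enough data
        # Quartiles of the bin, then the classic Boxplot fences.
        q1, _, q3 = statistics.quantiles(vals, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        start = index * bin_width
        kept = []
        for v in vals:
            if low <= v <= high:
                kept.append(v)
            else:
                flagged.append((start, v))  # outlier, excluded from the mean
        aggregated[start] = statistics.fmean(kept)
    return aggregated, flagged


# Twelve time steps: the first bin holds one gross error (50.0), the
# second bin has only two valid values and is therefore rejected.
times = list(range(12))
values = [1.0, 1.1, 0.9, 1.0, 1.0, 1.1, 0.9, 50.0,
          2.0, float("nan"), None, 2.2]
aggregated, flagged = clean_and_aggregate(times, values, bin_width=8)
```

In this toy input, the value 50.0 falls outside the fences of the first bin and is left out of that bin's average, while the second bin produces no aggregated value at all because too few observations survive. The minimum count per bin and the fence multiplier are the two knobs such a sketch exposes; the values used here are arbitrary.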