Analysis and estimation of the effects of missing values on the calculation of monthly temperature indices

Massetti, L. 2014. Analysis and estimation of the effects of missing values on the calculation of monthly temperature indices. Theoretical and Applied Climatology, 117(3-4), 511-519.

Long and complete climatic data series are a fundamental resource for scientific research on climate change.

Data quality is important, and missing value or data gap management is a key process that must be dealt with carefully to produce reliable datasets.

Although a large variety of techniques are available for gap-filling, a widespread strategy is to consider a dataset reliable if the rate of missing data is below a given threshold.

However, the threshold adopted varies from study to study.

The aim of this paper is to analyze the impact of missing daily values on the estimation of monthly average temperature indices.

The relationship between the error of the estimate and the presence of random or consecutive missing values, as well as data series autocorrelation, is also analyzed.

A theoretical, a linear and a nonlinear model to estimate the maximum error at the 95% confidence interval are tested on data series provided by national and worldwide networks of stations.
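
As a point of reference for what a theoretical error bound can look like, the following is a textbook sketch and not necessarily the theoretical model tested in the paper. It assumes that the k missing days are a simple random sample of the n days in the month, that daily values are not autocorrelated, and that the error is approximately normal. The symbols n, k, S and E_95 are introduced here for illustration only.

```latex
% Error of the monthly mean when k of the n daily values are missing at random.
% \bar{x}_{n}   : mean of all n daily values
% \bar{x}_{n-k} : mean of the n-k available values
% S             : standard deviation of the n daily values (n-1 divisor)
\[
  e = \bar{x}_{n-k} - \bar{x}_{n}, \qquad
  \operatorname{Var}(e) = \frac{S^{2}}{n-k}\left(1 - \frac{n-k}{n}\right)
                        = \frac{S^{2}\,k}{n\,(n-k)}
\]
% Approximate maximum absolute error at the 95% level (normal approximation):
\[
  E_{95} \approx 1.96\, S \sqrt{\frac{k}{n\,(n-k)}}
\]
% Example: n = 30, k = 5, S = 1 gives E_{95} \approx 1.96 \times 0.0816 \approx 0.16.
```

Because this expression ignores autocorrelation, it should be expected to underestimate the error when the missing days are consecutive.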

Consecutive missing values have an important effect on error estimation due to autocorrelation of temperature data series.

On our dataset, the mean and standard deviation of the error for five consecutive missing values (0.27 ± 0.05 °C) on a normalized daily series (σ=1) were higher than for five random missing values (0.14 ± 0.006 °C).
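
The contrast between random and consecutive gaps can be illustrated with a small Monte Carlo sketch. It does not use the station data from the paper: it assumes a synthetic 30-day AR(1) series with unit variance and a hypothetical lag-1 autocorrelation `phi = 0.7`, and the function names (`ar1_month`, `gap_error`, `simulate`) are illustrative only. The output is therefore not expected to match the figures reported above.

```python
import numpy as np


def ar1_month(rng, n_days=30, phi=0.7):
    """Simulate one month of a stationary AR(1) series with unit variance."""
    x = np.empty(n_days)
    x[0] = rng.standard_normal()
    innov_sd = np.sqrt(1.0 - phi ** 2)  # keeps the marginal variance at 1
    for t in range(1, n_days):
        x[t] = phi * x[t - 1] + innov_sd * rng.standard_normal()
    return x


def gap_error(x, missing_idx):
    """Monthly mean computed without the missing days, minus the true mean."""
    mask = np.ones(x.size, dtype=bool)
    mask[missing_idx] = False
    return x[mask].mean() - x.mean()


def simulate(n_trials=5000, n_days=30, k=5, phi=0.7, seed=0):
    """Return the 95th percentile of |error| for random and consecutive gaps."""
    rng = np.random.default_rng(seed)
    err_random = np.empty(n_trials)
    err_consec = np.empty(n_trials)
    for i in range(n_trials):
        x = ar1_month(rng, n_days, phi)
        # k missing days scattered at random within the month
        random_idx = rng.choice(n_days, size=k, replace=False)
        # k missing days forming a single block at a random position
        start = rng.integers(0, n_days - k + 1)
        consec_idx = np.arange(start, start + k)
        err_random[i] = gap_error(x, random_idx)
        err_consec[i] = gap_error(x, consec_idx)
    q_random = np.percentile(np.abs(err_random), 95)
    q_consec = np.percentile(np.abs(err_consec), 95)
    return q_random, q_consec


if __name__ == "__main__":
    q_random, q_consec = simulate()
    print(f"95% error, 5 random missing days:      {q_random:.3f}")
    print(f"95% error, 5 consecutive missing days: {q_consec:.3f}")
```

With positive autocorrelation, a block of consecutive days tends to sit on the same side of the monthly mean, so removing it shifts the estimate more than removing the same number of scattered days. Setting `phi=0` should make the two error levels roughly equal.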

A nonlinear model that takes into account the number of consecutive missing values is able to estimate the error, and its performance is less affected by the presence of consecutive missing values than that of the other proposed models.