1 Introduction

In this report we apply the Mann-Kendall test under the long-term persistence (LTP MKt) hypothesis (Hamed 2008). We show how we can apply the function MannKendallLTP of the “HKprocess” package, to infer about the trend observed in a time series. Furthermore we apply the test to 10 time series of potential evapotranspiration (PET) at various locations in Greece.

We will use the following packages for the analysis.

library("devtools")
library("FGN")
library("HKprocess")
library("knitr")
library("zoo")

Some information about the current session follows.

session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.2 (2016-10-31)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       Europe/Istanbul             
##  date     2017-02-08
## Packages ------------------------------------------------------------------
##  package      * version date       source        
##  akima          0.6-2   2016-12-20 CRAN (R 3.3.2)
##  backports      1.0.5   2017-01-18 CRAN (R 3.3.2)
##  coda           0.19-1  2016-12-08 CRAN (R 3.3.2)
##  devtools     * 1.12.0  2016-06-24 CRAN (R 3.3.1)
##  digest         0.6.12  2017-01-27 CRAN (R 3.3.2)
##  evaluate       0.10    2016-10-11 CRAN (R 3.3.1)
##  FGN          * 2.0-12  2014-05-16 CRAN (R 3.3.1)
##  gtools         3.5.0   2015-05-29 CRAN (R 3.3.0)
##  HKprocess    * 0.0-2   2016-01-15 CRAN (R 3.3.1)
##  htmltools      0.3.5   2016-03-21 CRAN (R 3.3.1)
##  knitr        * 1.15.1  2016-11-22 CRAN (R 3.3.2)
##  lattice        0.20-34 2016-09-06 CRAN (R 3.3.1)
##  ltsa           1.4.6   2015-12-21 CRAN (R 3.3.0)
##  magrittr       1.5     2014-11-22 CRAN (R 3.3.1)
##  MASS           7.3-45  2016-04-21 CRAN (R 3.3.1)
##  Matrix         1.2-8   2017-01-20 CRAN (R 3.3.2)
##  MatrixModels   0.4-1   2015-08-22 CRAN (R 3.3.1)
##  mcmc           0.9-4   2015-07-17 CRAN (R 3.3.1)
##  MCMCpack       1.3-9   2017-01-26 CRAN (R 3.3.2)
##  memoise        1.0.0   2016-01-29 CRAN (R 3.3.1)
##  quantreg       5.29    2016-09-04 CRAN (R 3.3.1)
##  Rcpp           0.12.9  2017-01-14 CRAN (R 3.3.2)
##  rmarkdown      1.3     2016-12-21 CRAN (R 3.3.2)
##  rprojroot      1.2     2017-01-16 CRAN (R 3.3.2)
##  sp             1.2-4   2016-12-22 CRAN (R 3.3.2)
##  SparseM        1.74    2016-11-10 CRAN (R 3.3.2)
##  stringi        1.1.2   2016-10-01 CRAN (R 3.3.1)
##  stringr        1.1.0   2016-08-19 CRAN (R 3.3.1)
##  withr          1.0.2   2016-06-20 CRAN (R 3.3.1)
##  yaml           2.1.14  2016-11-12 CRAN (R 3.3.2)
##  zoo          * 1.7-14  2016-12-16 CRAN (R 3.3.2)

To reproduce the computations, change the in_dir variable after unzipping the “supplementary information.7z” file.

in_dir <- "T:"

The following directories define the locations of the data and the results of the code.

work_dir <- paste(in_dir, "/supplementary information/code and data/code",
    sep = "")

data_dir <- paste(in_dir, "/supplementary information/code and data/data",
    sep = "")
setwd(work_dir)
work_dir
## [1] "T:/supplementary information/code and data/code"
data_dir
## [1] "T:/supplementary information/code and data/data"

We load the PET data which will be used in the case study.

setwd(data_dir)
df_PET_Greece <- read.csv("PET_Greece.csv")

We set a seed, to reproduce the analysis.

set.seed(12345)

2 Simulation study

2.1 Simulation

To apply the test we simulate n_test = 5 time series exhibiting Hurst-Kolmogorov (HK) behaviour using the “FGN” package (McLeod et al. 2008). In some of them we add a trend component. The simulated time series are generated from a HK process with \(\mu\) = 0 and \(\sigma\) = 1. The Hurst parameter are set with the H_values column of the sim_values data frame. The added trend component is set with the trend_values column of the sim_values data frame. Finally we produce the sim_series list, with 5 items, each one being a simulated time series.

sim_values <- data.frame(matrix(nrow = n_test, ncol = 3))
names(sim_values) <- c("H_values", "n_values", "trend_values")

sim_values$H_values <- c(0.5, 0.8, 0.9, 0.95, 0.5)
# Values of H parameters used in the simulation

sim_values$n_values <- c(100, 100, 100, 100, 100)
# Length of the simulated time series

sim_values$trend_values <- c(0, 0, 0, 0, 0.01)
# Values of added trend component used in the simulation

sim_series <- list()

for (i in 1:n_test)
{
    sim_series[[i]] <- SimulateFGN(n = sim_values$n_values[i], H =
        sim_values$H_values[i]) + sim_values$trend_values[i] * ((1 - 50):
        (sim_values$n_values[i] - 50))
}

Now we construct the df_results data frame. Its first three columns contain the parameters used for the generation of the simulated time series.

df_results <- data.frame(matrix(nrow = n_test, ncol = 8))

names(df_results) <- c("H_values", "n_values", "trend_values", "H_estimate",
    "trend_estimate", "Mann_Kendall_2_sided_pvalue",
    "Significance_of_H_2_sided_pvalue", "Mann_Kendall_LTP_2_sided_pvalue")
df_results$H_values <- sim_values$H_values
df_results$n_values <- sim_values$n_values
df_results$trend_values <- sim_values$trend_values 

We present the df_results data frame, with its first three columns filled.

df_results
##   H_values n_values trend_values H_estimate trend_estimate
## 1     0.50      100         0.00         NA             NA
## 2     0.80      100         0.00         NA             NA
## 3     0.90      100         0.00         NA             NA
## 4     0.95      100         0.00         NA             NA
## 5     0.50      100         0.01         NA             NA
##   Mann_Kendall_2_sided_pvalue Significance_of_H_2_sided_pvalue
## 1                          NA                               NA
## 2                          NA                               NA
## 3                          NA                               NA
## 4                          NA                               NA
## 5                          NA                               NA
##   Mann_Kendall_LTP_2_sided_pvalue
## 1                              NA
## 2                              NA
## 3                              NA
## 4                              NA
## 5                              NA

2.2 Application of the test

Now we

  • Model the time series with an HK process and we estimate the parameters. We use the Maximum Likelihood Estimator as presented in Tyralis and Koutsoyiannis (2011) and we use the “HKprocess” package. We do this even for the simulated time series with the added trend component.

  • Apply the LTP MKt in all simulated time series using the “HKprocess” package.

  • Fit a linear model to all simulated time series.

mleHK_values <- list()
MannKendallLTP_values <- list()
linear_values <- list()
linear_x <- list()

for (i in 1:n_test)
{
    mleHK_values[[i]] <- mleHK(sim_series[[i]])
    # Estimation of the parameters of the HK process

    MannKendallLTP_values[[i]] <- MannKendallLTP(sim_series[[i]])
    # Application of the LTP MKt

    linear_x[[i]] <- 1:sim_values$n_values[i]

    linear_values[[i]] <- lm(sim_series[[i]] ~ linear_x[[i]])
    # Linear regression on the time series
}

We fill the df_results data frame with the values of interest.

for (i in 1:n_test)
{
    df_results$H_estimate[i] <- mleHK_values[[i]][3]
    df_results$trend_estimate[i] <- linear_values[[i]]$coefficients[2]

    df_results$Mann_Kendall_2_sided_pvalue[i] <-
        MannKendallLTP_values[[i]]$Mann_Kendall[6]

    df_results$Significance_of_H_2_sided_pvalue[i] <-
        MannKendallLTP_values[[i]]$Significance_of_H[2]

    df_results$Mann_Kendall_LTP_2_sided_pvalue[i] <-
        MannKendallLTP_values[[i]]$Mann_Kendall_LTP[2]
}

The df_results data frame now contains all information of interest for the current report. The values of interest for the LTP MKt (i.e. values in the last three columns) will be explained in the next Section.

df_results
##   H_values n_values trend_values H_estimate trend_estimate
## 1     0.50      100         0.00  0.5718440   -0.008047717
## 2     0.80      100         0.00  0.7887196    0.008649039
## 3     0.90      100         0.00  0.9001521   -0.004291134
## 4     0.95      100         0.00  0.7603781    0.001514865
## 5     0.50      100         0.01  0.6158808    0.009397300
##   Mann_Kendall_2_sided_pvalue Significance_of_H_2_sided_pvalue
## 1                 0.037917319                     3.951581e-01
## 2                 0.004453685                     1.132089e-05
## 3                 0.107144596                     6.306999e-09
## 4                 0.304208481                     3.449777e-05
## 5                 0.009993012                     3.568642e-01
##   Mann_Kendall_LTP_2_sided_pvalue
## 1                      0.10568776
## 2                      0.34427404
## 3                      0.71070572
## 4                      0.71620495
## 5                      0.04842272

2.3 The three steps of the Mann-Kendall test under the long-term persistence hypothesis

The LTP MKt consists of three steps, namely O, H and M. These three steps are presented in Hamed (2008). Each step is a hypothesis test. Let H0i denote the null hypothesis of each test and let H1i denote the alternative hypothesis, where i = O, H, M denotes the step of the LTP MKt.

Then:

  • Step O

H0O: No significant trend under the independence assumption.

H1O: Significant trend under the independence assumption.Proceed to step H.

  • Step H

H0H: No LTP. It can be concluded that there exists a trend.

H1H: LTP. Proceed to step M.

  • Step M

H0M: No significant trend.

H1M: Significant trend exists.

The three steps are sumarrized by the following sequences

  1. {H0O}, no significant trend.

  2. {H1O , H0H}, significant trend exists.

  3. {H1O , H1H , H0M}, no significant trend.

  4. {H1O , H1H , H1M}, significant trend exists.

In LTP MKt we compute the p-value for each step. The p-value for each step is presented in the last three columns of the df_results data frame. A p-value lower than a predefined level \(\alpha\) (e.g. \(\alpha\) = 0.05) gives evidence that H1 is true. If the p-value is higher than \(\alpha\), then we accept that H0 is true.

2.4 Application example

The part of interest of the df_results data frame follows. This part contains the p-values.

Mann_Kendall_2_sided_pvalue Significance_of_H_2_sided_pvalue Mann_Kendall_LTP_2_sided_pvalue
0.0379173 0.3951581 0.1056878
0.0044537 0.0000113 0.3442740
0.1071446 0.0000000 0.7107057
0.3042085 0.0000345 0.7162050
0.0099930 0.3568642 0.0484227

Let \(\alpha\) = 0.05. The application of the LTP MKt results in the following.

  1. {H1O , H0H}, significant trend exists.

  2. {H1O , H1H , H0M}, no significant trend.

  3. {H0O}, no significant trend.

  4. {H0O}, no significant trend.

  5. {H1O , H0H}, significant trend exists.

2.5 Figures

We visualize each case to better understand the findings.

3 Case study

3.1 Application of the Mann-Kendall test under the long-term persistence hypothesis

In this Section we apply the LTP MKt to 10 time series of PET at various locations in Greece.

The table with the results, when setting \(\alpha\) = 0.05 for all steps of the LTP MKt are presented in the following table.

Station appr_time_series_length H_estimate trend_estimate Mann_Kendall_2_sided_pvalue Significance_of_H_2_sided_pvalue Mann_Kendall_LTP_2_sided_pvalue Trend_existence
Heraklion 49 0.672397 -0.2377255 0.3090865 0.02283812 0.6298758 {H0O}, no trend
Ioannina 50 0.5801406 -0.4381775 0.04558661 0.2720165 0.179922 {H1O,H0H}, trend exists
Kavala 50 0.7614821 0.01110683 0.6275616 0.0002330466 0.8765082 {H0O}, no trend
Kerkyra 49 0.708458 0.3846636 0.8971222 0.004979742 0.9579901 {H0O}, no trend
Kozani 49 0.6303466 0.9253746 0.3132045 0.0368701 0.6140011 {H0O}, no trend
Larissa 49 0.7599349 -0.5992912 0.03938509 0.003407958 0.4171651 {H1O,H1H,HOM} no trend
Lemnos 48 0.7359009 -1.565468 0.0005824204 0.2593468 0.0230132 {H1O,H0H}, trend exists
Methoni 49 0.6934991 -0.4428614 0.05678219 0.03758929 0.3396473 {H0O}, no trend
Skyros 49 0.4615806 -0.005355828 0.4030851 0.8524974 0.4575056 {H0O}, no trend
Tripoli 49 0.4278933 0.08245386 0.46375 0.6086242 0.5571946 {H0O}, no trend

3.2 Figures

We visualize each case to better understand the findings.

4 References

Hamed KH (2008) Trend detection in hydrologic data: The Mann-Kendall trend test under the scaling hypothesis. Journal of Hydrology 349(3-4):350-363. doi: 10.1016/j.jhydrol.2007.11.009

McLeod AI, Yu H, Krougly ZL (2008) Algorithms for Linear Time Series Analysis: With R Package. Journal of Statistical Software 23(5)

Tyralis H, Koutsoyiannis D (2011) Simultaneous estimation of the parameters of the Hurst-Kolmogorov stochastic process. Stochastic Environmental Research and Risk Assessment 25(1):21-33. doi: 10.1007/s00477-010-0408-x