Civil Engineer, MSc., PhD candidate
Maintenance, upgrading and extension of the Decision Support System for the management of the Athens water resource system
Duration: October 2008–November 2011
Budget: €72 000
Project director: N. Mamassis
Principal investigator: D. Koutsoyiannis
This research project includes the maintenance, upgrading and extension of the Decision Support System that was developed by NTUA for EYDAP in the framework of the research project “Updating of the supervision and management of the water resources’ system for the water supply of the Athens’ metropolitan area”. The project consists of the following parts: (a) upgrading of the database, (b) upgrading and extension of the hydrometeorological network, (c) upgrading of the hydrometeorological data processing software, (d) upgrading and extension of the Hydronomeas software, (e) hydrological data analysis and (f) support for the preparation of the annual master plans.
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Predictability of monthly temperature and precipitation using automatic time series forecasting methods, Acta Geophysica, doi:10.1007/s11600-018-0120-7, 2018.
We investigate the predictability of monthly temperature and precipitation by applying automatic univariate time series forecasting methods to a sample of 985 40-year long monthly temperature and 1552 40-year long monthly precipitation time series. The methods include a naïve one based on the monthly values of the last year, as well as the random walk (with drift), AutoRegressive Fractionally Integrated Moving Average (ARFIMA), exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components (BATS), simple exponential smoothing, Theta and Prophet methods. Prophet is a recently introduced model inspired by the nature of time series forecasted at Facebook and has not been applied to hydrometeorological time series before, while the use of random walk, BATS, simple exponential smoothing and Theta is rare in hydrology. The methods are tested in performing multi-step ahead forecasts for the last 48 months of the data. We further investigate how different choices of handling the seasonality and non-normality affect the performance of the models. The results indicate that (a) all the examined methods apart from the naïve and random walk ones are accurate enough to be used in long-term applications, (b) monthly temperature and precipitation can be forecasted to a level of accuracy which can barely be improved using other methods, (c) the externally applied classical seasonal decomposition results mostly in better forecasts compared to the automatic seasonal decomposition used by the BATS and Prophet methods and (d) Prophet is competitive, especially when it is combined with externally applied classical seasonal decomposition
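The naïve benchmark and the 48-month test period described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the synthetic series and the choice of the root mean square error as the score are assumptions made only for illustration.

```python
import math

def seasonal_naive_forecast(history, horizon, period=12):
    """Repeat the last `period` observed values over the forecast horizon."""
    last_cycle = history[-period:]
    return [last_cycle[h % period] for h in range(horizon)]

def rmse(observed, forecast):
    """Root mean square error between two equally long sequences."""
    squared = [(o - f) ** 2 for o, f in zip(observed, forecast)]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic 40-year monthly series (illustrative only, not data from the paper):
# a fixed seasonal cycle plus a small linear drift.
series = [10.0 + 8.0 * math.sin(2 * math.pi * (t % 12) / 12) + 0.01 * t
          for t in range(480)]

# Hold out the last 48 months, as in the abstract, and forecast them in one go.
fit_set, test_set = series[:-48], series[-48:]
forecast = seasonal_naive_forecast(fit_set, horizon=48)
score = rmse(test_set, forecast)
```

Because the naïve forecast simply recycles the last observed year, its error on this synthetic series comes entirely from the drift term, which is the kind of behaviour the more elaborate methods are meant to improve on.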
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, One-step ahead forecasting of geophysical processes within a purely statistical framework, Geoscience Letters, 5, 12, doi:10.1186/s40562-018-0111-1, 2018.
The simplest way to forecast geophysical processes, an engineering problem with a widely recognized challenging character, is the so-called “univariate time series forecasting” that can be implemented using stochastic or machine learning regression models within a purely statistical framework. Regression models are in general fast to implement, in contrast to the computationally intensive Global Circulation Models, which constitute the most frequently used alternative for precipitation and temperature forecasting. For their simplicity and easy applicability, the former have been proposed as benchmarks for the latter by forecasting scientists. Herein, we assess the one-step ahead forecasting performance of 20 univariate time series forecasting methods, when applied to a large number of geophysical and simulated time series of 91 values. We use two real-world annual datasets, a dataset composed of 112 time series of precipitation and another composed of 185 time series of temperature, as well as their respective standardized datasets, to conduct several real-world experiments. We further conduct large-scale experiments using 12 simulated datasets. These datasets contain 24,000 time series in total, which are simulated using stochastic models from the families of AutoRegressive Moving Average and AutoRegressive Fractionally Integrated Moving Average. We use the first 50, 60, 70, 80 and 90 data points for model-fitting and model-validation, and make predictions corresponding to the 51st, 61st, 71st, 81st and 91st respectively. The total number of forecasts produced herein is 2,177,520, among which 47,520 are obtained using the real-world datasets. The assessment is based on eight error metrics and accuracy statistics. The simulation experiments reveal the most and least accurate methods for long-term forecasting applications, also suggesting that the simple methods may be competitive in specific cases.
Regarding the results of the real-world experiments using the original (standardized) time series, the minimum and maximum medians of the absolute errors are found to be 68 mm (0.55) and 189 mm (1.42) respectively for precipitation, and 0.23 °C (0.33) and 1.10 °C (1.46) respectively for temperature. Since there is an absence of relevant information in the literature, the numerical results obtained using the standardized real-world datasets could be used as rough benchmarks for the one-step ahead predictability of annual precipitation and temperature.
Full text: http://www.itia.ntua.gr/en/getfile/1834/1/documents/s40562-018-0111-1.pdf (3083 KB)
H. Tyralis, and G. Papacharalampous, Variable selection in time series forecasting using random forests, Algorithms, 10, 114, doi:10.3390/a10040114, 2017.
Time series forecasting using machine learning algorithms has gained popularity recently. Random forest is a machine learning algorithm implemented in time series forecasting; however, most of its forecasting properties have remained unexplored. Here we focus on assessing the performance of random forests in one-step forecasting using two large datasets of short time series with the aim to suggest an optimal set of predictor variables. Furthermore, we compare its performance to benchmarking methods. The first dataset is composed of 16,000 simulated time series from a variety of Autoregressive Fractionally Integrated Moving Average (ARFIMA) models. The second dataset consists of 135 mean annual temperature time series. The highest predictive performance of RF is observed when using a low number of recent lagged predictor variables. This outcome could be useful in relevant future applications, with the prospect of achieving higher predictive accuracy.
Full text: http://www.itia.ntua.gr/en/getfile/1827/1/documents/Variable_Selection_in_Time_Series_Forecasting_Using_Random_Forests.pdf (5509 KB)
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Forecasting of geophysical processes using stochastic and machine learning algorithms, European Water, 59, 161–168, 2017.
We perform an extensive comparison between four stochastic and two machine learning (ML) forecasting algorithms by conducting a multiple-case study. The latter is composed of 50 single-case studies, which use time series of total monthly precipitation and mean monthly temperature observed in Greece. We apply a fixed methodology to each individual case and, subsequently, we perform a cross-case synthesis to facilitate the detection of systematic patterns. The stochastic algorithms include the Autoregressive order one model, an algorithm from the family of Autoregressive Fractionally Integrated Moving Average models, an Exponential Smoothing State Space algorithm and the Theta algorithm, while the ML algorithms are Neural Networks and Support Vector Machines. We also use the last observation as a Naïve benchmark in the comparisons. We apply the forecasting methods to the deseasonalized time series. We compare the one-step ahead as well as the multi-step ahead forecasting properties of the algorithms. Regarding the one-step ahead forecasting properties, the assessment is based on the absolute error of the forecast of the last observation. For the comparison of the multi-step ahead forecasting properties we use five metrics applied to the test set (last twelve observations), i.e. the root mean square error, the Nash-Sutcliffe efficiency, the ratio of standard deviations, the index of agreement and the coefficient of correlation. Concerning the ML algorithms, we also perform a sensitivity analysis for time lag selection. Additionally, we compare ML methods that are more sophisticated with regard to hyperparameter optimization with simpler ones.
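The five metrics named above can be sketched in Python using their common textbook definitions. This is an illustrative sketch, not the paper's code: the function names are hypothetical, the index of agreement follows Willmott's usual form, and the ratio of standard deviations is assumed to be forecasts over observations. The paper may use different variants.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def _sample_sd(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))

def nash_sutcliffe(obs, sim):
    """1 minus squared errors over the variability of the observations."""
    m = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / sum((o - m) ** 2 for o in obs)

def sd_ratio(obs, sim):
    """Standard deviation of forecasts over that of observations (assumed order)."""
    return _sample_sd(sim) / _sample_sd(obs)

def index_of_agreement(obs, sim):
    """Willmott's index of agreement (assumed variant)."""
    m = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - m) + abs(o - m)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den

def correlation(obs, sim):
    """Pearson coefficient of correlation."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)

# Toy test set and a forecast that is biased by one unit (illustrative values).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 3.0, 4.0, 5.0]
scores = {
    "rmse": rmse(observed, forecast),
    "nse": nash_sutcliffe(observed, forecast),
    "sd_ratio": sd_ratio(observed, forecast),
    "d": index_of_agreement(observed, forecast),
    "r": correlation(observed, forecast),
}
```

In this toy case the correlation and the ratio of standard deviations are both perfect, while the root mean square error and the Nash-Sutcliffe efficiency expose the bias. This shows why several metrics are reported together.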
Full text: http://www.itia.ntua.gr/en/getfile/1768/1/documents/EW_2017_59_22.pdf (1163 KB)
See also: http://www.ewra.net/ew/issue_59.htm
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Error evolution patterns in multi-step ahead streamflow forecasting, 13th International Conference on Hydroinformatics (HIC 2018), Palermo, Italy, 2018.
Multi-step ahead streamflow forecasting is of practical interest. We examine the error evolution in multi-step ahead forecasting by conducting six simulation experiments. Within each of these experiments we compare the error evolution patterns created by 16 forecasting methods, when the latter are applied to 2 000 time series. Our findings suggest that the error evolution can differ significantly from one forecasting method to another and that some forecasting methods are more useful than others. However, the errors computed at each time step of a forecast horizon for a specific single-case study strongly depend on the case examined and can be either small or large, regardless of the forecasting method used and the time step of interest. This fact is illustrated with a comparative case study using 92 monthly time series of streamflow.
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Forecasting of geophysical processes using stochastic and machine learning algorithms, 10th World Congress on Water Resources and Environment "Panta Rhei", Athens, EWRA2017_A_110904, doi:10.13140/RG.2.2.30581.27361, European Water Resources Association, Athens, 2017.
We perform an extensive comparison between four stochastic and two machine learning (ML) forecasting algorithms by conducting a multiple-case study. The latter is composed of 50 single-case studies, which use time series of total monthly precipitation and mean monthly temperature observed in Greece. We apply a fixed methodology to each individual case and, subsequently, we perform a cross-case synthesis to facilitate the detection of systematic patterns. The stochastic algorithms include the Autoregressive order one model, an algorithm from the family of Autoregressive Fractionally Integrated Moving Average models, an Exponential Smoothing State Space algorithm and the Theta algorithm, while the ML algorithms are Neural Networks and Support Vector Machines. We also use the last observation as a Naive benchmark in the comparisons. We apply the forecasting methods to the deseasonalized time series. We compare the one-step ahead as well as the multi-step ahead forecasting properties of the algorithms. Regarding the one-step ahead forecasting properties, the assessment is based on the absolute error of the forecast of the last observation. For the comparison of the multi-step ahead forecasting properties we use five metrics applied to the test set (last twelve observations), i.e. the root mean square error, the Nash-Sutcliffe efficiency, the ratio of standard deviations, the index of agreement and the coefficient of correlation. Concerning the ML algorithms, we also perform a sensitivity analysis for time lag selection. Additionally, we compare ML methods that are more sophisticated with regard to hyperparameter optimization with simpler ones.
Full text: http://www.itia.ntua.gr/en/getfile/1717/1/documents/EWRA2017_paper.pdf (8540 KB)
H. Tyralis, and G. Papacharalampous, Univariate time series forecasting properties of random forests, European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-1901, European Geosciences Union, 2018.
The random forests’ univariate time series forecasting properties have remained unexplored. Here we assess the performance of random forests in one-step forecasting using two large datasets of short time series with the aim to suggest an optimal set of predictor variables. Furthermore, we compare their performance to benchmarking methods. The first dataset consists of 16 000 simulated time series from a variety of Autoregressive Fractionally Integrated Moving Average (ARFIMA) models. The second dataset consists of 135 mean annual temperature time series. The random forests performed better mostly when using a few recent lagged predictor variables. A possible explanation of this result is that increasing the number of lagged variables decreases the length of the training set and simultaneously decreases the information exploited from the original time series during the model fitting phase. Furthermore, the random forests were comparable to the benchmarking methods.
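The trade-off mentioned above, where more lagged predictor variables leave fewer training rows, can be made concrete with a small Python sketch. The helper name `lagged_matrix` and the toy series are hypothetical. Only the construction of the regression matrix is shown, not the random forest itself.

```python
def lagged_matrix(series, n_lags):
    """Build predictors (the previous n_lags values) and targets (the next value)."""
    predictors, targets = [], []
    for t in range(n_lags, len(series)):
        predictors.append(series[t - n_lags:t])  # ordered oldest to most recent
        targets.append(series[t])
    return predictors, targets

# Toy short time series of 10 values (illustrative only).
series = [0.3, 0.1, 0.4, 0.1, 0.5, 0.9, 0.2, 0.6, 0.5, 0.3]

rows_with_1_lag = len(lagged_matrix(series, 1)[1])   # 9 training rows
rows_with_5_lags = len(lagged_matrix(series, 5)[1])  # 5 training rows
```

Each additional lag removes one row from the training set, so for short time series a large lag order quickly leaves little data for model fitting.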
Full text: http://www.itia.ntua.gr/en/getfile/1826/1/documents/EGU2018-1901_abstract.pdf (31 KB)
G. Papacharalampous, D. Koutsoyiannis, and A. Montanari, Toy models for increasing the understanding on stochastic process-based modelling, European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-1900-1, European Geosciences Union, 2018.
Montanari and Koutsoyiannis (2012) have introduced a novel blueprint for hydrological modelling with the aim to integrate deterministic process-based modelling and uncertainty quantification within a stochastic framework (hereafter “bMK”). The outcome of this integration is referred to as “stochastic process-based modelling”, while the term “stochastic” conjointly represents probability, statistics and stochastic processes. The bMK is provided by an analytically derived theoretical scheme for the quantification of the global uncertainty in the output of deterministic models. The analytical formulation of this theoretical scheme can be replaced in practice by a Monte Carlo simulation algorithm, which simulates the stochastic model comprising the deterministic one and is a part of a larger algorithmic approach. The adopted methodological tools and assumptions within a specific approach can largely affect the quality of the provided solution. Therefore, any possible algorithmic procedure for the implementation of the bMK should be thoroughly examined. Herein, we adopt the toy model research method to conduct several controlled experiments of large scale. These experiments focus on specific research questions, all of them aiming to increase the understanding on the theoretical scheme under discussion. This understanding is fundamental for dealing with the additional theoretical, algorithmic and computational requirements implied by the choice to perform stochastic process-based modelling, instead of deterministic process-based modelling.
Full text: http://www.itia.ntua.gr/en/getfile/1813/1/documents/EGU2018-1900-1.pdf (33 KB)
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, A step further from model-fitting for the assessment of the predictability of monthly temperature and precipitation, European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-864, European Geosciences Union, 2018.
“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk” (John von Neumann). This famous quote, literally possible as proved by Mayer et al. (2010), has been widely used to question the parsimony of a model providing a good description of the available data. Still, a significant part of the hydrological literature insists on adding parameters, trend-related or of other types, to models to increase their descriptive power within the concept of geophysical time series analysis and without testing their predictive ability. Herein, we move a step further from model-fitting and actually run in forecast mode several automatic univariate time series models with the aim to assess the predictability of monthly temperature and precipitation. We examine a sample of 985 monthly temperature and 1552 monthly precipitation time series, observed at stations covering a significant part of the Earth’s surface and, therefore, including various real-world process behaviours. All the time series are 40 years long with no missing values. We compare the following forecasting methods: a naïve one based on the monthly values of the last year, ARFIMA, exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components (BATS), simple exponential smoothing (SES), Theta and Prophet. Prophet is a recently introduced model inspired by the nature of time series forecasted at Facebook and has not been applied to hydrometeorological time series in the past, while the use of BATS, SES and Theta is rare in hydrology. The methods are tested in performing multi-step ahead forecasts for the last 48 months of the data. The results are summarized in global scores, while their examination by group of stations leads to 5 individual scores for temperature and 6 for precipitation. The groups are formed according to the geographical vicinity of the stations.
The findings suggest that all the examined models are accurate enough to be used in long-term forecasting applications. For the total of the temperature time series the use of an ARFIMA, BATS, SES, Theta or Prophet model, instead of the naïve method, leads to about 19-29% more accurate forecasts in terms of root mean square error, or even to about 30-32% more accurate forecasts specifically for the temperature time series observed in North Europe. For the total of the precipitation time series the use of all these automatic methods leads to about 21-22% better forecasts than the use of the naïve method, while for the geographical regions of North America, North Europe and East Asia these percentages are 26-29%, 22-24% and 32-38% respectively. We think that the level of the forecasting accuracy can barely be improved using other methods, as indicated by the experiments of Papacharalampous et al. (2017).
Full text: http://www.itia.ntua.gr/en/getfile/1808/1/documents/EGU2018-864.pdf (38 KB)
G. Papacharalampous, and H. Tyralis, Large-scale assessment of random forests for data-driven hydrological modelling at monthly scale, European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-1902, European Geosciences Union, 2018.
We assess the performance of random forests in modelling mean monthly streamflow based on mean monthly precipitation and potential evapotranspiration for 293 catchments. The assessment is made by computing the values of 18 metrics for the calibration and test periods, as well as by comparing these values with the respective values computed for a lumped conceptual hydrological model with two parameters. The results are presented in maps and in an aggregated form. While the performance of the conceptual model is mostly similar for the two examined periods, the performance of random forests is far better for the calibration period than it is for the test period. Still, random forests perform better than the conceptual model for both periods.
Full text: http://www.itia.ntua.gr/en/getfile/1807/1/documents/EGU2018-1902.pdf (30 KB)
G. Papacharalampous, and H. Tyralis, One-step ahead forecasting of annual precipitation and temperature using univariate time series methods (solicited), European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-2298-1, European Geosciences Union, 2018.
We investigate the one-step ahead predictability of annual geophysical processes using 16 univariate time series forecasting methods. We examine two real-world datasets, a precipitation dataset and a temperature dataset, together containing 297 annual time series of 91 values. We use the first 50, 60, 70, 80 and 90 data points for model-fitting and model-validation and make predictions corresponding to the 51st, 61st, 71st, 81st and 91st respectively. The assessment of the methods’ performance is based on four error metrics and three accuracy statistics. The former are the error, absolute error, percentage error and absolute percentage error, while the latter are the median of the absolute errors, median of the absolute percentage errors and linear regression coefficient computed per category of tests.
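The testing scheme above (fit on the first 50, 60, 70, 80 and 90 values, forecast the next one) and the four error metrics can be sketched in Python as follows. The last-value forecaster, the sign convention (forecast minus observation) and the toy series are assumptions for illustration. They are not taken from the abstract.

```python
from statistics import median

def last_value_forecast(fit_set):
    """Placeholder forecaster (an assumption): repeat the last observed value."""
    return fit_set[-1]

def one_step_tests(series, fit_lengths=(50, 60, 70, 80, 90),
                   forecaster=last_value_forecast):
    """Forecast the value that follows each fitting set and collect the errors."""
    results = []
    for n in fit_lengths:
        predicted = forecaster(series[:n])
        observed = series[n]  # the (n + 1)-th value of the series
        error = predicted - observed  # assumed sign convention
        results.append({
            "target": n + 1,
            "error": error,
            "absolute_error": abs(error),
            "percentage_error": 100.0 * error / observed,
            "absolute_percentage_error": 100.0 * abs(error / observed),
        })
    return results

# Toy annual series of 91 values (illustrative only): 1.0, 2.0, ..., 91.0.
series = [float(t) for t in range(1, 92)]
results = one_step_tests(series)

# Accuracy statistics are then computed over the collected tests.
median_absolute_error = median(r["absolute_error"] for r in results)
median_absolute_percentage_error = median(
    r["absolute_percentage_error"] for r in results)
```

In a full experiment the same loop would run for every time series and every forecasting method, and the medians would be taken per category of tests.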
Full text: http://www.itia.ntua.gr/en/getfile/1806/1/documents/EGU2018-2298-1.pdf (31 KB)
G. Papacharalampous, and H. Tyralis, Illustrating important facts about multi-step ahead forecasting of univariate hydrological time series, European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-3570, European Geosciences Union, 2018.
We present a case study using a long time series of monthly streamflow to illustrate important points regarding multi-step ahead forecasting of univariate hydrological processes. We forecast the monthly values of five discrete one-year periods based on the available past monthly values. To produce a faithful image of the underlying phenomena we implement a sufficient number of popular forecasting algorithms and compute an adequate number of metrics on the test sets. The algorithms are applied to the deseasonalized time series, while seasonality is subsequently recovered in the produced forecasts. The ranking of the methods clearly depends on the forecasting attempt and the computed metric, while the forecasting quality can be good or bad.
Full text: http://www.itia.ntua.gr/en/getfile/1804/1/documents/EGU2018-3570.pdf (31 KB)
H. Tyralis, and G. Papacharalampous, Multi-step ahead forecasting of monthly streamflow discharge time series using a variety of algorithms, European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-3571-1, European Geosciences Union, 2018.
We compare a variety of algorithms in performing multi-step ahead forecasts of monthly streamflow discharge. We examine 285 time series originating from MOPEX catchments. We seasonally decompose the time series using a multiplicative model. We apply the algorithms to the deseasonalized time series and further make twelve-step ahead predictions corresponding to the last year’s monthly values of each time series. These values are not used in the fitting and validation processes. The forecasts are multiplied by the estimated seasonal component and, subsequently, they are compared with each other using an adequate number of metrics and two benchmarks. The results indicate that most of the methods perform well, on average better than the benchmark ones.
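The workflow above (deseasonalize with a multiplicative model, forecast twelve steps ahead, multiply by the seasonal component) can be sketched in Python under simplifying assumptions. The seasonal indices are taken as monthly means divided by the overall mean rather than through a classical moving-average decomposition. The forecaster is a plain mean of the deseasonalized values, and the series is assumed to be positive and to start in the first month of the cycle. All names are hypothetical.

```python
def seasonal_indices(series, period=12):
    """Simplified multiplicative indices: monthly mean over overall mean."""
    overall_mean = sum(series) / len(series)
    indices = []
    for month in range(period):
        month_values = series[month::period]
        indices.append((sum(month_values) / len(month_values)) / overall_mean)
    return indices

def forecast_with_multiplicative_seasonality(fit_set, horizon=12, period=12):
    """Deseasonalize, forecast the deseasonalized level, then restore seasonality."""
    indices = seasonal_indices(fit_set, period)
    deseasonalized = [x / indices[t % period] for t, x in enumerate(fit_set)]
    # Placeholder forecaster (an assumption): mean of the deseasonalized values.
    level = sum(deseasonalized) / len(deseasonalized)
    start = len(fit_set)
    return [level * indices[(start + h) % period] for h in range(horizon)]

# Toy four-year monthly series with a purely seasonal pattern (illustrative only).
series = [10.0 * (t % 12 + 1) for t in range(48)]

# The last year is held out and is not used for fitting.
fit_set, test_set = series[:-12], series[-12:]
forecasts = forecast_with_multiplicative_seasonality(fit_set)
```

Any forecasting algorithm could replace the placeholder mean. The deseasonalizing and reseasonalizing steps around it stay the same.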
Full text: http://www.itia.ntua.gr/en/getfile/1803/1/documents/EGU2018-3571-1.pdf (32 KB)
G. Papacharalampous, H. Tyralis, and N. Mamassis, Conceptual hydrological modelling at daily scale: Aggregating results for 340 MOPEX catchments, European Geosciences Union General Assembly 2018, Geophysical Research Abstracts, Vol. 20, Vienna, EGU2018-3759, European Geosciences Union, 2018.
We present a large-scale model-implementing study aiming at the comparison of 3 daily conceptual hydrological models. These models comprise a different number of parameters, i.e. 4, 5 and 6 parameters. The comparison is performed for 340 MOPEX catchments, while each of the modelling approaches is assessed by computing the values of 18 metrics for the calibration and validation periods. The results are presented in maps and in an aggregated form, indicating that the models exhibit a quite similar performance, with the 6-parameter model being slightly better than the rest in terms of specific metrics. The metric values are, in most cases, slightly better for the calibration set than they are for the validation set.
Full text: http://www.itia.ntua.gr/en/getfile/1802/1/documents/EGU2018-3759.pdf (33 KB)
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Large scale simulation experiments for the assessment of one-step ahead forecasting properties of stochastic and machine learning point estimation methods, Asia Oceania Geosciences Society (AOGS) 14th Annual Meeting, Singapore, HS06-A002, doi:10.13140/RG.2.2.33273.77923, Asia Oceania Geosciences Society, 2017.
The research in geophysical sciences often focuses on the comparison between stochastic and machine learning (ML) point estimation methods for time series forecasting. The comparisons performed are usually based on case studies. The present study aims to provide generalized results regarding the one-step ahead forecasting properties of several popular forecasting methods. This problem cannot be examined analytically, mainly because of the nature of the ML methods. Therefore, we conduct large-scale computational experiments based on simulations. Regarding the methodology, we compare a total of 20 methods, 9 of which are ML methods. Three of the latter are built using a neural networks algorithm, another three using a random forests algorithm and the remaining three using a support vector machines algorithm. The stochastic methods include simple methods and models from the frequently used families of Autoregressive Moving Average (ARMA), Autoregressive Fractionally Integrated Moving Average (ARFIMA) and Exponential Smoothing models. We perform 12 simulation experiments, each of them using 2 000 simulated time series. The time series are simulated using a stochastic model from the families of ARMA and ARFIMA models. The comparative assessment of the methods is based on the error and the absolute error of the forecast of the last observation.
Full text: http://www.itia.ntua.gr/en/getfile/1719/1/documents/AOGS-HS06-A002presentation.pdf (4029 KB)
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, A set of metrics for the effective evaluation of point forecasting methods used for hydrological tasks, Asia Oceania Geosciences Society (AOGS) 14th Annual Meeting, Singapore, HS01-A001, doi:10.13140/RG.2.2.19852.00641, Asia Oceania Geosciences Society, 2017.
The selection of metrics for the evaluation of point forecasting methods can be challenging even for very experienced hydrologists. We conduct a large-scale computational experiment based on simulations to compare the information that 18 metrics proposed in the literature give about the forecasting performance. Our purpose is to provide generalized results; thus we use 2 000 simulated Autoregressive Fractionally Integrated Moving Average time series. We apply several forecasting methods and we compute the values of the metrics for each forecasting experiment. Subsequently, we measure the correlation between the values of each pair of metrics, separately for each forecasting method. Furthermore, we explore graphically the detected relationships. Finally, we propose a set of metrics that we consider to be suitable for the effective evaluation of point forecasting methods.
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Investigation of the effect of the hyperparameter optimization and the time lag selection in time series forecasting using machine learning algorithms, European Geosciences Union General Assembly 2017, Geophysical Research Abstracts, Vol. 19, Vienna, 19, EGU2017-3072-1, doi:10.13140/RG.2.2.20560.92165/1, European Geosciences Union, 2017.
The hyperparameter optimization and the time lag selection are considered to be of great importance in time series forecasting using machine learning (ML) algorithms. To investigate their effect on the ML forecasting performance we conduct several large-scale simulation experiments. Within each of the latter we compare 12 methods on 2 000 simulated time series from the family of Autoregressive Fractionally Integrated Moving Average (ARFIMA) models. The methods are defined by the set {ML algorithm, hyperparameter selection procedure, time lags}. We compare three ML algorithms, i.e. Neural Networks (NN), Random Forests (RF) and Support Vector Machines (SVM), two procedures for hyperparameter selection, i.e. predefined hyperparameters or hyperparameters defined after optimization, and two regression matrices (using time lag 1 or 1, …, 21). After splitting each simulated time series into a fitting and a testing set, we fit the models to the former set and compare their performance on the latter one. We quantify the methods’ performance using several metrics proposed in the literature and benchmark methods. Furthermore, we conduct a sensitivity analysis on the length of the fitting set to examine how it affects the robustness of our results. The findings indicate that the hyperparameter optimization mostly has a small effect on the forecasting performance. This is particularly important, because the hyperparameter optimization is computationally intensive. On the other hand, the time lag selection seems, in most cases, to affect the methods’ performance significantly when using the NN algorithm, while we observe similar behaviour for the RF algorithm, albeit to a smaller extent.
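The count of 12 methods follows from combining the three ML algorithms, the two hyperparameter selection procedures and the two sets of time lags. A short Python sketch of this enumeration is given below. The string labels are illustrative and are not identifiers from the study.

```python
from itertools import product

algorithms = ["NN", "RF", "SVM"]
hyperparameter_procedures = ["predefined", "optimized"]
time_lag_sets = [(1,), tuple(range(1, 22))]  # time lag 1, or time lags 1, ..., 21

# Each method is one {ML algorithm, hyperparameter procedure, time lags} combination.
methods = list(product(algorithms, hyperparameter_procedures, time_lag_sets))
number_of_methods = len(methods)  # 3 * 2 * 2 = 12
```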
Full text: http://www.itia.ntua.gr/en/getfile/1693/1/documents/EGU2017-3072presentation.pdf (1731 KB)
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Multi-step ahead streamflow forecasting for the operation of hydropower reservoirs, European Geosciences Union General Assembly 2017, Geophysical Research Abstracts, Vol. 19, Vienna, 19, EGU2017-3069, doi:10.13140/RG.2.2.27271.80801, European Geosciences Union, 2017.
Multi-step ahead forecasting is of practical interest for the operation of hydropower reservoirs. We conduct several large-scale simulation experiments using both streamflow data and simulated time series to provide generalized results concerning the variation over time of the error values in multi-step ahead forecasting. In more detail, we apply several popular forecasting methods to each time series as explained subsequently. Each time series is split into a fitting and a testing set. We fit the models to the former set and we test their forecasting performance in the latter set. Lastly, we compute the error and the absolute error at each time step of the forecast horizon for each test and carry out a statistical analysis on the formed data sets. Furthermore, we perform a sensitivity analysis on the length of the fitting set to examine how it affects the results.
Full text: http://www.itia.ntua.gr/en/getfile/1692/1/documents/EGU2017-3069presentation.pdf (3930 KB)
G. Papacharalampous, H. Tyralis, and D. Koutsoyiannis, Comparison between stochastic and machine learning methods for hydrological multi-step ahead forecasting: All forecasts are wrong!, European Geosciences Union General Assembly 2017, Geophysical Research Abstracts, Vol. 19, Vienna, 19, EGU2017-3068-2, doi:10.13140/RG.2.2.17205.47848, European Geosciences Union, 2017.
Machine learning (ML) is considered to be a promising approach to hydrological processes forecasting. We conduct a comparison between several stochastic and ML point estimation methods by performing large-scale computational experiments based on simulations. The purpose is to provide generalized results, while the respective comparisons in the literature are usually based on case studies. The stochastic methods used include simple methods, models from the frequently used families of Autoregressive Moving Average (ARMA), Autoregressive Fractionally Integrated Moving Average (ARFIMA) and Exponential Smoothing models. The ML methods used are Random Forests (RF), Support Vector Machines (SVM) and Neural Networks (NN). The comparison refers to the multi-step ahead forecasting properties of the methods. A total of 20 methods are used, among which 9 are the ML methods. 12 simulation experiments are performed, while each of them uses 2 000 simulated time series of 310 observations. The time series are simulated using stochastic processes from the families of ARMA and ARFIMA models. Each time series is split into a fitting (first 300 observations) and a testing set (last 10 observations). The comparative assessment of the methods is based on 18 metrics, that quantify the methods’ performance according to several criteria related to the accurate forecasting of the testing set, the capturing of its variation and the correlation between the testing and forecasted values. The most important outcome of this study is that there is not a uniformly better or worse method. However, there are methods that are regularly better or worse than others with respect to specific metrics. It appears that, although a general ranking of the methods is not possible, their classification based on their similar or contrasting performance in the various metrics is possible to some extent. 
Another important conclusion is that more sophisticated methods do not necessarily provide better forecasts compared to simpler methods. It is pointed out that the ML methods do not differ dramatically from the stochastic methods, while it is interesting that the NN, RF and SVM algorithms used in this study offer potentially very good performance in terms of accuracy. It should be noted that, although this study focuses on hydrological processes, the results are of general scientific interest. Another important point in this study is the use of several methods and metrics. Using fewer methods and fewer metrics would have led to a very different overall picture, particularly if those fewer metrics corresponded to fewer criteria. For this reason, we consider that the proposed methodology is appropriate for the evaluation of forecasting methods.
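The experimental design described in this abstract (simulate a series, fit on the first 300 observations, forecast the last 10, score the forecasts) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' code: an AR(1) process stands in for the ARMA and ARFIMA families, two simple forecasters stand in for the 20 methods, RMSE stands in for the 18 metrics, and 200 series are used instead of 2 000 to keep the run short. All function names are hypothetical.

```python
import numpy as np

def simulate_ar1(n, phi, rng):
    """Simulate a zero-mean AR(1) process x[t] = phi * x[t-1] + e[t]."""
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def forecast_naive(fit, horizon):
    """Repeat the last observed value over the whole horizon."""
    return np.full(horizon, fit[-1])

def forecast_ar1(fit, horizon):
    """Estimate an AR(1) coefficient by least squares and iterate it forward."""
    mean = fit.mean()
    z = fit - mean
    phi_hat = np.dot(z[1:], z[:-1]) / np.dot(z[:-1], z[:-1])
    steps = np.arange(1, horizon + 1)
    return mean + z[-1] * phi_hat ** steps

def rmse(obs, pred):
    """Root mean square error between observed and forecasted values."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def run_experiment(n_series=200, n_fit=300, n_test=10, phi=0.7, seed=0):
    """Return the median RMSE of each method over all simulated series."""
    rng = np.random.default_rng(seed)
    scores = {"naive": [], "ar1": []}
    for _ in range(n_series):
        x = simulate_ar1(n_fit + n_test, phi, rng)
        fit, test = x[:n_fit], x[n_fit:]  # fitting set and testing set
        scores["naive"].append(rmse(test, forecast_naive(fit, n_test)))
        scores["ar1"].append(rmse(test, forecast_ar1(fit, n_test)))
    return {name: float(np.median(vals)) for name, vals in scores.items()}

results = run_experiment()
print(results)
```

Summarising each method by its median score over many simulated series, rather than by its score on a single series, is what allows the kind of generalized statement the abstract aims for.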
Full text: http://www.itia.ntua.gr/en/getfile/1691/1/documents/EGU2017-3068presentation.pdf (1804 KB)
P. Dimitriadis, M. Liveri-Dalaveri, A. Kaldis, C. Kotsalos, G. Papacharalampous, and P. Papanicolaou, Zone of flow establishment in turbulent jets, European Geosciences Union General Assembly 2012, Geophysical Research Abstracts, Vol. 14, Vienna, EGU2012-12716, European Geosciences Union, 2012.
It is well established experimentally that, as the Reynolds number increases, the core of the jet diminishes and has a smaller effect on the jet’s mean profiles (e.g. concentration, temperature, velocity). The scope of this project is to examine this relationship based on dimensional analysis and experimental data. To this end, spatio-temporal temperature records are obtained on the plane of symmetry of heated vertical round jets (for a laboratory turbulent scale on the order of mm) using tracer concentration measurements via a planar laser-induced fluorescence (PLIF) technique. The investigation area is set close to the nozzle of the jets (5-6 diameters away), at the zone of flow establishment (ZFE), so as to determine the geometric characteristics (dimensions and shape) of the core as a function of the initial velocity and nozzle diameter. The ZFE is estimated through the absence of turbulent intensity fluctuations (taking 1% of the maximum intensity as the threshold value).
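The threshold criterion mentioned at the end of this abstract can be illustrated with a short sketch on synthetic data. Several things here are assumptions rather than details from the study: the fluctuation intensity is taken as the temporal standard deviation at each pixel, the records are stored as an array of shape (time, axial position, radial position), and the function names are hypothetical.

```python
import numpy as np

def core_mask(records, threshold_fraction=0.01):
    """Flag pixels whose fluctuation intensity is below a fraction of the maximum.

    records: array of shape (time, z, r); this layout is an assumption.
    """
    intensity = records.std(axis=0)  # temporal standard deviation per pixel
    return intensity < threshold_fraction * intensity.max()

def core_length(mask, axis_column):
    """Count contiguous core pixels along the jet axis, starting at the nozzle (row 0)."""
    column = mask[:, axis_column]
    non_core = np.flatnonzero(~column)
    return int(non_core[0]) if non_core.size else int(column.size)

# Synthetic example: a cone-shaped quiet core that narrows away from the nozzle
rng = np.random.default_rng(0)
n_time, n_z, n_r = 200, 60, 41
z = np.arange(n_z)[:, None]
r = np.abs(np.arange(n_r) - n_r // 2)[None, :]
inside_core = r < (10 - z / 4)  # half-width shrinks linearly with distance
noise_scale = np.where(inside_core, 0.0, 1.0)
records = 20.0 + noise_scale * rng.standard_normal((n_time, n_z, n_r))

mask = core_mask(records)
length = core_length(mask, axis_column=n_r // 2)
print(length)
```

With real PLIF records the core would not be perfectly quiet, so the result depends on the chosen threshold fraction; the 1% value is the one quoted in the abstract.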
G. Papacharalampous, Theoretical and empirical comparison of stochastic and machine learning methods for hydrological processes forecasting, Postgraduate Thesis, 372 pages, Department of Water Resources and Environmental Engineering – National Technical University of Athens, Athens, October 2016.
Forecasting the future behaviour of hydrological processes is useful in the design and operation of hydraulic engineering works. While the attention given to probabilistic forecasting is growing, there is still large practical and scientific interest in point estimation. Machine learning methods have also established themselves as a promising approach to hydrological forecasting and, as a result, research within the field of hydrology often focuses on comparing machine learning methods to classical stochastic methods. The comparisons performed in the literature are usually based on case studies. This thesis conducts a theoretical comparison of the forecasting performance of several classical stochastic and machine learning point estimation methods by performing large-scale computational experiments based on simulations. The purpose of the thesis is to provide generalized results. The theoretical comparison is accompanied by a small-scale empirical comparison to highlight important points. Emphasis is placed on Support Vector Machines (SVM), which constitute the most popular recently introduced category of machine learning methods in the field of hydrology, while the well-established Neural Networks (NN) are also included in the comparison. The comparison refers to long-term forecasting on the observation time scale, although short-term forecasting is also useful. As regards the methodology, a total of 28 methods are used, 9 of which are machine learning methods. Six of the latter are built using an SVM algorithm and the remaining three using an NN algorithm. Twenty simulation experiments are performed, each using 2 000 simulated time series. 
The time series are simulated using stochastic models from the frequently used Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Autoregressive Fractionally Integrated Moving Average (ARFIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) families. Additionally, 8 computational experiments are carried out, each using one historical time series. Each time series is divided into two parts. The first part is used for training the model and the second for testing its forecast. The comparative assessment of the methods is based on 22 metrics that quantify the methods’ performance according to several criteria. These criteria are related to the bias with respect to the mean and standard deviation, the accuracy and the correlation. The most important outcome of this thesis is that, in general, no method is uniformly better or worse than the others. However, there are methods that are regularly better or worse than others according to specific metrics. It appears that, although a general ranking of the methods is not possible, their classification based on their similar or contrasting performance in the various metrics is possible to some extent. Another important conclusion is that more sophisticated methods do not necessarily provide better forecasts compared to simpler methods. It is pointed out that machine learning methods do not differ dramatically from classical stochastic methods, while it is interesting that the SVM and NN algorithms used in this thesis offer potentially very good performance in terms of accuracy, compared to the overall picture. It should be noted that, although the present thesis focuses on hydrological processes, the results are of general scientific interest and they also concern all possible observation time scales. In addition to the use of simulated processes, another important point in the present thesis is the use of several methods and metrics. 
Using fewer methods and fewer metrics would have led to a very different overall picture, particularly if those fewer metrics corresponded to fewer criteria. For this specific reason, the proposed methodology of the thesis is considered to be more appropriate for the evaluation of forecasting methods.
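To make the four evaluation criteria named in this abstract concrete (bias with respect to the mean, bias with respect to the standard deviation, accuracy, and correlation), the sketch below computes one common metric per criterion. The particular metrics and the function name are assumptions chosen for illustration; the abstract does not list which 22 metrics the thesis uses.

```python
import numpy as np

def evaluate(obs, pred):
    """Return one illustrative metric per criterion for a single forecast."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return {
        # bias with respect to the mean: difference of the two means
        "mean_bias": float(pred.mean() - obs.mean()),
        # bias with respect to the standard deviation: ratio of the two
        "sd_ratio": float(pred.std(ddof=1) / obs.std(ddof=1)),
        # accuracy: root mean square error
        "rmse": float(np.sqrt(np.mean((pred - obs) ** 2))),
        # correlation: Pearson coefficient between observed and forecasted values
        "pearson_r": float(np.corrcoef(obs, pred)[0, 1]),
    }

# A forecast that is shifted by one unit but otherwise tracks the observations
obs = [2.0, 4.0, 6.0, 8.0]
pred = [3.0, 5.0, 7.0, 9.0]
scores = evaluate(obs, pred)
print(scores)
```

In this toy case the forecast scores perfectly on variation and correlation while being biased in the mean, which illustrates why judging a method by a single metric can give a misleading picture. Note that a constant forecast has zero standard deviation, so its Pearson coefficient is undefined.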
Full text: http://www.itia.ntua.gr/en/getfile/1670/1/documents/papacharalampous.pdf (26770 KB)