Water-energy consumption forecasting using advanced machine learning models

C Michalopoulos, Water-energy consumption forecasting using advanced machine learning models, Diploma thesis, 106 pages, Department of Water Resources and Environmental Engineering – National Technical University of Athens, March 2022.

The Athens Water Supply and Sewerage Company (EYDAP) is the largest company active in the Greek water market. In water supply, EYDAP serves about 4,400,000 customers (2,160,000 connections) through a pipeline network about 14,000 km long. The severe water crisis that struck Athens in the 1990s highlighted the importance of drinking water management. A major part of drinking water management is reducing losses during transport and determining the type of each leak. The most basic methodology for calculating losses in a closed network is the water balance method. It applies the continuity principle for fluids: in terms of total volume, what enters the network must also leave it, so the difference between the volume of water entering the network and the volume leaving it equals the total losses. However, a problem that is easy to solve at small scale is not necessarily easy to solve at large scale. Closing the balance requires measurements of consumer consumption, yet because of the size of the network EYDAP cannot read all installed flow meters every quarter. Of the roughly 2,200,000 flow meters that must be recorded, about 200,000 go unread each quarter. Hence, approximately 800,000 readings (200,000 in each of four quarters) are missed per year. These missing recordings affect the accuracy of the overall estimate of network losses. The aim of this work is to estimate more accurately the volume of water that users have consumed, and thereby improve the water balance used to calculate and locate losses. The results also have another use for the company: more accurate pricing for customers whose consumption has not been recorded (imputed invoice). The data processing and calculations are performed in the Python computing environment.
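The water balance and the count of missed readings described above can be illustrated with a minimal Python sketch. The function names, the split of consumption into metered and imputed parts, and the example volumes are assumptions made for illustration; they are not taken from the thesis.

```python
from typing import Iterable

QUARTERS_PER_YEAR = 4


def missed_readings_per_year(unread_meters_per_quarter: int) -> int:
    """Number of meter readings missed in a year of quarterly readings."""
    return unread_meters_per_quarter * QUARTERS_PER_YEAR


def water_balance_losses(
    inflow_m3: float,
    metered_m3: Iterable[float],
    imputed_m3: Iterable[float] = (),
) -> float:
    """Total losses = volume entering the network minus volume leaving it.

    The volume leaving the network is taken here as the sum of metered
    consumption plus estimated (imputed) consumption of unread meters.
    This split is an assumption of the sketch.
    """
    return inflow_m3 - (sum(metered_m3) + sum(imputed_m3))


# About 200,000 unread meters per quarter (figure from the text).
print(missed_readings_per_year(200_000))

# Hypothetical volumes in cubic metres, for illustration only.
print(water_balance_losses(1000.0, [300.0, 250.0], [150.0]))
```

In such a sketch, any error in the imputed volumes passes directly into the loss estimate, which is why better consumption estimates for unread meters would tighten the balance.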
Due to the complexity of the problem, statistical models are used in combination with machine learning regression models, such as ARIMA, Seasonal ARIMA, Neural Networks, and Random Forest. New methodologies have also been developed that take advantage of the format of the data; that is, they make forecasts based on the consumers that have been measured. These methodologies are based on neighbor-based and clustering algorithms such as kNN (k-Nearest Neighbors) and Gaussian mixture models. The models are tested on synthetic data, because access to the real EYDAP data was not available until the completion of the thesis. For completeness, it was deemed necessary to also test the models on real data. In recent years, technology in the field of energy management has expanded greatly, so such data are easily accessible to the scientific community. For this reason, electricity consumption data from ten American states were selected, retrieved from publicly available online sources.
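One way to picture a forecast based on the measured consumers is a plain k-nearest-neighbors estimate. It is sketched below in standard-library Python. This is an illustrative assumption and not the method developed in the thesis: the function name `knn_impute`, the use of past quarterly consumption as features, the Euclidean distance, and the toy numbers are all hypothetical.

```python
import math
from typing import Sequence


def knn_impute(
    history_measured: Sequence[Sequence[float]],
    current_measured: Sequence[float],
    history_unread: Sequence[float],
    k: int = 2,
) -> float:
    """Estimate the current consumption of an unread meter.

    The estimate is the mean current consumption of the k measured
    consumers whose past consumption histories are closest (Euclidean
    distance) to the history of the unread meter.
    """
    if not 1 <= k <= len(history_measured):
        raise ValueError("k must be between 1 and the number of measured consumers")
    # Rank measured consumers by distance to the unread meter's history.
    ranked = sorted(
        range(len(history_measured)),
        key=lambda i: math.dist(history_measured[i], history_unread),
    )
    nearest = ranked[:k]
    return sum(current_measured[i] for i in nearest) / k


# Toy data: two past quarters per consumer (hypothetical values).
history_measured = [[10.0, 12.0], [11.0, 13.0], [40.0, 42.0]]
current_measured = [11.0, 12.0, 41.0]
estimate = knn_impute(history_measured, current_measured, [10.0, 13.0], k=2)
print(estimate)
```

The choice of `k` and of the features used to compare consumers would need to be tuned on data; the sketch only shows the mechanics of borrowing information from measured consumers.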