Data analysis reveals that extreme events have increased the flood inundations in the Taquari River Valley, Southern Brazil

Flood inundations accounted for 43.5% of the world's deaths caused by natural disasters in 2019 and have caused countless damage to cities and human settlements in recent years, especially in Brazil. The dataset comprises the records of the Encantado pluviometric station, in a municipality located on the margin of the Taquari River in southern Brazil, including a rainfall time series (n = 36,466) spanning 78 years, from April 1943 to December 2020. Complementary datasets include the annual precipitation volume and the level reached by the Taquari River during 44 flood inundations since 1941. The number of events is subsampled because only 32 years have a complete record of the river level. Three of the five largest flood inundations at Encantado occurred after 2001, and the most severe flood, on July 8, 2020, reached the highest recorded level of the Taquari River (20.27 m). Thirty-four percent of all flood inundations in the city were recorded between 2011 and 2020. The months of July to October account for 70% of all events, and no flood was recorded in February or December throughout the data series. Human occupation of the floodplain has been rapid in recent decades, and most of the urban area is potentially at risk of flood inundation. Moreover, extreme rainfall events and flood events have become more frequent in the last 30 years. Therefore, this database can serve as a starting point for developing predictive models and for testing a possible correlation between floods, extreme events, and global climate change.


DATA IMPORTANCE
• Fluvial inundations are natural processes that have historically affected populations living near rivers and other water bodies. However, migration to cities drives disorderly urbanization and heightens the risk of natural disasters. In 2019, 43.5% of the deaths caused by natural disasters worldwide were a consequence of flood inundations (EMDAT, 2021). Flood events at Encantado have increased in the last 30 years, and more studies are needed to determine whether these events result from an atypical rainy climatic cycle or are a consequence of global climate change expressed as extreme rainfall;
• A long temporal data series is fundamental to understanding the pattern of rainy periods through time and to identifying the main controls on flood inundations in the study area. This database is a starting point for developing classification and prediction models that will help authorities and the Civil Defense empower citizens to deal with emergencies during natural disasters;
• Floods do not depend only on the precipitation that falls at the site of interest. Many attributes of the entire hydrographic basin control these events, as presented by Kurek (2016). In this context, we analyzed a long pluviometric time series to extract useful information for better understanding these events;
• Territorial management and urban planning should include temporal data series analysis to plan urban expansion, reduce natural disasters, and make cities more resilient. The UN 2030 Agenda has 17 Sustainable Development Goals (SDGs), including the reduction and mitigation of natural-disaster risk as part of SDG 11, which aims to make cities and human settlements inclusive, safe, affordable, resilient, and sustainable.

MATERIALS AND METHODS
Encantado is a municipality of 22,000 inhabitants in the center-east of Rio Grande do Sul state, southern Brazil (29°14'S, 51°52'W). It is part of the Taquari-Antas Hydrographic Basin (TAHB), a region with 119 municipalities distributed over 39,346 km², with many industrial parks and growing urbanization in recent decades (OLIVEIRA et al., 2018; BRUSKI et al., 2020). The municipality lies in the upper domain of the Baixo Taquari-Antas Valley (Fig. 1) and is fed by the drainage network that flows from the high-altitude terrains of the TAHB (Alto Taquari-Antas, Médio Taquari-Antas, Rio Turvo, Rio Carreiro, and Rio Guaporé) toward Guaíba Lake, approximately 150 km from Encantado (MARCUZZO, 2018; OLIVEIRA et al., 2018).
The raw data came from the Hidroweb system, a tool of the National Water Resources Information System (SNIRH), available at https://www.snirh.gov.br/hidroweb/serieshistoricas (ANA, 2001). The Encantado pluviometric station is identified by the code 2951010, and its data were downloaded as a .csv file. After conversion to .xlsx format, the file was analysed to identify missing measurements (missing values) and to check whether days without precipitation were correctly filled with zero. This analysis revealed that the exported spreadsheet included only months with recorded data; therefore, all missing months were added to facilitate visualization. The last days of each month were also inspected in detail. Because months have 28, 29, 30, or 31 days, cell filling and missing values were checked separately for months with 28 and 29 days (February), 30 days (April, June, September, November), and 31 days (January, March, May, July, August, October, and December). The length of the weeks follows the number of days in each month, with the following subdivisions:
• 28 days: week 1 = 01-07, week 2 = 08-14, week 3 = 15-21, week 4 = 22-28
• 29 days: week 1 = 01-07, week 2 = 08-14, week 3 = 15-21, week 4 = 22-29
• 30 days: week 1 = 01-08, week 2 = 09-15, week 3 = 16-23, week 4 = 24-30
• 31 days: week 1 = 01-08, week 2 = 09-15, week 3 = 16-23, week 4 = 24-31
All available data were used to calculate the annual statistical parameters, except for the year 1961, which was excluded because it has only one month of measurements. The years 1943, 1952, 1953, 1985, 2006, 2007, 2009, 2010, 2015, 2016, 2017, 2019, and 2020 have missing values in some months.
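The week subdivision listed above can be expressed as a small lookup table. The following Python sketch is an illustration only, not the code from script_Encantado_LADS.ipynb; the names `week_of_day`, `week_of_date`, and `weekly_totals` are hypothetical, and `weekly_totals` assumes a complete list of daily values for the month (no missing days).

```python
import calendar
from typing import List, Sequence

# Last day of weeks 1-3 for each month length; week 4 runs to the end of the month.
_WEEK_BOUNDS = {28: (7, 14, 21), 29: (7, 14, 21), 30: (8, 15, 23), 31: (8, 15, 23)}


def week_of_day(day: int, days_in_month: int) -> int:
    """Return the week (1-4) of a day under the subdivision described in the text."""
    if days_in_month not in _WEEK_BOUNDS:
        raise ValueError(f"unexpected month length: {days_in_month}")
    if not 1 <= day <= days_in_month:
        raise ValueError(f"day {day} is outside 1..{days_in_month}")
    for week, last_day in enumerate(_WEEK_BOUNDS[days_in_month], start=1):
        if day <= last_day:
            return week
    return 4


def week_of_date(year: int, month: int, day: int) -> int:
    """Same as week_of_day, using the calendar module to handle leap years."""
    days_in_month = calendar.monthrange(year, month)[1]
    return week_of_day(day, days_in_month)


def weekly_totals(daily_mm: Sequence[float], year: int, month: int) -> List[float]:
    """Sum a complete list of daily rainfall values (mm) into the four weeks."""
    days_in_month = calendar.monthrange(year, month)[1]
    if len(daily_mm) != days_in_month:
        raise ValueError("expected one value per day of the month")
    totals = [0.0, 0.0, 0.0, 0.0]
    for day, mm in enumerate(daily_mm, start=1):
        totals[week_of_day(day, days_in_month) - 1] += mm
    return totals
```

For example, `week_of_date(2020, 2, 29)` falls in week 4 because February 2020 has 29 days, whereas `week_of_date(2020, 7, 8)` is still week 1 under the 31-day scheme.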
Although these years are included in the dataset, almost all statistical values from the annual analysis should be regarded as underestimated, especially for the winter season.
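Inserting the missing months and flagging the years whose annual statistics are underestimated could be sketched with pandas as follows. This is a hypothetical illustration, not the authors' procedure: it assumes a `Date` column holding the first day of each month (as in Encantado_rain_data.csv) and treats a month as recorded when `Monthly_total_vol` is not missing; the function names are ours and the data are synthetic.

```python
from typing import List

import pandas as pd


def complete_monthly_index(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Return a copy with one row per calendar month; months absent from the export become NaN rows."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.set_index(date_col).sort_index()
    # "MS" = month start, matching the first-day-of-month dates in the file
    full_range = pd.date_range(out.index.min(), out.index.max(), freq="MS")
    return out.reindex(full_range).rename_axis(date_col).reset_index()


def incomplete_years(df: pd.DataFrame, value_col: str = "Monthly_total_vol",
                     date_col: str = "Date") -> List[int]:
    """Years with fewer than 12 recorded months; their annual statistics are underestimated."""
    recorded = df.groupby(df[date_col].dt.year)[value_col].count()  # count() ignores NaN
    return [int(year) for year in recorded[recorded < 12].index]


# Synthetic example: June and August-December 1943 are absent from the export.
raw = pd.DataFrame({
    "Date": ["1943-04-01", "1943-05-01", "1943-07-01", "1944-01-01"],
    "Monthly_total_vol": [120.5, 98.0, 210.3, 87.1],
})
full = complete_monthly_index(raw)
missing_months = int(full["Monthly_total_vol"].isna().sum())
```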
A Python notebook named script_Encantado_LADS.ipynb (see Supplementary Material) processes three .csv files with the pandas, seaborn, matplotlib, numpy, sklearn, and plotly libraries to convert the spreadsheets into dataframes, inspect the statistics of each attribute, clean the data, count missing values, calculate new attributes, and generate histograms, scatter plots, and Pearson's correlation coefficients. One specific step calculated the high outliers of precipitation from the attribute "Monthly_total_vol" using the formula Vmax = Q3 + 1.5 × (Q3 − Q1), where Q1 and Q3 are the first and third quartiles, respectively. The output provided a list of anomalous rainfall months, which was compared with the maximum flood inundations along the time series to verify whether there is a correlation between monthly precipitation and flood events at Encantado.
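The cutoff Vmax is the upper Tukey fence. A minimal pandas sketch follows; it assumes pandas' default linear quantile interpolation, which may differ slightly from the method used in the original notebook, and the set of flood months is a hypothetical placeholder for the records compiled in Table 2. The monthly values are synthetic.

```python
from typing import Tuple

import pandas as pd


def upper_fence(values: pd.Series, k: float = 1.5) -> float:
    """Vmax = Q3 + k * (Q3 - Q1), using pandas' default (linear) quantile interpolation."""
    q1, q3 = values.quantile([0.25, 0.75])
    return float(q3 + k * (q3 - q1))


def high_outliers(df: pd.DataFrame, col: str = "Monthly_total_vol") -> Tuple[float, pd.DataFrame]:
    """Return Vmax and the rows whose value exceeds it (NaN rows never qualify)."""
    vmax = upper_fence(df[col].dropna())
    return vmax, df.loc[df[col] > vmax]


# Synthetic monthly totals (mm); real input would be Encantado_rain_data.csv.
monthly = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=9, freq="MS"),
    "Monthly_total_vol": [100, 110, 120, 130, 140, 150, 160, 170, 400],
})
vmax, outliers = high_outliers(monthly)

# Hypothetical flood months, standing in for the records compiled in Table 2.
flood_months = {pd.Period("2000-03", freq="M"), pd.Period("2000-09", freq="M")}
outlier_months = set(outliers["Date"].dt.to_period("M"))
matched = outlier_months & flood_months  # anomalous months that also had a flood
```

Here Q1 = 120 mm and Q3 = 160 mm, so Vmax = 160 + 1.5 × 40 = 220 mm, and only the 400 mm month is flagged.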

DATA DESCRIPTION
The analysis of precipitation at daily to annual time scales is fundamental to understanding pluviometric behavior through time (Fig. 2). We stress that the annual pluviometric records are underestimated, as indicated in the Materials and Methods section; therefore, the values of the mean, standard deviation, and quartiles are approximate. The period between 1943 and 1957 had precipitation below the historical mean. The interval 1943-1990 has 12 years with precipitation below the first quartile (Q1), whereas the interval 1991-2020 possibly has only one (2004), because all the other years below Q1 in this interval (2006, 2010, 2016) have missing values in the dataset. The comparison of the two intervals also reveals that 18 years had precipitation above the mean and median in the interval 1943-1990, compared with 20 years in the interval 1991-2020. Considering the third quartile (Q3), 8 years in the interval 1943-1990 and at least 7 years in the interval 1991-2020 had precipitation above Q3. Relative to the lengths of the intervals (48 and 30 years), these counts indicate that precipitation in the last 30 years was proportionally higher than in the first 48 years of the time series (Fig. 2). Only the years 1966 and 2002 are high outliers in the time series.
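Because the two intervals have different lengths (48 and 30 years), the counts above are easier to compare as proportions. A sketch, assuming the layout of Encantado_annual_rainfall.csv (columns Year and Volume(mm)) and thresholds computed on the whole series; `count_by_interval` is a hypothetical helper, and 1961 should be dropped beforehand, as in the text.

```python
import pandas as pd


def count_by_interval(annual: pd.DataFrame, breaks=((1943, 1990), (1991, 2020))) -> pd.DataFrame:
    """Count years per interval relative to thresholds computed on the whole series."""
    vol = annual["Volume(mm)"]
    q1, median, q3 = vol.quantile([0.25, 0.5, 0.75])
    mean = vol.mean()
    rows = {}
    for start, end in breaks:
        sub = vol[annual["Year"].between(start, end)]  # inclusive on both ends
        rows[f"{start}-{end}"] = {
            "n_years": int(sub.size),
            "below_Q1": int((sub < q1).sum()),
            "above_mean": int((sub > mean).sum()),
            "above_median": int((sub > median).sum()),
            "above_Q3": int((sub > q3).sum()),
        }
    table = pd.DataFrame(rows).T
    # Proportions make intervals of different length comparable
    table["share_above_Q3"] = table["above_Q3"] / table["n_years"]
    return table
```

With the real file, `count_by_interval(pd.read_csv("Encantado_annual_rainfall.csv"))` would return one row per interval; years with missing months still bias the counts, as noted above.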
Descriptive statistics were calculated for all attributes related to rainfall volume. The mean daily rainfall is 38.7 mm, and the maximum precipitation in 24 h was 135.0 mm. The mean number of rainy days per month is 7.5, and the maximum number of rainy days in a month was 20. The monthly analysis revealed an average precipitation of 116.9 mm, with a maximum of 382 mm. The annual rainfall values are 1359.5 mm (mean) and 2265.7 mm (maximum) (Table 1). The accumulated precipitation of a fortnight can reach 81.5% of the monthly total, whereas a single week can account for 61.5% of the month's rainfall.
Figure 2. Annual volume of rainfall along the time series 1943-2020. The statistical parameters Q1, mean, median, and Q3 indicate that the last 30 years (1991-2020) had proportionally more precipitation than the first 48 years. *The year 2020 had a very high volume of precipitation, but its data record is incomplete.
A logical (Boolean) filter selected all the outliers related to anomalous rainfall in the time series. The value 287.82 mm is the cutoff obtained by calculating Vmax from the attribute "Monthly_total_vol". Twenty-two months had precipitation above the cutoff during the 78 years of analysis, although no data are available in the Hidroweb system for July 2020, the month of the maximum inundation. Seven of these months had flood inundations recorded at Encantado, according to Peixoto & Lamberty (2019) and Bruski et al. (2020) (Table 2).
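The fortnight and week shares quoted earlier in this section are ratios of each sub-monthly volume to Monthly_total_vol. A minimal sketch, assuming the column names listed for Encantado_rain_data.csv and skipping months with zero or missing totals; `max_share` is a hypothetical helper and the two months of data are synthetic.

```python
from typing import Sequence

import pandas as pd


def max_share(df: pd.DataFrame, part_cols: Sequence[str], total_col: str = "Monthly_total_vol") -> float:
    """Largest fraction of a month's total that fell within any single sub-period column."""
    valid = df[df[total_col] > 0]  # drops dry months and NaN totals (NaN > 0 is False)
    shares = valid[list(part_cols)].div(valid[total_col], axis=0)
    return float(shares.max().max())


# Synthetic example with two months
months = pd.DataFrame({
    "Monthly_total_vol": [100.0, 200.0],
    "Vol_fortnight1": [60.0, 50.0],
    "Vol_fortnight2": [40.0, 150.0],
    "Vol_week1": [10.0, 100.0],
    "Vol_week2": [20.0, 50.0],
    "Vol_week3": [30.0, 25.0],
    "Vol_week4": [40.0, 25.0],
})
fortnight_share = max_share(months, ["Vol_fortnight1", "Vol_fortnight2"])
week_share = max_share(months, [f"Vol_week{i}" for i in range(1, 5)])
```

In this toy table, the second fortnight of the second month holds 150 of 200 mm (75%), and its first week holds 50%.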
The number of flood inundations in the intervals 1943-1990 and 1991-2020 is the same (22). However, the number of events in the last decade (2011-2020) is 66.7% higher than the former maximum (1951-1960) and 375% higher than in the decade 2001-2010 (Fig. 3). A statistical analysis of the attributes related to the 44 flood inundations that occurred at Encantado, considering the high outliers and the entire time series 1943-2020 (Table 3; Fig. 4), shows that:
• Only 14.0% of the flood inundations are directly related to high outliers (> Vmax) of the attribute Monthly_total_vol (Table 3A), but 76.7% of the floods are directly related to monthly rainfall above the third quartile (> Q3) (Table 3B).
• On average, 18.6% of the floods are related to high outliers of the attributes Vol_fortnight1 and Vol_fortnight2 (Table 3A), but this value increases to 55.8% when the third quartile of rainfall volume is used as the threshold (Table 3B).
• 16.3% of the floods show a direct relationship with high outliers of the attributes Vol_week1 to Vol_week4 (Table 3A); however, 44.2% of the flood events are associated with weekly rainfall above the third quartile (Table 3B).
A statistical analysis of the attributes related to the 44 flood inundations at Encantado, considering the high outliers and subdividing the time series into two intervals, 1943-1990 and 1991-2020, shows that:
• Based on the monthly data, the number of floods associated with high outliers is the same (3) in the intervals 1943-1990 and 1991-2020 (Fig. 4).
• However, based on the weekly data, the number of flood events after 1991 (17) is 41.7% higher than in the interval 1943-1990 (12) (Fig. 4).
A composite chart of Pearson's coefficients comparing the intervals 1943-1990 and 1991-2020 points to a stronger influence of monthly, fortnightly, and weekly precipitation on the river level during flood events in the recent interval: the Pearson's coefficients for 1991-2019 are higher than those for the same attributes in 1943-1990 (Fig. 5).
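The flood-event comparisons above reduce to two operations: the percentage of floods whose rainfall attribute exceeds a threshold (Vmax or Q3), and Pearson's r between a rainfall attribute and the river level within each interval. A hypothetical sketch; the flood table layout, including the `River_level_m` column, is an assumption, because the flood-level file is not described in this section, and the six events are synthetic.

```python
import pandas as pd


def share_above(values: pd.Series, threshold: float) -> float:
    """Percentage of flood events whose rainfall attribute exceeds the threshold."""
    return float((values > threshold).mean() * 100)


def pearson_by_interval(floods: pd.DataFrame, rain_cols, level_col: str = "River_level_m",
                        breaks=((1943, 1990), (1991, 2020))) -> pd.DataFrame:
    """Pearson's r between each rainfall attribute and the river level, per interval."""
    result = {}
    for start, end in breaks:
        sub = floods[floods["Year"].between(start, end)]
        # Series.corr defaults to the Pearson method and skips NaN pairs
        result[f"{start}-{end}"] = {col: sub[col].corr(sub[level_col]) for col in rain_cols}
    return pd.DataFrame(result)


# Hypothetical flood table: one row per flood with the rainfall of its month
floods = pd.DataFrame({
    "Year": [1950, 1960, 1970, 2000, 2010, 2020],
    "Monthly_total_vol": [100.0, 200.0, 300.0, 100.0, 200.0, 300.0],
    "River_level_m": [10.0, 12.0, 14.0, 18.0, 14.0, 10.0],
})
pct_above = share_above(floods["Monthly_total_vol"], threshold=150.0)
r = pearson_by_interval(floods, ["Monthly_total_vol"])
```

With real data, the threshold would be the Vmax cutoff (287.82 mm for Monthly_total_vol) or the Q3 of the same attribute.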

Dataset
The first file, Encantado_annual_rainfall.csv, has two columns, Year and Volume(mm), and was used to plot the rainfall variation over the period 1943-2020.
The second file, Encantado_rain_data.csv, is the most complete dataset, with 933 lines and 48 columns that include all the data associated with each month from April 1943 to January 2021. From left to right, the columns record the Station Code, Data Type, Date (always indicated as the first day of each month), Month, Measurer, the maximum daily volume in millimeters (Daily_max_vol), the day of maximum volume (Day_max_vol), the number of rainy days in the month (Rain_Days), the monthly total volume of precipitation (Monthly_total_vol), the annual volume of precipitation (Annual_vol) and its respective year (Year) (both always given in the December row), the volumes of the first and second fortnights (Vol_fortnight1 and Vol_fortnight2, respectively), the weekly volumes (Vol_week1, Vol_week2, Vol_week3, and Vol_week4), and the daily volumes (Day 01 to Day 28, 29, 30, or 31, depending on the month) (Table 4).
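To work with the daily columns as a single daily series (for example, to recount rainy days or recompute weekly totals), the wide table can be reshaped to long format. This sketch assumes the daily columns are literally named "Day 01" to "Day 31"; the exact header spelling in the file may differ, and `daily_long` is a hypothetical name. The February row below is synthetic.

```python
import pandas as pd


def daily_long(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Reshape the wide 'Day 01'..'Day 31' columns into one row per calendar day."""
    # "Day " with a trailing space excludes Day_max_vol (assumed header spelling)
    day_cols = [c for c in df.columns if c.startswith("Day ")]
    long = df.melt(id_vars=[date_col], value_vars=day_cols,
                   var_name="day_col", value_name="rain_mm")
    day = long["day_col"].str.replace("Day ", "", regex=False).astype(int)
    month_start = pd.to_datetime(long[date_col])
    long["date"] = month_start + pd.to_timedelta(day - 1, unit="D")
    # Days 29-31 of a shorter month would spill into the next month: drop those rows
    long = long[long["date"].dt.month == month_start.dt.month]
    return long[["date", "rain_mm"]].sort_values("date").reset_index(drop=True)


# Synthetic February 2021 row (28 days); cells for days 29-31 are ignored
row = {"Date": "2021-02-01", "Day_max_vol": 28}
row.update({f"Day {d:02d}": float(d) for d in range(1, 32)})
daily = daily_long(pd.DataFrame([row]))
```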