Correction of Precipitation Data in Weather Files for the Subtropical Climate in Southern Brazil

Abstract The thermal performance of a building can be evaluated prior to its construction by modeling it in specialized software. Climate boundary conditions must be represented by a weather file composed of a weather dataset organized hourly according to a defined year structure, such as the Test Reference Year (TRY) or the Typical Meteorological Year (TMY). Before this study, only a few weather files were available for cities in southern Brazil, notably Santa Maria, RS, which has a humid subtropical climate. The most recent available weather file was built in 2014 and presents inconsistencies with respect to precipitation data. Therefore, the objective of this study was to process and analyze the climate data of Santa Maria over an eighteen-year period (2002-2020) to generate a more reliable weather file. The method comprised the following procedures: data collection and processing; TRY (TRY17) and TMY2 (TMY0220) definition; solar radiation data calculation; EPW file generation; and comparison between the new EPW files and the previously existing files. As a result, significant differences among the weather files were observed over a short period of time (2014-2020), which emphasizes the importance of updating weather files at intervals shorter than 30 years. In the comparative analysis, both weather files (TRY17 and TMY0220) presented dry bulb temperatures consistent with the files previously available. However, the correction of precipitation data could lead to building simulations closer to reality.


INTRODUCTION
Climatic conditions play a key role in building design. Taking these into account allows for the design of buildings that are more comfortable and energy efficient. Therefore, weather data is an important input for hygrothermal simulation tools, which require hourly data on climatic conditions, such as temperature, relative humidity, solar radiation, wind, precipitation, and atmospheric pressure (BARREIRA et al., 2017). Despite the growing number of weather stations, many cities lack long-term climate data that is compatible with simulation software. In Brazil, climate archives that can be used to evaluate building thermal performance are primarily available for capital cities, and not all of them are in a TRY or TMY format (LABEEE, 2022). Generating weather files involves the use of preprocessing, correction, and interpolation techniques to correct errors and fill in missing data (BARNABY; CRAWLEY, 2011; TAYLOR et al., 2014; SANTOS, 2019; EVOLA et al., 2021).
In addition, according to Pyrgou et al. (2017) and Evola et al. (2021), historical weather data should be representative of the current climatic conditions. A continuous updating process is therefore important to reach this objective, and the adequacy of weather files depends on specific studies and analyses carried out in different contexts.
A typical weather file, for a one-year data period, consists of 8,760 hourly data values regarding meteorological parameters from long-term data (BILBAO et al., 2004; FAGBENLE, 1995; LUPATO; MANZAN, 2018), usually 30-year historical averages (WMO, 2022). When hourly 30-year data are not available, as is the case for Santa Maria, shorter periods are used (HUI, 1996). Weather reference year files are composed of data corresponding to twelve months, with the aim of establishing through simulations the energy consumption in buildings and the users' thermal comfort (PISSIMANIS et al., 1988; MARION; URBAN, 1995), among other goals.
There are several types of reference year. These files can be expressed in various formats and include a range of parameters to meet diverse demands and requirements (AL-MOFEEZ, 2012; BARREIRA et al., 2017). The Test Reference Year (TRY) and the Typical Meteorological Year (TMY) are the most common weather files used in building thermal performance simulations (PERNIGOTTO et al., 2017). TRY and TMY are obtained from weather recordings statistically identified to compose a climate-representative year (KALAMEES; KURNITSKI, 2006; JANJAI; DEEYAI, 2009; LEE et al., 2010; LUPATO; MANZAN, 2018). Each file is composed of 8,760 data points for each climatic parameter, such as dry bulb temperature, wet bulb temperature, wind speed, precipitation, and solar radiation, among others (CARLO; LAMBERTS, 2005). Several methods have been developed for determining TRY (PERNIGOTTO et al., 2017; HALL et al., 1978) and TMY (HUI, 1996; LUND, 1991; SKEIKER, 2007) for different purposes. ISO 15927-4 (2005) presents a method to determine a European TRY based on a real specific year that characterizes the climate in a region considering a long period of time, and it has been used by many authors (LEE; YOO; LEVERMORE, 2010; DU; UNDERWOOD; EDGE, 2012; SORRENTINO et al., 2013; PERNIGOTTO et al., 2017; BARREIRA et al., 2017; KIM et al., 2017; LEITZKE et al., 2018). TMY was developed by the National Climatic Data Center (NCDC) and the Sandia National Laboratory (SNL) and is currently one of the most accepted and used methods for developing weather reference files (BRE; FACHINOTTI, 2016; LI et al., 2020; EVOLA et al., 2021). TMY2, an improvement on TMY, consists of 12 Typical Meteorological Months (TMM) selected from the months of a multiyear weather database (KALAMEES; KURNITSKI, 2006; ZANONI, 2015; GUIMARÃES, 2016) of at least 10 years (HUI, 1996). The TMY2 method is applied through a monthly analysis of a series of data to compose a reference climatic year. Thus, the most representative months of different years are selected to compose the typical year, considering a statistical analysis. The procedure is performed for the twelve months of the year, resulting in a TMY2 file consisting of 12 real months, which may or may not be from different years (GUIMARÃES, 2016).
There are other reference years used for specific conditions and contexts, which were generated for different purposes. For example, the Weather Year for Energy Calculations (WYEC) was developed specifically for building energy simulation (LUND, 1991; CRAWLEY, 1998; ARGIRIOU et al., 1999; AL-MOFEEZ et al., 2012; DETOMMASO et al., 2021; LIU et al., 2021). The Moisture Reference Year (MRY) is used in hygrothermal simulations, considering the critical moisture load in the building components in order to prevent degradation by moisture (KALAMEES; VINHA, 2004).
In Brazil, the TMY method was adapted, starting from the monthly analysis of a series of data for the composition of a reference climatic year. Thus, the application of the TMY methodology selects the most representative months of different years to compose the typical year. From the analysis of the average monthly dry bulb temperatures, the months with the highest and lowest values are successively excluded until only one remains, called the typical month. The procedure is performed for the twelve months of the year, resulting in a TMY file consisting of 12 real months, which may or may not be from different years (GUIMARÃES, 2016; ZANONI, 2015). Leitzke et al. (2018) processed a weather database in order to define a TRY for Pelotas, a city in southern Brazil. The method used by those authors was similar to the method used in this study.
After analyzing the existing weather files (GRIGOLETTI et al., 2016; LABEEE, 2022) and comparing them with data from Santa Maria's climate normal, inconsistencies regarding underestimated precipitation data were noticed. Considering this weakness, the aim of this work was to update the weather files for Santa Maria, Brazil, located in a humid subtropical climate, to compare their similarities in terms of computational simulation, and to determine the implications of using weather files with missing precipitation data. Data from 2002 to 2020 were used to generate the weather files. Although the TRY may not be as appropriate for hygrothermal simulations as, for example, the MRY, it was decided, as a first stage, to perform this study with more commonly used methodologies, namely TMY2 and TRY.

Description of Santa Maria climate
Santa Maria, in southern Brazil, is located at 29°41'S, 53°48'W, at an altitude of about 113 m a.s.l. The climate is Humid Subtropical (Cfa), with an average temperature of 19.3 °C (Figure 1). During the hot, humid summers, the temperature is frequently higher than 30 °C, and it is common for the maximum daily air temperature to reach 40 °C. During winters, the temperatures frequently drop to negative values, around -3 °C during the night, with freezing commonly occurring in the early hours (KÖPPEN; GEIGER, 1928; LÖBLER et al., 2015). Rainfall in Rio Grande do Sul can be considered evenly distributed throughout the year, since there is no defined rainy season, and the annual accumulations range from 1,200 mm to more than 1,900 mm (GRIMM et al., 1998; TEIXEIRA, 2010; ROSSATO, 2011). Heavier rainfall events are common during spring and summer, whereas frontal rainfall events prevail in winter (INMET, 2020).

Definition of TRY and TMY2 weather files
To find the TRY, it was necessary to organize and treat the monthly average dry-bulb temperatures of a complete 18-year period, from January 2002 to December 2020. The TRY method has the following procedures: (1) calculation of the monthly average dry-bulb temperatures of the entire weather dataset; (2) gradual elimination of the years of data that contain extreme monthly average temperatures (high or low), until only one year remains, which will be the test reference year (NCDC, 1976). The months are arranged in order of temperature, beginning with the warmest month within the whole dataset, followed by the coldest month, and so on until 12 months are listed. After that, the analysis is repeated, but noting the lowest temperature in the first warmest month, then the warmest temperature in the first coldest month. The analysis is carried out until the 24 months are listed in order of importance. Finally, the years with extreme monthly average temperatures (high or low) are eliminated until only one year is left, which is the TRY. Therefore, the TRY is a reference year composed of a real year.
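The elimination procedure described above can be sketched in Python. This is only an illustrative reading of the method, not the authors' implementation: the function name `select_try`, the input layout (a dictionary mapping each year to its twelve monthly mean temperatures), and the choice to rank calendar months by their long-term mean are assumptions made for the example.

```python
from itertools import cycle
from statistics import mean


def select_try(monthly_means):
    """Return the year left after eliminating years with extreme months.

    monthly_means: {year: [12 monthly mean dry-bulb temperatures]}
    (hypothetical layout chosen for this sketch).
    """
    years = sorted(monthly_means)
    # Assumption: calendar months are ranked by their long-term mean.
    long_term = [mean(monthly_means[y][m] for y in years) for m in range(12)]
    hottest_first = sorted(range(12), key=lambda m: long_term[m], reverse=True)
    coldest_first = hottest_first[::-1]

    # First pass: hottest month, coldest month, next hottest, next coldest...
    first_pass = []
    for hot, cold in zip(hottest_first[:6], coldest_first[:6]):
        first_pass.append((hot, max))   # reject the year with the hottest "hot" month
        first_pass.append((cold, min))  # reject the year with the coldest "cold" month
    # Second pass: same months, opposite extreme (coolest hot month, mildest cold month).
    second_pass = [(m, min if pick is max else max) for m, pick in first_pass]

    remaining = list(years)
    for month, pick in cycle(first_pass + second_pass):
        if len(remaining) == 1:
            return remaining[0]
        extreme_year = pick(remaining, key=lambda y: monthly_means[y][month])
        remaining.remove(extreme_year)
```

With three synthetic years, one uniformly warmer and one uniformly colder than the third, the sketch returns the intermediate year, which is the behavior expected of a test reference year.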
For TMY2, the analyses are performed for each month independently, so the selected months can be from different years, and the resulting year is therefore a composite rather than a real year. In general, the procedure consists of calculating the monthly average temperatures of the entire weather dataset, month by month independently, and gradually eliminating those with the highest and lowest monthly average temperatures, until 12 months are left, which compose the typical meteorological year (CARLO; LAMBERTS, 2005).
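The month-by-month elimination can be illustrated with a short Python sketch. The names `select_typical_year` and `build_tmy2`, the input layout, the use of `None` for discarded months, and the choice to remove the highest value first are assumptions made for this example and are not taken from the original procedure.

```python
def select_typical_year(candidates):
    """Pick the typical year for one calendar month.

    candidates: {year: monthly mean dry-bulb temperature} (hypothetical layout).
    The highest and lowest values are removed alternately until one year is left.
    """
    ranked = sorted(candidates, key=lambda y: candidates[y])  # coldest to hottest
    drop_high = True  # assumption: the hottest candidate is removed first
    while len(ranked) > 1:
        if drop_high:
            ranked.pop()      # remove the hottest remaining month
        else:
            ranked.pop(0)     # remove the coldest remaining month
        drop_high = not drop_high
    return ranked[0]


def build_tmy2(monthly_means):
    """Return the 12 selected years, one per calendar month.

    monthly_means: {year: [12 monthly means]}, with None marking a month
    that was discarded during data treatment.
    """
    selected = []
    for month in range(12):
        candidates = {
            year: values[month]
            for year, values in monthly_means.items()
            if values[month] is not None
        }
        selected.append(select_typical_year(candidates))
    return selected
```

Because extremes are removed from both ends, the selected month is the one whose mean temperature is closest to the middle of the ranked series, which is why the twelve months may come from different years.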

Data gathering
The weather data from Santa Maria needed for the Test Reference Year (TRY) and the Typical Meteorological Year (TMY2) generation was acquired from the Meteorological Database of the Instituto Nacional de Meteorologia (INMET, 2020), which represents Brazil before the World Meteorological Organization and is responsible for Brazilian meteorological information. Hourly data from January 2002 to December 2020 include dry bulb temperature (°C), dew point temperature (°C), global solar radiation (Wh/m²), relative humidity (%), atmospheric pressure (Pa), wind speed (m/s), wind direction (°), and precipitation (mm). In total, 157,680 hours of raw weather data were analyzed and treated.

Fill in missing data
Due to equipment failures, the records for some hours, or even days, were null, leaving gaps in the weather dataset. In order to correct these missing data and fill the gaps, the methodology of Guimarães (2016) was used. The filling procedure varies according to the number of consecutive failures. In gaps with up to 6 missing values, simple interpolation was performed. For gaps with more than 6 and fewer than 24 consecutive values, the closest 10 hours (5 hours before and 5 hours after the gap) were identified for the 3 days before and the 3 days after the gap. Then, the total and average difference of the data was calculated, in addition to the standard deviation and variance. The day with the smallest difference from the day of the gap was used to fill the gap. Finally, for gaps with more than 24 consecutive values, the corresponding month was removed, since the data treatment was considered inefficient due to the amount of uncertainty (GUIMARÃES, 2016).
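The first decision in this procedure, classifying each gap by its length and interpolating the short ones, can be sketched as follows. The function names, the use of `None` for a missing hourly value, and the treatment of a gap of exactly 24 values as a discarded month are assumptions of this example; the similar-day search for medium gaps is not reproduced here.

```python
def find_gaps(values):
    """Return (start_index, length) for every run of missing (None) values."""
    gaps, start = [], None
    for i, value in enumerate(values):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        gaps.append((start, len(values) - start))
    return gaps


def classify_gap(length):
    """Map a gap length (in hours) to the treatment described in the text."""
    if length <= 6:
        return "interpolate"
    if length < 24:
        return "similar_day"
    return "discard_month"  # assumption: exactly 24 is treated like a longer gap


def fill_short_gaps(values):
    """Linearly interpolate gaps of up to 6 values; leave other gaps untouched."""
    filled = list(values)
    for start, length in find_gaps(values):
        end = start + length  # index of the first valid value after the gap
        if classify_gap(length) != "interpolate":
            continue
        if start == 0 or end == len(values):
            continue  # no valid neighbor on one side, nothing to interpolate from
        left, right = values[start - 1], values[end]
        step = (right - left) / (length + 1)
        for k in range(length):
            filled[start + k] = left + step * (k + 1)
    return filled
```

For example, a two-hour gap between 10.0 °C and 16.0 °C is filled with 12.0 °C and 14.0 °C, while a gap of eight hours is left for the similar-day treatment.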

Data control and consistency analysis
To verify signs of significant inaccuracy in the data and to perform data quality control, the following tests were performed (PITTIGLIANI, 2000): range test (evaluation of the minimum and maximum limits of each climatic parameter) (Table 1), step test (evaluation of the maximum differences between two consecutive data for each weather parameter) (Table 2), and persistence test (evaluation of standard deviation and variation in the 24-hour period) (Table 3). Then, the data of TRY and TMY2 were compiled in a CSV (comma-separated value) format. The data required to set up the file are date, time, source, dry bulb temperature (°C), dew point temperature (°C), relative humidity (%), atmospheric pressure (Pa), horizontal extraterrestrial radiation (Wh/m²), horizontal global radiation (Wh/m²), normal direct radiation (Wh/m²), diffuse horizontal radiation (Wh/m²), wind direction (degrees), wind speed (m/s) and precipitation (mm) (LEITZKE et al., 2018).
The CSV file was generated through the DeEPWaCSV converter (https://ecoeficiente.es/conversor-epw-a-csv/). After that, the CSV was converted into an EPW file through the Weather Statistics and Conversions program (ENERGYPLUS, 2022). Statistical significance was used to verify the discrepancy between each weather file (ABNT, 2005). Psychrometric charts for both weather files, TRY and TMY2, were created and compared using Climate Consultant 6.0 (SBSE, 2018).
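A minimal Python sketch of the three quality-control tests is given below. The function names and the thresholds used in the usage example are hypothetical; the actual limits are those listed in Tables 1 to 3, and the persistence test is simplified here to a standard-deviation check on consecutive 24-hour blocks.

```python
from statistics import pstdev


def range_test(values, lower, upper):
    """Indices of values outside the allowed [lower, upper] limits."""
    return [i for i, v in enumerate(values) if not lower <= v <= upper]


def step_test(values, max_step):
    """Indices where the change from the previous hour exceeds max_step."""
    return [
        i for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > max_step
    ]


def persistence_test(values, min_std, window=24):
    """Start indices of 24-hour blocks whose variability is suspiciously low."""
    flagged = []
    for start in range(0, len(values) - window + 1, window):
        if pstdev(values[start:start + window]) < min_std:
            flagged.append(start)
    return flagged


# Usage with hypothetical dry-bulb temperature limits (not the values of Tables 1-3).
temperatures = [20.0, 21.0, 55.0, 22.0]
out_of_range = range_test(temperatures, lower=-10.0, upper=45.0)
abrupt_steps = step_test(temperatures, max_step=10.0)
```

In this usage example the 55.0 °C reading is flagged by the range test, and both the jump into it and the jump out of it are flagged by the step test.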

Building simulation model
A comparative analysis among the previously existing weather files and the ones developed in this study was carried out in order to verify the differences and similarities between them considering annual energy consumption and operative temperatures.
The infiltration rate was set to 0.5 air changes per hour, and the internal loads to 200 W continuous, 60% radiative, 40% convective and 100% sensible. The mechanical system was a 100% convective air system, 100% efficient, with no duct losses, no capacity limitation, no latent heat extraction, and a non-proportional-type dual setpoint thermostat with deadband. The heating and cooling setpoints were 20 °C and 27 °C, respectively. The simulations were performed hourly over the whole climatic year.

FINDINGS
TRY and TMY2 generation
Table 5 presents the monthly average dry-bulb temperatures from 2002 to 2020. The months with more than 24 hours of null data were discarded. Table 6 presents the classification of the months by extreme temperatures, which determined the years that were rejected. In this way, 2017 was designated as the TRY and named TRY17 (Table 7; source: elaborated by the authors with data from INMET (2020)).
The TMY2 consists of the 12 months remaining after the elimination of individual months with extreme maximum and minimum dry-bulb temperatures (Table 8). The years selected, in the order listed in Table 8, were 2020, 2009, 2009, 2020, 2010, 2010, 2013, 2004, 2010, 2013, 2005 and 2007 (source: elaborated by the authors with data from INMET (2020)).

Weather files comparison
A statistical analysis of the climatic parameters of each weather file was performed (Figure 2) considering average, standard deviation, skewness coefficient, minimum and maximum values and variation coefficient.
Temperature is one of the key factors influencing the user's thermal perception and building energy performance (MAMANI et al., 2022). The yearly descriptive analysis of the outside dry-bulb temperature of the weather files shows an almost symmetric distribution with some outliers (extreme temperatures during the summer and the winter) (Figure 2a). There are strong similarities among the yearly outside dry-bulb temperature distributions: the average values vary between 18.40 °C and 19.79 °C, with a standard deviation smaller than 7.05 °C. This result indicates that the weather files developed in this study (TRY17 and TMY0220) are consistent with the previously existing weather files regarding outside dry-bulb temperature.
The distribution of relative humidity is slightly asymmetric, with some outliers in TMY0517 and TRY03. The average values vary between 73.86% and 82.60%, with a maximum standard deviation of 20.44% for TMY0220 and TRY17. The distributions of wind speed and rainfall intensity presented high positive asymmetry with several outliers, while wind direction is almost symmetric, but with high variability. The boxplots of global solar radiation distributions show significant similarities, except for TRY17. This weather file is composed of data from 2017, a year that presented higher solar radiation values, as shown by average and maximum values higher than those of the other weather files, reaching up to 1,399 W/m² (Table 9 and Figure 3). TMY0517 presented the lowest values of global solar radiation among the files evaluated (reaching up to 999 W/m²). The solar radiation during the summer day was higher than during the winter day, as expected, and the weather files presented the same behavior during both days, with solar radiation increasing from TMY0517 to TRY17. Considering the precipitation data of the existing weather files, Figure 2f shows the null precipitation data in TMY0418 and TMY0517, and the lower values in TMY7317. The total annual precipitation was 0.0 mm for TMY0418 and TMY0517, which clearly indicates that these two weather files did not consider the rainfall data (Figure 4). The total annual precipitation was 222.0 mm for TMY7317 and 14.0 mm for TMY7818. Although TRY03 presented the highest precipitation among the previously existing files (890.0 mm), it did not reach the annual accumulation range from 1,200 mm to more than 1,900 mm (GRIMM et al., 1998; TEIXEIRA, 2010; ROSSATO, 2011). In contrast, TMY0220 and TRY17 showed annual precipitation of 1,750.6 mm and 2,117.3 mm, respectively, in accordance with the climate normals for Santa Maria.
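The precipitation comparison above amounts to checking each file's annual total against the lower bound of the regional accumulation range. The Python sketch below uses the totals reported in the text; the names `REPORTED_TOTALS_MM`, `underestimated_files`, and `share_of_minimum` are introduced only for this illustration.

```python
# Annual precipitation totals (mm) as reported in the text for each weather file.
REPORTED_TOTALS_MM = {
    "TMY0418": 0.0,
    "TMY0517": 0.0,
    "TMY7317": 222.0,
    "TMY7818": 14.0,
    "TRY03": 890.0,
    "TMY0220": 1750.6,
    "TRY17": 2117.3,
}

# Lower bound of the annual accumulation range cited for Rio Grande do Sul.
EXPECTED_MIN_MM = 1200.0


def underestimated_files(totals, expected_min=EXPECTED_MIN_MM):
    """Names of weather files whose annual total is below the expected minimum."""
    return sorted(name for name, total in totals.items() if total < expected_min)


def share_of_minimum(total_mm, expected_min=EXPECTED_MIN_MM):
    """Annual total expressed as a percentage of the expected minimum."""
    return round(100.0 * total_mm / expected_min, 1)


flagged = underestimated_files(REPORTED_TOTALS_MM)
```

Under this check, all five previously existing files fall below 1,200 mm, with TRY03 reaching about 74% of that minimum, while TMY0220 and TRY17 are not flagged.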

Psychrometric charts
In Figure 5, similarities between both psychrometric charts can be seen. The comfort zone areas are equivalent, but the distribution of hourly data points differed. In TMY0220, data points are spread over the warm humid and warm dry zones, while in TRY17 the data points are concentrated in humid zones. Both weather files indicated that Santa Maria showed a high level of discomfort, almost 50% of annual hours for each weather file, especially during cold weather. During the winter season, bioclimatic strategies, such as internal gains and passive solar gains, can diminish the discomfort hours by up to 43.5% of annual hours for TRY17 and 42.6% for TMY0220. The discomfort by heat was lower, but still significant: 30.8% of annual hours for TRY17 and 31.4% for TMY0220. These levels of discomfort can be reduced by up to 20% and 23.2% of annual hours (TRY17 and TMY0220, respectively) by using natural ventilation, evaporative cooling, and mass cooling with night ventilation (Table 10). TMY0220 tends to present higher percentages, which overturns artificial strategies such as air-conditioning, dehumidification, and heating.

Case 600 ASHRAE
Heating loads, cooling loads and annual energy consumption were simulated for seven weather files (Table 11). TMY7818 showed the highest energy consumption, for both heating and cooling, and consequently, for the whole year. Although TMY0517 showed the lowest heating demand (963.6 kWh/year), TRY03 resulted in the lowest cooling consumption (518.2 kWh/year) and the lowest total annual energy consumption (1,583.9 kWh/year). The highest percentage difference in total annual consumption occurred between TMY7818 and TRY03 (45.3%), and the lowest was between TMY0220 and TMY0517 (0.52%). Also, there is a similarity between TMY0220 and TRY17, the most up-to-date files considering the dataset.
TMY0517 presented the highest monthly average operative temperatures, showing that this weather file may overestimate the temperatures. Regarding the lowest monthly average temperatures, TRY17 showed the lowest values in the hottest months (January to March) and during the coldest month (July) (Table 12). TMY0220 presented the lowest average operative temperature for April, May, and June.

CONCLUSION
This work aimed to update the weather file of the city of Santa Maria, Brazil, developing the TRY, TMY2, and EPW files based on the dataset from 2002 to 2020. Data processing procedures were described, such as checking inconsistencies, filling gaps, and calculating missing climate parameters.
When comparing previous weather files with the two developed in this work, the average daily temperatures are very similar, although the Typical Meteorological Year and the Test Reference Year differed. Regarding the TRY file, 2003 was the representative year in the previous analysis, which changed to 2017 in the new generation. However, this file resulted in higher average operative temperatures, as well as higher global solar radiation and precipitation data, which could be explained by climate change in recent years. Significant variance, and its impact on energy consumption as well as on the weather parameters, has been shown.
One of the limitations of this work is the available data. The ideal condition for the development of weather files is the use of 30 years of measured data, with the recommended minimum being 10 years. This study was carried out based on a series of 18 years of measured data available for Santa Maria. Problems regarding gaps and null values in the monitored data were also a limitation of this work.
This study contributed updated files for building thermal simulation, which are crucial to predict energy performance, mainly with regard to precipitation data, which the existing files lack.

Figure 1 - Santa Maria climate normal for temperature and precipitation from 1981 to 2010.

Figure 2 - Boxplots of the climatic parameters by weather file. a) temperature; b) relative humidity; c) global solar radiation; d) wind speed; e) wind direction; f) rainfall.

Figure 3 - Global solar radiation for a) summer day and b) winter day, for each weather file.

Figure 4 - Cumulative rainfall data for each weather file along the year.

Table 1 - Range test parameters.

The building model used in the simulations was Case 600 from ASHRAE Standard 140 (ANSI/ASHRAE, 2014). The basic test building is a rectangular single zone (8 m wide x 6 m long x 2.7 m high) with 12 m² of windows on the south exposure. Material characteristics are presented in Table 4. This basic building model can show the influence of different weather parameters on the building thermal performance, as well as on energy demand (KIM et al., 2017).

Table 4 - Thermal characteristics of the building materials.

Table 6 - Month classification in order of importance for TRY composition.

Table 8 - Years for TMY2 composition.

Table 9 - Descriptive statistics of temperature, relative humidity, global solar radiation, wind speed, wind direction and rainfall intensity.

Table 10 - Comfort and discomfort indexes for TMY0220 and TRY17 weather files.

Table 11 - Annual energy consumption as a function of weather data.