Advances in multivariate statistics and its applications

Senior Independent Study Theses

Multivariate statistics.

Joshua A. Dailey , The College of Wooster

This thesis provides an introduction to several topics in multivariate statistics. The topics investigated include the multivariate normal distribution, discriminant analysis, and the T^2-test. The thesis blends theory and practice: it introduces enough theory to make the topics mathematically interesting, along with real-world examples that display ways in which the discussed techniques are applicable to various multivariate data.

Hartman, James


Recommended Citation

Dailey, Joshua A., "Multivariate Statistics" (2012). Senior Independent Study Theses. Paper 936. https://openworks.wooster.edu/independentstudy/936

  • Disciplines

Applied Mathematics

multivariate statistics

Publication Date

Degree granted.

Bachelor of Arts

Document Type

Senior Independent Study Thesis


© Copyright 2012 Joshua A. Dailey

  • Open access
  • Published: 16 December 2020

Water quality assessment based on multivariate statistics and water quality index of a strategic river in the Brazilian Atlantic Forest

  • David de Andrade Costa 1 , 2 ,
  • José Paulo Soares de Azevedo 1 ,
  • Marco Aurélio dos Santos 1 &
  • Rafaela dos Santos Facchetti Vinhaes Assumpção 3  

Scientific Reports volume  10 , Article number:  22038 ( 2020 ) Cite this article

  • Environmental sciences

Fifty-four water samples were collected between July and December 2019 at nine monitoring stations and fifteen parameters were analysed to provide an updated diagnosis of the Piabanha River water quality. Further, forty years of monitoring were analysed, including government data and previous research projects. A georeferenced database was also built containing water management data. The Water Quality Index from the National Sanitation Foundation (WQI NSF ) was calculated using two datasets and showed an improvement in overall water quality, despite still presenting systematic violations of Brazilian standards. Principal components analysis (PCA) identified the parameters that contribute most to water quality and enabled their association with the main pollution sources identified in the geodatabase. PCA showed that sewage discharge is still the main pollution source. The cluster analysis (CA) made it possible to recommend optimizing the monitoring network, thereby enabling the expansion of the monitoring to other rivers. Finally, the diagnosis provided by this research establishes the first step towards the Framing of water resources according to their intended uses, as established by the Brazilian National Water Resources Policy.


Aquatic systems have been significantly affected by human activities causing water quality deterioration, decreasing water availability and reducing the carrying capacity of aquatic life 1 , 2 , 3 , 4 . Water quality deterioration still persists in developed countries, while it is a major problem in developing countries in which a substantial amount of sewage is discharged directly into rivers 5 , 6 , 7 , 8 . Moreover, according to UNEP 9 , water pollution has worsened since the 1990s in the majority of rivers in Latin America. The global concern with water availability and its quality has been growing, and it is estimated that the demand for water will increase between 20 and 30% by 2050 10 , 11 . In addition, spatial and temporal variations in the hydrological cycle and their uncertainties related to climate change may worsen this scenario 12 , 13 , 14 , 15 , 16 .

Monitoring water quality in order to assess its spatial and temporal variations is essential for water management and pollution control 17 . On the other hand, monitoring programs generate large data sets that require interpretation techniques 18 . There are a number of methods for water quality assessment, including single-factor, multi-index, fuzzy mathematics, grey system evaluation, artificial neural network, multi-criteria analysis, geographical interpolation and multivariate statistical approach 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 . Among them, the most used are the Water Quality Indexes (WQI) that transform a complex set of data into a single value indicative of water quality 26 , 27 and reflect its suitability for different uses 28 . Multivariate statistics is another widely used approach 29 , 30 , mainly with Principal Components Analysis (PCA) and Cluster Analysis (CA), helping to achieve a better understanding of the spatial and temporal dynamics of water quality.

A comparison of seven methods for assessing water quality indicated WQI as one of the best 20 . The assessments of Poyang Lake 28 , China, and the upper Selenga River 31 , Mongolia, showed that WQIs are suitable for the assessment of both interannual trends and seasonal variations 28 . Multivariate statistical techniques associated with WQI have been used for numerous water bodies worldwide, including the Nag River 30 , India, the Paraíba do Sul River 32 , Brazil, and the aforementioned Selenga River 31 . CA grouped the monitoring stations according to their similarities, while PCA highlighted components that were related to their pollution sources 30 , 31 , 32 .

In order to ensure water quantity and quality, the Brazilian National Water Resources Policy 33 has established a management tool called Framework, according to the main intended uses of water. It has also created participatory management committees, the so-called Basin Committees, which, together with its technical agency, are responsible for the Framework establishment. Unfortunately, even after two decades, Brazil has had very few successful experiences on the subject 34 .

Brazil has a gigantic and complex hydrographic network present in many different ecosystems 34 . The Brazilian Atlantic Forest is one of the most biodiverse biomes on the planet 35 , 36 , extending along the Brazilian coast and currently covering only 11.4% of its original territory 37 under constant threats 38 , 39 , 40 . The hydrographic basin of the Paraíba do Sul river is located in this environment, which is the integration axis of the most industrialized Brazilian states, São Paulo, Rio de Janeiro and Minas Gerais, and home to around 6.2 million people 41 . A water transfer system regularly supplies another 9 million people in the metropolitan region of Rio de Janeiro, through the Guandu system. Another water transfer system connects the Paraíba do Sul river to the Cantareira system, complementing with 5 m 3 /s the water supply to over 9 million people in the metropolitan region of São Paulo 41 . These systems went through an intense water scarcity between 2014 and 2016 with severe impacts on water quality and availability 32 .

Our study is focused on the Piabanha River watershed, a strategic sub-basin of the Paraíba do Sul river, combining urban, industrial, rural characteristics, and large preserved fragments of Atlantic Forest 36 , 42 . The Piabanha Basin has been monitored for over 10 years with the Studies in Experimental and Representative Watersheds (EIBEX) project, a partnership between universities and government agencies 42 , 43 , 44 . The State Environmental Agency of Rio de Janeiro (INEA) has been monitoring the basin since 1980. Other studies in the region include the analysis of contamination by pesticides 45 , energy generation 46 and dispersion of pollutants 47 . The Piabanha Basin received international attention in Nature's article on biodiversity 36 . But in addition to forest preservation, can the Piabanha River support biodiversity? How is its water quality today? In this way, the Piabanha Basin Committee defined the Framework as a priority in its management plan (2018–2020) and to accomplish this goal, established water monitoring as a strategic action 48 .

Our study covers 40 years of monitoring, including government data, our research projects and, currently, a monitoring program that is being conducted with funding from the Piabanha Basin Committee. The main objectives were: (1) to carry out an updated diagnosis of water quality using multivariate techniques and WQI; (2) to examine the parameters that most influence water quality, and (3) to identify river stretches with similar water quality. Our study provides an extensive understanding of the Piabanha River and supports its Steering Committee in the application of public policies. This is a pilot project that can be a reference for other Framework programs for improving water quality in Brazil.

We have requested and received from INEA two water user databases of the Piabanha Basin. The first set corresponds to raw data from the National Water Resources Register (CNARH), with all the registrations until December 2017 and with 1549 registered interferences (water abstraction or effluent discharge). The second one is the registration validated by INEA until August 2018 by the Águas do Rio project comprising a total of 669 validated interferences. With these data, it was possible to build a georeferenced base. By so doing, it was possible to list the main effluent discharges by type for each monitoring station.

Of the 669 interferences in the validated database, 84% are water abstractions and 16% are effluent discharges. Water abstractions account for 425 m³ day⁻¹, with 75% from wells and 25% from rivers, whereas effluent discharges amount to 89 m³ day⁻¹. The largest volume of effluents comes from the sanitation sector, with 57% of the total; industries account for 33%, aquaculture for 4% and mining for 3% of discharges.

When comparing the two databases, it is clear that the universe of registered users is much larger than the universe of validated users, that is, those whose data were verified by the state environmental agency and who therefore received a license. For example, the validated database has only six interferences related to agriculture, in contrast to 789 interferences awaiting validation. This is a serious obstacle for water resources management in the region and threatens the sustainability of water resources.

Short time monitoring and water quality index

In order to assess and compare the water quality of the Piabanha River, we calculated the Water Quality Index from the National Sanitation Foundation (WQI NSF ) using two datasets, the first from 2012 and the second from 2019 (Tables 1 and 2). The 2012 results (Fig. 1A) oscillated between the bad and medium categories, with medium quality overall (50.5 ± 10.3). In 2019 (Fig. 1B), the results ranged between the medium and good categories, again with medium quality overall (61.6 ± 10.8).

Figure 1. WQI NSF spatial variation over each station from July to December (A) 2012 and (B) 2019. WQI NSF seasonal variation over the entire length of the river (C) 2012 and (D) 2019. The entire dataset can be found online as Supplementary Tables S1 and S2, for 2012 and 2019 respectively.

The datasets show significant seasonal behavior (p < 0.05) (Fig. 1C,D) between the end of the dry period (Jul, Aug, Sep) and the beginning of the rainy period (Oct, Nov, Dec) for the parameters DO, WT, pH, nitrate, phosphate and turbidity, while no significant seasonal difference (p > 0.05) was found for the parameters E. coli , BOD and TDS. The parameters that most affected the WQI NSF were coliforms and BOD. Ammonia and total phosphorus are not part of the WQI NSF calculation, but their concentrations violated Brazilian legislation and their influence can be better understood through PCA.

Principal components and clusters analysis

The 2019 dataset (n = 48), comprising six monitoring campaigns at the eight monitoring stations along the Piabanha River with 15 parameters analysed, was grouped by the average value of each parameter at each station (n = 8). Pearson’s correlation matrix is presented in Table 3; most parameters show a strong correlation (r > 0.5) at a confidence level of 95% (α = 0.05). The KMO measure of sampling adequacy (n = 8) was close to 0.5 and the significance level of Bartlett's test of sphericity was less than 0.001, indicating that the correlation matrix is not an identity matrix, that the variables are significantly related and that the data are therefore fit for PCA. The Shapiro–Wilk test confirmed the normality of the data (p > 0.01) for all parameters except E. coli .

PCA was applied to identify groups of parameters that influence water quality. PC1, PC2 and PC3 account for 72% (eigenvalue 10.74), 14% (eigenvalue approximately 2.1) and 5% (eigenvalue 0.8), respectively, of the data variance. Components with eigenvalues greater than one were selected, so the first two components, which together account for 86% of the total variance, were retained. The loadings that compose the first two components are presented in Table 4 and the stations that most influence the results are represented in Fig. 2A.
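The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' workflow: the data are synthetic (8 hypothetical stations by 15 hypothetical parameters), the function name `pca_correlation` is invented here, and PCA is done by eigendecomposition of the correlation matrix. With standardized parameters the eigenvalues sum to the number of parameters, so the share of variance of a component is its eigenvalue divided by 15 (for example, 10.74 / 15 ≈ 0.72).

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = stations, columns = parameters)."""
    X = np.asarray(X, dtype=float)
    # Standardize each parameter so that differing units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)         # remove tiny negative round-off
    explained = eigvals / eigvals.sum()           # share of total variance per component
    loadings = eigvecs * np.sqrt(eigvals)         # correlation of each parameter with each PC
    scores = Z @ eigvecs                          # station coordinates on each PC
    keep = eigvals > 1.0                          # retain components with eigenvalue > 1
    return eigvals, explained, loadings, scores, keep

# Synthetic, hypothetical data: one shared upstream-to-downstream gradient plus noise.
rng = np.random.default_rng(0)
gradient = np.linspace(1.0, -1.0, 8)[:, None]
X = gradient * rng.uniform(0.5, 1.5, size=(1, 15)) + 0.3 * rng.normal(size=(8, 15))

eigvals, explained, loadings, scores, keep = pca_correlation(X)
print("eigenvalues:", eigvals[:3].round(2))
print("explained variance:", explained[:3].round(2))
print("components retained:", int(keep.sum()))
```

Because only eight stations are available, at most seven eigenvalues can be non-zero, which is one reason the first components absorb most of the variance in a setup like this.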

Figure 2. Multivariate techniques. (A) PCA plot with station scores and parameter loadings. (B) Hierarchical clustering by Ward linkage with Euclidean distance. The entire dataset can be found as Supplementary Table S2 online.

PC1 was substantially correlated with practically all parameters. Stations 1 to 4 loaded positively (loadings > 0.7) on PC1 with the parameters TDS, alkalinity, ammonia, total nitrogen, phosphate, total phosphorus, BOD, COD and E. coli , while stations 5 to 8 loaded negatively (loadings < −0.7) with nitrate, turbidity, SS, pH and WT. PC2 was most influenced by stations in the urban area, notably station 1, and showed a positive correlation (loadings > 0.5) with DO, COD and BOD, and a weaker one with SS (loading = 0.33). On the other hand, it was negatively correlated with E. coli (loading = −0.66), with a large influence of station 3.

The sampling stations were grouped into three statistically significant clusters with 75% similarity by agglomerative hierarchical clustering based on Ward linkage with Euclidean distance (Fig. 2B): cluster 1 (Stations 2 and 3), cluster 2 (Stations 7 and 8) and cluster 3 (Stations 1, 4, 5 and 6).
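A grouping of this kind can be reproduced in outline with SciPy's hierarchical clustering routines. The sketch below is an assumption-laden illustration: the station values are synthetic, arranged by hand into three loose groups that mimic the grouping reported above, and the tree is simply cut into three clusters rather than at a similarity threshold.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)

# Hypothetical station means: 8 stations x 4 parameters, built around three group centers.
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)
group_of_station = [0, 1, 1, 0, 0, 0, 2, 2]      # stations 1..8, hypothetical assignment
X = centers[group_of_station] + 0.3 * rng.normal(size=(8, 4))

# Standardize so that each parameter contributes comparably to the Euclidean distance.
Z_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Agglomerative clustering with Ward linkage on Euclidean distances.
tree = linkage(Z_std, method="ward", metric="euclidean")

# Cut the dendrogram into three clusters.
labels = fcluster(tree, t=3, criterion="maxclust")
for station, label in enumerate(labels, start=1):
    print(f"station {station}: cluster {label}")
```

The cluster numbers returned by `fcluster` are arbitrary labels, so only the membership of each group is meaningful, not its index.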

Longtime monitoring assessment based on Mann–Kendall rank test and Fourier transform

As a complement, to evaluate a possible trend in water quality and to detect the seasonal behavior of the basin, we used a time series covering 40 years of monitoring. Since dissolved oxygen can be used as a surrogate variable for the general health of aquatic ecosystems 49 , 50 , 51 , it was selected to perform the Mann–Kendall rank test of randomness for the most upstream and the most downstream stations of the Piabanha River, PB002 and PB011 respectively. The upstream station showed a statistically significant increasing trend (n = 166, S = 1507, Z = 2.10, p < 0.05), whereas the downstream station did not show a statistically significant trend (n = 198, S = 1179, Z = 1.27, p = 0.20). The entire dataset can be found as Supplementary Tables S3 and S4.
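For readers who want to check the reported statistics, the Mann–Kendall test can be sketched with the standard library alone. The version below uses the usual normal approximation and, as a simplifying assumption, ignores the correction for tied values, so it can differ slightly from results produced by statistical packages. The function names are illustrative, not taken from the study.

```python
import math

def mann_kendall_s(values):
    """S statistic: number of increasing pairs minus number of decreasing pairs."""
    s = 0
    n = len(values)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s

def mann_kendall_z(s, n):
    """Z score and two-sided p-value from S and n (normal approximation, no tie correction)."""
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)    # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Plug in the n and S values reported in the text for the two stations.
z_up, p_up = mann_kendall_z(1507, 166)
z_down, p_down = mann_kendall_z(1179, 198)
print(f"upstream:   Z = {z_up:.2f}, p = {p_up:.3f}")
print(f"downstream: Z = {z_down:.2f}, p = {p_down:.3f}")
```

With the reported S and n, this approximation gives Z values close to those in the text; the small difference for the downstream station is consistent with a tie correction in the original analysis.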

To detect the seasonal behavior, we applied a Fourier transform algorithm to the 1980 to 2019 time series of station PB011 (Fig. 3A), which does not display a trend and can be considered representative of the entire basin because it is the most downstream station. The data were organized in quarterly averages for the DO parameter. The two most powerful signals correspond approximately to the frequencies of 0.25 and 0.5 cycles per quarter (Fig. 3B), that is, to periods of 12 and 6 months, respectively. Taking this seasonality into account, we confirmed that our 2019 field campaigns are representative of it, comprising the final half of the dry season and the initial half of the rainy season.
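The link between frequency and period can be made concrete with a small periodogram sketch. It assumes a synthetic quarterly dissolved oxygen series (160 hypothetical values, one per quarter over 40 years) containing an annual and a semi-annual cycle; none of the numbers come from the study. With quarterly sampling, a frequency f in cycles per quarter corresponds to a period of 3 / f months, so 0.25 maps to 12 months and 0.5 to 6 months.

```python
import numpy as np

def periodogram(series, dt=1.0):
    """Raw periodogram of a mean-removed series sampled every dt time units."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                                  # drop the zero-frequency component
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=dt)             # cycles per sampling interval
    return freqs, power

# Hypothetical quarterly DO series: annual cycle + semi-annual cycle + noise.
n = 160
t = np.arange(n)
rng = np.random.default_rng(2)
do_series = (7.0
             + 1.0 * np.cos(2 * np.pi * t / 4)        # period of 4 quarters (12 months)
             + 0.5 * np.cos(2 * np.pi * t / 2)        # period of 2 quarters (6 months)
             + 0.2 * rng.normal(size=n))

freqs, power = periodogram(do_series)
top_freqs = np.sort(freqs[np.argsort(power)[-2:]])    # two strongest peaks
periods_months = 3.0 / top_freqs                      # 3 months per quarter
print("peak frequencies (cycles per quarter):", top_freqs)
print("corresponding periods (months):", periods_months)
```

Note that 0.5 cycles per quarter is the highest frequency a quarterly series can resolve, so a semi-annual signal sits at the very edge of the periodogram.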

Figure 3. (A) Temporal distribution of dissolved oxygen from 1980 to 2019 at station PB011 (n = 160). (B) Periodogram. The entire dataset can be found in Supplementary Table S5.

Water quality assessment

The Piabanha River had better water quality in 2019 than in 2012, according to the WQI NSF results (Fig. 1). The improvement was substantial over the first 40 km, which were rated as “bad” in most campaigns in 2012 but as medium in most campaigns in 2019, due to the expansion of the sewage collection and treatment system. Since 2012, Petrópolis has built 50 km of sewage collection network and 7 new sewage treatment units 52 . These plants produce secondary-level effluents through biological treatment, and their flow capacity reaches about 800 L s⁻¹. They use different technologies, such as submerged aerated biofilters, anaerobic upflow reactors, moving bed biofilm reactors and upflow anaerobic sludge blanket reactors. In addition, biosystems are used in some plants 53 . Water quality improved in stretches after 40 km due to self-purification processes and the contribution of clean tributaries. This is in line with findings from other rivers worldwide 31 , 54 , 55 .

Dry seasons, in general, presented better water quality indexes than rainy seasons. Other studies 28 , 56 , 57 have shown similar seasonal behavior, where water quality worsens in the rainy season due to sediments and pollutants input carried by the rain. In addition, most of the sewage network is the same network that collects rainwater. Thus, during rainy events, sewage is no longer treated and is discharged directly into rivers.

Although the WQI NSF had a medium rating in 2019, BOD and coliforms were substantially above the maximum allowed by Brazilian regulation. In addition, the index is limited to the parameters used in its calculation 58 . This is the case for ammonium, which presented concentrations up to three times higher than allowed by Brazilian regulation but is not captured by the index, since only nitrate is used in the WQI NSF . The same occurs with total phosphorus: only phosphate is considered, although it does not have a maximum value established by the Brazilian federal regulation. In what follows, we analyse these parameters in more detail.

Biochemical Oxygen Demand (BOD) is one of the most widely used criteria for water quality assessment. It provides information on the readily biodegradable fraction of the organic load in water 59 . High BOD concentrations reduce oxygen availability, mainly as a result of microbiological activity 60 . Its concentration ranged from 2.00 to 45 mg L⁻¹ (average 7.69 ± 7.52) over the entire dataset and was, most of the time, substantially above the maximum allowed by Brazilian regulation (5 mg L⁻¹). Escherichia coli is naturally present in the intestinal tracts of warm-blooded animals and is widely used as an indicator of fecal contamination 61 , 62 . Villas-Boas 42 pointed to fecal coliforms as the most relevant water quality parameter in the urban area of Petrópolis, mainly related to pollution caused by untreated domestic sewage.

Phosphorus is an essential nutrient for all forms of life 63 . Its availability can be related to atmospheric deposition 64 , anthropic uses of products such as detergents 65 and agricultural activities 66 . Orthophosphates are the most relevant in the aquatic environment as they are the main form of phosphate assimilated by aquatic plants 67 . Previous studies 42 , 68 , 69 in the Piabanha Basin found phosphate values in agreement with ours. Alvim 68 points out that the main source of phosphorus for the Piabanha River is sewage discharge and that the higher concentrations are found during the rainy season.

Nitrate is very common in surface water since it is the end product of the aerobic decomposition of organic nitrogenous compounds 70 , 71 . Its sources are related to landscape composition, being influenced by both agricultural and urban uses 72 . Villas-Boas 42 found high concentrations of nitrate and ammonium in the urban region of the Piabanha River, in agreement with this study. Alvim 68 reports that domestic sewage discharged into the Piabanha River accounts for 43% of the nitrogen load, the atmospheric contribution for 31% and farming activity for 15%.

The major contributors to water quality and stretches of river with similar water quality

The first two components together account for 86% of the total variance, indicating the high explanatory power of the method. This share is higher than in other similar studies around the world 29 , 30 , 71 , 73 , 74 , 75 . PC1 predominantly accounts for urban sewage pollution. This is shown by the fact that stations 1 to 4, located in the urban area of Petrópolis, loaded positively on PC1 with organic matter (BOD and COD), TDS and nutrients such as phosphorus and nitrogenous constituents, especially ammonia, indicating recent pollution. In addition, stations 5 to 8 loaded negatively with nitrate, showing the degradation of nitrogen compounds in the stretches downstream of the urban area. On the other hand, the increase in nitrate concentrations in association with the increase in turbidity at stations outside the urban area may also be associated with land use, especially agriculture.

PC2 is dominated by dissolved oxygen and other parameters that indicate the health of the river, such as organic load and coliforms. It is explained by water pollution by organic matter and biological activity and reinforces the result of PC1. In the study region, sanitation is still a challenge to be faced by the government, especially in the first urban stretch, after 40 km from the source of the Piabanha River; this region has 26% of untreated sewage 53 .

Cluster analysis was used to group sampling stations into similarity classes indicating the stretches of river with similar water quality. As pointed out by Singh 29 , this implies that a single site in each cluster may serve the spatial assessment of water quality as well as the whole cluster. The number of sampling sites, and hence the cost, can therefore be reduced without losing any significance of the outcome. On the other hand, this interpretation should be made with caution, since trends in different stretches can be very different and future changes may become significant. Therefore, great care must be taken when reducing the number of monitoring stations.

It is important to notice that the first cluster (S1, S6 and S4, S5) groups station 1 with station 6. Station 1 corresponds to the urban area of Petrópolis, whose pollution stems from sewage and industrial effluents. Likewise, station 6 is located after the confluence of the Preto-Paquequer River, which crosses Teresópolis, the second largest city in the hydrographic basin, also with the presence of economic and industrial activities. Sand mining is the predominant activity near stations 4 and 5, which together receive the impact of five mining companies. Similarly, station 6, after the Preto River, receives the impact of seven sand mines. In fact, this group brings together economic activities whose impact on water quality is similar. Station 5 could be removed from the monitoring network in order to reduce costs.

The second cluster (S2 and S3) refers to the most urbanized section of the basin. When individually checking the quality parameters between these stations, one can conclude that they differ only by the diluting effect caused by the contribution of the Araras River, on the left bank, and of the Poço do Ferreira River, on the right bank, which receives its waters from the Bonfim River after its source in the Serra dos Órgãos National Park, an important federal conservation unit. Station 3 was introduced precisely to detect this diluting effect, but since the cluster analysis showed that it was not significant it is recommended to remove this station.

The third cluster (S7 and S8) has a very similar behavior: station 8 is just before the Piabanha River mouth and station 7 is located less than 10 km upstream of the mouth. In addition, on this stretch there are only three interferences registered as discharges. Thus, it is recommended to remove station 7, considering the importance of maintaining a station close to the river mouth.

Trend analysis and seasonal variation

Although it still presents systematic violations of Brazilian standards 76 , the water quality has, in general, improved in the Piabanha River over the past 40 years (Fig. 3A,B). This statement is supported by the Mann–Kendall rank test of randomness, indicating a significant (p = 0.03) increasing trend in dissolved oxygen at station PB002, located in the urban area of Petrópolis, which is highly impacted by effluent discharges, despite the fact that this region has municipal sewage treatment. PB011 has presented high levels of DO since the beginning of the time series, with nearly constant behavior over time, and thus shows no trend. The high DO levels are due to both the river's reoxygenation process and the contribution of clean waters from its tributaries, such as the Fagundes River.

A strong annual and semi-annual seasonality was indicated by the power spectral density, which can be seen in the periodogram (Fig.  3 B) resulting from the Fast Fourier Transform. The results are in accordance with the literature 77 indicating that more than 90% of the total variance of dissolved oxygen is accounted for by the annual periodicity and the next four higher harmonics (semi-annual; tri-annual, etc.). Seasonality follows the rainfall regime with a dry period from April to September, and a wet period from October to March, according to Araújo's 78 study carried out in the Piabanha River basin.

Water quality at point PB002 started to improve in 2000, when the first sewage treatment plant in the city of Petrópolis came into operation. Currently, 95% of the population has access to drinking water, and the coverage of treated urban sewage is 85%. The municipality has 26 sewage treatment units, responsible for the treatment of 56.2 million liters per day. In relation to the other municipalities in the basin, according to the National Sanitation Information System 79 (SNIS), the municipality of Três Rios treats 2.97% of its sewage, while the other municipalities, Teresópolis, Areal, São José do Vale do Rio Preto, Paty do Alferes and Paraíba do Sul did not report their data to SNIS, potentially indicating that they do not perform sewage treatment. In other words, about 50% of the population has no formal access to sewage treatment services.

The diagnosis provided by this research establishes the first step towards the Framing of water resources according to their intended uses, as established by the Brazilian National Water Resources Policy. In addition to the diagnosis, a georeferenced database was built. There are few cases of the Framework in Brazil and none in the studied watershed, which makes this study relevant to Brazilian water resources management. The considerable number of users awaiting regularization from the State Environmental Institute is a limitation to implementing the Framework and requires a joint effort of the watershed committee.

Answering our initial question, Piabanha River water quality is medium according to the WQI NSF and certainly is not able to support high levels of biodiversity. Some river stretches have quality compatible with class 4 according to the Brazilian regulation for the coliforms, BOD and TP parameters; hence, they cannot be used for irrigation, human or animal consumption, not even after treatment. On the other hand, the Framework must be carried out according to intended uses. Therefore, we recommend that the Piabanha Committee, in partnership with the State Public Ministry, lead actions to reduce the concentrations of these parameters, mainly in the sanitation sector.

It is recommended that the monitoring program be continued and expanded to stretches where conflicts between water uses occur, in order to implement the Framework and enforce the improvement of water quality. It is also important to point out that this study was financed with public resources from the Piabanha water resources fund and that the present analysis made it possible to recommend the exclusion of three of the eight existing stations, thereby enabling the expansion of the monitoring to other tributaries of the Piabanha River under the influence of large populations with practically no sanitation, notably the Rio Preto/Paquequer sub-basin.

This work describes a methodological approach that can be useful for other research in environmental science and management. We have applied an integrated approach using data from different sources combined with data analysis based on WQI, PCA, CA, frequency analysis and trend analysis, which were used in a complementary way to understand a research problem.

Materials and methods

The Piabanha Basin is located in southeastern Brazil, in the mountainous region of the State of Rio de Janeiro, with an area of 2050 km² (Fig. 4). The Piabanha River rises at an altitude of 1150 m and runs for 80 km until it flows into the Paraíba do Sul River at an altitude of 260 m. The upper portion of the basin has a humid tropical climate and steep slopes, and annual rainfall exceeds 2000 mm. The lower portion of the basin has a sub-humid climate and the average rainfall decreases to 1300 mm. The seasons are well defined throughout the basin and the rainfall regime has symmetry in its distribution between the periods from January to June and from July to December 78 . The territory was home to 535 thousand people in 2018 80 . The two largest cities in the region, Petrópolis and Teresópolis, are located in the headwaters of the basin and give rise to the Piabanha and Preto rivers, respectively. Additionally, because sewage treatment is limited and the river flows are low, high constituent concentrations are observed (e.g., fecal coliform, nitrate, and BOD), especially in urban areas 42 .

Figure 4. Study area, sample stations and interference points (water abstraction or effluent discharge). This map was generated in the open source software QGIS version 3.14.15 ( https://qgis.org/ ).

Three sets of monitoring data have been used in this research (Fig. 4). The first and main one is the result of a monitoring program that is being conducted by the Piabanha watershed Committee; data from July to December 2019 have been analysed and are described in more detail in the next item. The second comes from 6 campaigns carried out in 2012 by the HIDROECO project 44 , also with financial resources from the Piabanha Committee, and is used as a baseline for comparison purposes. The third comprises two stations of the basic monitoring network of the Rio de Janeiro Environmental Institute, with data from 1980 to the present, except for periods of data gaps.

A georeferenced database was also built containing water management data. Brazilian National Water Agency (ANA) has developed the National Water Resources Users Register (CNARH) for any bulk water user that changes regime, quantity or quality of a water body. It is a federal platform, but it can be managed by each state. Registration is a prerequisite for the other stages of uses regularization.

Monitoring campaigns and analytical procedures

Physical–chemical parameters were measured in situ using a multiparameter probe (YSI model 556) and a portable turbidimeter (HANNA model HI 98703-0), both previously calibrated and later verified. The samples were placed in specific containers for each analysis. Where required by the parameter, the samples were preserved with H₂SO₄ and kept at a temperature below 4 °C. Laboratory analyses (Table 1 ) were performed according to Standard Methods for the Examination of Water and Wastewater (SMWW) 81 . The laboratory has an accreditation certificate issued by the State Environmental Agency (INEA CCL No. IN044710) and also complies with ISO/IEC 17025 (CRL 1035).

Water Quality Index

A Water Quality Index (WQI) is an empirical expression which integrates significant physical, chemical and microbiological parameters of water quality into a single number 82 . It can be a powerful communication tool to simplify a complex set of parameters, whose individual interpretation can be difficult, into a single index representing the general water quality. A water quality index was initially proposed by Horton 26 and further developed by Brown 27 , 83 resulting in the National (USA) Sanitation Foundation Water Quality Index (WQI NSF ).

The original version of the WQI NSF used an additive expression 27 . However, field data analysis suggested that the additive WQI lacked sensitivity to the effect of a single low-value parameter on overall water quality. As a result, a multiplicative form of the WQI was proposed 82 , 83 :

$$\mathrm{WQI}_{\mathrm{NSF}} = \prod_{i=1}^{n} q_i^{w_i}$$

Here $q_i$ is the quality class for the i-th variable, a number between 0 and 100, obtained from the respective average quality variation curve 82 according to the concentration of that variable; $w_i$ is the relative weight of the i-th variable, a number between 0 and 1, assigned according to the importance of the variable for overall quality; and $n$ is the number of variables. WQI NSF is the National Sanitation Foundation Water Quality Index, a number between 0 and 100, rated as "excellent" (100 > WQI ≥ 90), "good" (90 > WQI ≥ 70), "medium" (70 > WQI ≥ 50), "bad" (50 > WQI ≥ 25) or "very bad" (25 > WQI ≥ 0).

The WQI NSF and its many adaptations have been widely used 84 , 85 . However, its use is not uniform: parameters are sometimes replaced without the necessary adaptation of the respective quality curve. In Brazil, the WQI NSF has been used since 1975 by CETESB (Environmental Company of the State of São Paulo). In the following decades, other Brazilian states adopted this index with minor adaptations, and today it is the most widely used in the country. In the present study, the weights (w i ) follow the methodology established by INEA (Environmental Institute of the State of Rio de Janeiro): DO (0.17); fecal coliforms (0.16); pH and BOD (0.11 each); nitrates, phosphate and temperature (0.10 each); turbidity (0.08); and TDS (0.07), which is used in place of total solids.

Replacing total solids with dissolved solids may cause an average variation of 0.2% in the final WQI NSF result, based on our estimates (n = 48, 2019 data). For microbiology, E. coli was used instead of fecal coliforms, applying a correction factor 86 of 1.25 to the E. coli result.
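The multiplicative index and the weights listed above can be illustrated with a minimal Python sketch. The quality scores q i in the example are hypothetical values chosen for illustration (the rating curves that convert concentrations into q i are not reproduced here), and the function names are assumptions, not part of the original study.

```python
import math

# Weights w_i quoted in the text (INEA methodology); they sum to 1.
WEIGHTS = {
    "DO": 0.17,
    "fecal_coliforms": 0.16,
    "pH": 0.11,
    "BOD": 0.11,
    "nitrate": 0.10,
    "phosphate": 0.10,
    "temperature": 0.10,
    "turbidity": 0.08,
    "TDS": 0.07,
}


def fecal_coliforms_from_ecoli(ecoli_count):
    """Apply the 1.25 correction factor mentioned in the text."""
    return 1.25 * ecoli_count


def wqi_multiplicative(q):
    """Multiplicative WQI: product of q_i ** w_i over all parameters.

    `q` maps each parameter name to its quality score q_i (0 to 100).
    """
    missing = set(WEIGHTS) - set(q)
    if missing:
        raise ValueError(f"missing quality scores: {sorted(missing)}")
    return math.prod(q[name] ** weight for name, weight in WEIGHTS.items())


def wqi_additive(q):
    """Additive form (weighted sum), shown only for comparison."""
    return sum(q[name] * weight for name, weight in WEIGHTS.items())


def rating(wqi):
    """Rating bands as given in the text."""
    if wqi >= 90:
        return "excellent"
    if wqi >= 70:
        return "good"
    if wqi >= 50:
        return "medium"
    if wqi >= 25:
        return "bad"
    return "very bad"


# Hypothetical scores: every parameter at 80 except a poor coliform score.
q_example = {name: 80.0 for name in WEIGHTS}
q_example["fecal_coliforms"] = 10.0

wqi_example = wqi_multiplicative(q_example)
additive_example = wqi_additive(q_example)
```

With these hypothetical scores the single low coliform score pulls the multiplicative index to about 57 ("medium"), whereas the additive form gives 68.8, which illustrates the sensitivity argument made for the multiplicative form.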

Principal component analysis and cluster analysis

Principal component analysis (PCA), as defined by Hotelling 87 , is a multivariate technique of covariance modeling that reduces the dimensionality of an originally correlated dataset with the lowest possible information loss. From a dataset of correlated variables, it forms a new set of orthogonal, uncorrelated variables that are weighted linear combinations of the original variables 30 .

The PCA technique extracts the eigenvalues and eigenvectors from the covariance matrix of the original variables. The PCs are obtained by multiplying the original correlated variables by the eigenvectors, whose coefficients are frequently called “loadings” 29 , 30 , 88 , 89 . A widely accepted and simple qualitative rule proposes that loadings greater than 0.30 or less than − 0.30 are significant; loadings greater than 0.40 or less than − 0.40 are more important; and loadings greater than 0.50 or less than − 0.50 are very significant 90 . The suitability of the data for PCA was evaluated with the Kaiser–Meyer–Olkin 91 , 92 (KMO) measure of sampling adequacy and Bartlett's test of sphericity 93 . The Shapiro test was used to check data normality (α = 0.01).

Cluster analysis reveals the latent behavior of a dataset in order to categorize the objects into groups or clusters on the basis of similarities 30 , 88 , 89 . Hierarchical agglomerative cluster analysis (CA) classifies objects by first putting each object in a separate cluster and then joining the clusters together stepwise until a single cluster remains 29 .
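The eigen-decomposition step can be sketched with NumPy as follows. This is an illustrative sketch on synthetic data, not the authors' code: the data, the function names and the choice to work on the covariance matrix of centered (not standardized) variables are assumptions. Standardizing the columns first would make the analysis equivalent to using the correlation matrix.

```python
import numpy as np


def pca(X):
    """PCA by eigen-decomposition of the covariance matrix.

    X has one row per sample and one column per variable.
    Returns eigenvalues (descending), eigenvectors as columns
    (the coefficients called "loadings" in the text), and PC scores.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # project data onto the eigenvectors
    return eigvals, eigvecs, scores


def label_loading(value):
    """Qualitative rule quoted in the text for loading magnitudes."""
    magnitude = abs(value)
    if magnitude > 0.50:
        return "very significant"
    if magnitude > 0.40:
        return "more important"
    if magnitude > 0.30:
        return "significant"
    return "not significant"


# Synthetic example: two correlated variables plus one independent variable.
rng = np.random.default_rng(42)
base = rng.normal(size=(50, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(50, 1)),
    2.0 * base + 0.1 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])

eigvals, loadings, scores = pca(X)
explained = eigvals / eigvals.sum()  # share of variance per component
```

The resulting scores are uncorrelated, and the variance of each score column equals the corresponding eigenvalue, which is the property the text describes as forming new orthogonal variables.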
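The stepwise joining described above can be sketched in plain Python. The text does not state the distance measure or linkage rule used, so Euclidean distance and average linkage are assumptions here, as are the function name and the toy station coordinates.

```python
import math


def agglomerate(points):
    """Hierarchical agglomerative clustering with average linkage.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until a single cluster remains. Returns the merge
    history as a list of (cluster_a, cluster_b, distance) tuples, where
    clusters are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []

    def average_distance(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        pair_distances = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(pair_distances) / len(pair_distances)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        distance, ia, ib = min(
            (average_distance(clusters[ia], clusters[ib]), ia, ib)
            for ia in range(len(clusters))
            for ib in range(ia + 1, len(clusters))
        )
        history.append((clusters[ia], clusters[ib], distance))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return history


# Toy data: six hypothetical stations described by two variables each.
stations = [
    (1.0, 2.0), (1.2, 1.9), (0.9, 2.1),
    (8.0, 9.0), (8.2, 9.1), (7.9, 8.8),
]

history = agglomerate(stations)
# The two clusters joined in the final step form the two-group solution.
final_groups = {tuple(sorted(history[-1][0])), tuple(sorted(history[-1][1]))}
```

Reading the merge history from the last step backwards gives the same information as cutting a dendrogram at successively lower heights.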

Timeseries analysis and trend detection

The Mann–Kendall trend test is a nonparametric test used to identify a trend in a series, first proposed by Mann 94 and further developed by Kendall 95 and Hirsch 96 . The null hypothesis (H 0 ) is that there is no trend in the series. The test is based on Kendall's tau measure of association between two samples, which is itself based on the ranks within the samples. The observations are compared in pairs, and the sign of the difference between each value and its predecessors is determined. The number of pairs with negative differences is subtracted from the number of pairs with positive differences to give the statistic S. A positive value of S indicates an upward trend, and a negative value of S a downward trend. For n > 10, a normal approximation is used to calculate the Z statistic, from which the p-value is obtained 96 .

Fourier decomposition is a technique that separates the frequency components of a data series with seasonal behavior, such as those found in a complex water quality dataset 97 . Spectral analysis performed using a Fast Fourier Transform (FFT) algorithm is widely used in environmental studies because it reveals the dominant influences and their scales 50 . Power spectral density (PSD) obtained from the FFT and represented by periodograms is a recommended procedure to detect seasonality 98 , 99 .
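As an illustration of the test statistic, the following Python sketch computes S, the normal-approximation Z and a two-sided p-value. It implements the basic (non-seasonal) Mann–Kendall test with the usual tie correction in the variance; the function name and the example series are hypothetical, and the seasonal variant of Hirsch is not covered.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Basic Mann-Kendall trend test (normal approximation).

    Returns (S, Z, p) where p is the two-sided p-value.
    """
    n = len(series)

    # S: number of later-minus-earlier differences that are positive
    # minus the number that are negative, over all pairs i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under H0, corrected for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected Z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


# Hypothetical series of 12 observations with a rising tendency.
example = [3.1, 3.4, 3.3, 3.9, 4.2, 4.0, 4.6, 4.8, 5.1, 5.0, 5.6, 5.9]
s_stat, z_stat, p_value = mann_kendall(example)
```

For this example S is positive and the p-value falls well below 0.01, so the null hypothesis of no trend would be rejected in favor of an upward trend.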
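A periodogram of this kind can be sketched with NumPy's FFT routines. The series below is synthetic (a 12-month cycle plus noise), and the function name and the PSD scaling convention are assumptions made for illustration.

```python
import numpy as np


def periodogram(x, dt=1.0):
    """Simple periodogram of an evenly sampled series.

    Returns the frequencies (cycles per unit of dt) and a power
    spectral density estimate for the mean-removed series.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean (zero-frequency term)
    n = x.size
    spectrum = np.fft.rfft(x)             # FFT of a real-valued series
    freqs = np.fft.rfftfreq(n, d=dt)      # matching frequency axis
    psd = (np.abs(spectrum) ** 2) * dt / n
    return freqs, psd


# Synthetic monthly series: 10 years with an annual cycle plus noise.
rng = np.random.default_rng(0)
months = np.arange(120)
series = 5.0 + 2.0 * np.sin(2.0 * np.pi * months / 12.0) + rng.normal(0.0, 0.3, 120)

freqs, psd = periodogram(series, dt=1.0)          # dt = 1 month
peak_index = int(np.argmax(psd[1:])) + 1          # skip the zero frequency
dominant_period = 1.0 / freqs[peak_index]         # in months
```

In this synthetic case the largest peak sits at a period of 12 months, which is how a periodogram exposes seasonality in a monitoring record.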

Brazilian legal regulation

Brazilian fresh waters are divided into five classes (a Special Class and Classes 1 to 4), depending on the intended use 76 . The Special Class is intended mainly for the preservation of the natural balance of aquatic communities in fully protected conservation areas. Class 1 is intended for supply for human consumption after simplified treatment, for the protection of aquatic communities and for primary contact recreation. Class 2 requires conventional treatment for human consumption. Class 3 requires conventional or advanced treatment for human consumption and can be used to water animals and irrigate some crops. Class 4 is intended only for navigation and landscape harmony. It is important to note that the Framework refers to the water quality target required according to water uses. The river basin committees are responsible for implementing the Framework, in accordance with the Brazilian National Water Resources Policy 33 . As long as the Framework is not established by the basin committee, fresh waters are considered class 2 (Art. 42 CONAMA 357/2005) 76 .

Data availability

All data generated or analysed during this study are included in this published article and its Supplementary Information files.

Martin, V. M. & Joel, A. T. History of the Urban Environment (University of Pittsburgh Press, Pittsburgh, 2012).

Wang, J., Liu, X. D. & Lu, J. Urban river pollution control and remediation. Procedia Environ. Sci. 13 , 1856–1862 (2012).

Zhang, X., Wu, Y. & Gu, B. Urban rivers as hotspots of regional nitrogen pollution. Environ. Pollut. 205 , 139–144 (2015).

Harding, L. W. et al. Long-term trends, current status, and transitions of water quality in Chesapeake Bay. Sci. Rep. 9 , 1–19 (2019).

John, V., Jain, P., Rahate, M. & Labhasetwar, P. Assessment of deterioration in water quality from source to household storage in semi-urban settings of developing countries. Environ. Monit. Assess. 186 , 725–734 (2014).

Mishra, B. K. et al. Assessment of Bagmati river pollution in Kathmandu Valley: scenario-based modeling and analysis for sustainable urban development. Sustain. Water Qual. Ecol. https://doi.org/10.1016/j.swaqe.2017.06.001 (2017).

Xu, Z. et al. Urban river pollution control in developing countries. Nat. Sustain. 2 , 158–160 (2019).

UN-Water. Sustainable Development Goal 6 Synthesis Report on Water and Sanitation 2018. Un (2018). https://doi.org/10.1126/science.278.5339.827 .

UNEP. A Snapshot of the World’s Water Quality: Towards a global assessment . (United Nations Environment Programme, 2016).

Wada, Y. et al. Modeling global water use for the 21st century: the Water Futures and Solutions (WFaS) initiative and its approaches. Geosci. Model Dev. https://doi.org/10.5194/gmd-9-175-2016 (2016).

WWAP. The United Nations World Water Development Report 2019: Leaving No One Behind (2019).

Fan, M. & Shibata, H. Simulation of watershed hydrology and stream water quality under land use and climate change scenarios in Teshio River watershed, northern Japan. Ecol. Indic. https://doi.org/10.1016/j.ecolind.2014.11.003 (2015).

Putro, B., Kjeldsen, T. R., Hutchins, M. G. & Miller, J. An empirical investigation of climate and land-use effects on water quantity and quality in two urbanising catchments in the southern United Kingdom. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2015.12.132 (2016).

Li, B., Rodell, M., Sheffield, J., Wood, E. & Sutanudjaja, E. Long-term, non-anthropogenic groundwater storage changes simulated by three global-scale hydrological models. Sci. Rep. 9 , 10746 (2019).

Jaeger, W. K. et al. Scope and limitations of drought management within complex human–natural systems. Nat. Sustain. https://doi.org/10.1038/s41893-019-0326-y (2019).

Pastor, A. V. et al. The global nexus of food–trade–water sustaining environmental flows by 2050. Nat. Sustain. https://doi.org/10.1038/s41893-019-0287-1 (2019).

Melo, D. C. D. et al. The big picture of field hydrology studies in Brazil. Hydrol. Sci. J. 65 , 1262–1280 (2020).

Dixon, W. & Chiswell, B. Review of aquatic monitoring program design. Water Res. 30 , 1935–1948 (1996).

Wang, Y., Xiang, C., Zhao, P., Mao, G. & Du, H. A bibliometric analysis for the research on river water quality assessment and simulation during 2000–2014. Scientometrics 108 , 1333–1346 (2016).

Ji, X., Dahlgren, R. A. & Zhang, M. Comparison of seven water quality assessment methods for the characterization and management of highly impaired river systems. Environ. Monit. Assess. 188 , 15 (2016).

Deng, W. & Wang, G. A novel water quality data analysis framework based on time-series data mining. J. Environ. Manag. 196 , 365–375 (2017).

Singh, S. et al. Development of indices for surface and ground water quality assessment and characterization for Indian conditions. Environ. Monit. Assess. 191 , 182 (2019).

Mladenović-Ranisavljević, I. I., Takić, L. & Nikolić, D. Water quality assessment based on combined multi-criteria decision-making method with index method. Water Resour. Manag. 32 , 2261–2276 (2018).

Chen, S. K., Jang, C. S. & Chou, C. Y. Assessment of spatiotemporal variations in river water quality for sustainable environmental and recreational management in the highly urbanized Danshui River basin. Environ. Monit. Assess. 191 , 100 (2019).

Rakotondrabe, F. et al. Water quality assessment in the Bétaré-Oya gold mining area (East-Cameroon): multivariate statistical analysis approach. Sci. Total Environ. 610–611 , 831–844 (2018).

Horton, R. K. An Index Number System for Rating Water Quality. J. Water Pollut. Control Fed. (1965).

Brown, R. M., McClelland, N. I., Deininger, R. A. & Tozer, R. G. A water quality index—do we dare?. Water Sew. Work 117 , 339–343 (1970).

Wu, Z. et al. Water quality assessment based on the water quality index method in Lake Poyang: the largest freshwater lake in China. Sci. Rep. 7 , 1–10 (2017).

Singh, K. P., Malik, A., Mohan, D. & Sinha, S. Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India)—a case study. Water Res. 38 , 3980–3992 (2004).

Dutta, S., Dwivedi, A. & Suresh Kumar, M. Use of water quality index and multivariate statistical techniques for the assessment of spatial variations in water quality of a small river. Environ. Monit. Assess. 190 , 718 (2018).

Malsy, M., Flörke, M. & Borchardt, D. What drives the water quality changes in the Selenga Basin: climate change or socio-economic development?. Reg. Environ. Change 17 , 1977–1989 (2017).

Pacheco, F. S. et al. Water quality longitudinal profile of the Paraíba do Sul River, Brazil during an extreme drought event. Limnol. Oceanogr. 62 , S131–S146 (2017).

Brazilian National Congress. Brazilian National Water Resources Policy. Federal Law n. 9433. (1997).

ANA. Brazilian Water Resources Report—2017 (National Water Agency (Brazil), 2018).

Myers, N., Mittermeier, R. A., Mittermeier, C. G., da Fonseca, G. A. B. & Kent, J. Biodiversity hotspots for conservation priorities. Nature 403 , 853–858 (2000).

Russo, G. Biodiversity: biodiversity’s bright spot. Nature 462 , 266–269 (2009).

Ribeiro, M. C., Metzger, J. P., Martensen, A. C., Ponzoni, F. J. & Hirota, M. M. The Brazilian Atlantic Forest: How much is left, and how is the remaining forest distributed? Implications for conservation. Biol. Conserv. https://doi.org/10.1016/j.biocon.2009.02.021 (2009).

Tabarelli, M., Aguiar, A. V., Ribeiro, M. C., Metzger, J. P. & Peres, C. A. Prospects for biodiversity conservation in the Atlantic Forest: lessons from aging human-modified landscapes. Biol. Conserv. 143 , 2328–2340 (2010).

Bogoni, J. A., Pires, J. S. R., Graipel, M. E., Peroni, N. & Peres, C. A. Wish you were here: how defaunated is the Atlantic Forest biome of its medium- to large-bodied mammal fauna?. PLoS ONE 13 , e0204515 (2018).

Rezende, C. L. et al. From hotspot to hopespot: an opportunity for the Brazilian Atlantic Forest. Perspect. Ecol. Conserv. 16 , 208–214 (2018).

CEIVAP & PROFILL. Plano de Bacia: Consolidação do diagnóstico . (2018).

Villas-Boas, M. D., Olivera, F. & de Azevedo, J. P. S. Assessment of the water quality monitoring network of the Piabanha River experimental watersheds in Rio de Janeiro, Brazil, using autoassociative neural networks. Environ. Monit. Assess. https://doi.org/10.1007/s10661-017-6134-9 (2017).

Morais, A., Villas-Boas, M., Bastos, A., Monteiro, A. & Araújo, L. Estudos para um diagnóstico quali-quantitativo em bacias experimentais – Estudo de Caso: Bacia do rio Piabanha. In II Seminário de Recursos Hídricos da Bacia Hidrográfica do Paraíba do Sul: Recuperação de Áreas Degradadas, Serviços Ambientais e Sustentabilidade 173–180 (2009). https://doi.org/10.4136/serhidro.23 .

Azevedo, J. P. S. de. Relatório final do projeto HIDROECO/Piabanha: Metodologia para Determinação de Vazões Ambientais na Região Serrana do RJ Integrando Aspectos Hidrometeorológicos, Ecológicos e Socioeconômico. Volume 1: Informações Quali-quantitativas (2017).

de Mello, F. V. et al. Current state of contamination by persistent organic pollutants and trace elements on Piabanha River Basin—Rio de Janeiro, Brazil. Orbital Electron. J. Chem. 10 , 327–336 (2018).

Chiappori, D., Hora, M. & Azevedo, J. Interface between hydropower generation and other water uses in the Piabanha River Basin in Brazil. Br. J. Appl. Sci. Technol. https://doi.org/10.9734/bjast/2016/23935 (2016).

da Silva, P. V. R. M., Pecly, J. O. G. & de Azevedo, J. P. S. Uso de traçadores fluorescentes para determinar características de transporte e dispersão no Rio Piabanha (RJ) para a modelagem quali-quantitativa pelo HEC-RAS. Eng. Sanit. Ambient. https://doi.org/10.1590/s1413-41522017150187 (2017).

de Costa, D. A., dos Assumpção, R. S. F. V., de Azevedo, J. P. S. & dos Santos, M. A. On water resources management instruments—Framing—as a tool for river rehabilitation. Saúde em Debate 43 , 35–50 (2019).

Abdul-Aziz, O. I., Wilson, B. N. & Gulliver, J. S. An extended stochastic harmonic analysis algorithm: application for dissolved oxygen. Water Resour. Res. 43 , W08417 (2007).

Rajwa-Kuligiewicz, A., Bialik, R. J. & Rowiński, P. M. Dissolved oxygen and water temperature dynamics in lowland rivers over various timescales. J. Hydrol. Hydromechanics 63 , 353–363 (2015).

United States Environmental Protection Agency. Quality criteria for water . https://nepis.epa.gov/Exe/ZyPDF.cgi/00001MGA.PDF?Dockey=00001MGA.PDF (1986).

Imperador, Á. do. Our history. https://www.grupoaguasdobrasil.com.br/aguas-imperador/en/ (2020).

ANA. Atlas Esgotos—Despoluição de bacias hidrográficas . (Brazilian National Water Agency, 2017).

Karthe, D., Lin, P.-Y. & Westphal, K. Instream coliform gradients in the Holtemme, a small headwater stream in the Elbe River Basin, Northern Germany. Front. Earth Sci. 11 , 544–553 (2017).

von Sperling, M. & von Sperling, E. Challenges for bathing in rivers in terms of compliance with coliform standards. Case study in a large urbanized basin (das Velhas River, Brazil). Water Sci. Technol. 67 , 2534–2542 (2013).

Bae, H. Changes of river’s water quality responded to rainfall events. Environ. Ecol. Res. 1 , 21–25 (2013).

Yu, S., Xu, Z., Wu, W. & Zuo, D. Effect of land use types on stream water quality under seasonal variation and topographic characteristics in the Wei River basin, China. Ecol. Indic. 60 , 202–212 (2016).

Lumb, A., Sharma, T. C. & Bibeault, J.-F. A review of genesis and evolution of Water Quality Index (WQI) and some future directions. Water Qual. Exposure Health 3 , 11–24 (2011).

Jouanneau, S. et al. Methods for assessing biochemical oxygen demand (BOD): a review. Water Res. 49 , 62–82 (2014).

Vigiak, O. et al. Predicting biochemical oxygen demand in European freshwater bodies. Sci. Total Environ. 666 , 1089–1105 (2019).

Ishii, S. & Sadowsky, M. J. Escherichia coli in the environment: implications for water quality and human health. Microbes Environ. 23 , 101–108 (2008).

Odonkor, S. T. & Ampofo, J. K. Escherichia coli as an indicator of bacteriological quality of water: an overview. Microbiol. Res. (Pavia) 4 , 2 (2013).

Schlesinger, W. H. & Bernhardt, E. S. Biogeochemistry. Biogeochemistry: an analysis of global change 3rd edn. (Elsevier, Amsterdam , 2013). https://doi.org/10.1016/C2010-0-66291-2 .

Tipping, E. et al. Atmospheric deposition of phosphorus to land and freshwater. Environ. Sci. Process. Impacts https://doi.org/10.1039/c3em00641g (2014).

Withers, P. J. A. & Jarvie, H. P. Delivery and cycling of phosphorus in rivers: a review. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2008.08.002 (2008).

Sharpley, A. Agricultural phosphorus, water quality, and poultry production: are they compatible?. Poult. Sci. https://doi.org/10.1093/ps/78.5.660 (1999).

House, W. A. & Denison, F. H. Exchange of inorganic phosphate between river waters and bed-sediments. Environ. Sci. Technol. https://doi.org/10.1021/es020039z (2002).

Alvim, B. R. Dinâmica do nitrogênio e fósforo em águas fluviais de uma bacia hidrográfica com diferentes usos do solo no Sudeste do Brasil (Universidade Federal Fluminense, 2016).

Molinari, B. S. Modelagem espacial da qualidade de água na bacia do rio Piabanha/RJ (Universidade Federal do Rio de Janeiro, 2015).

Jaji, M. O., Bamgbose, O., Odukoya, O. O. & Arowolo, T. A. Water quality assessment of Ogun river, South West Nigeria. Environ. Monit. Assess. 133 , 473–482 (2007).

Mitra, S. et al. Water quality assessment of the ecologically stressed Hooghly River Estuary, India: a multivariate approach. Mar. Pollut. Bull. 126 , 592–599 (2018).

Guo, H. Y., Wang, X. R. & Zhu, J. G. Quantification and index of non-point source pollution in Taihu Lake region with GIS. Environ. Geochem. Health https://doi.org/10.1023/B:EGAH.0000039577.67508.76 (2004).

Khuhawar, M. Y., Zaman Brohi, R. O., Jahangir, T. M. & Lanjwani, M. F. Water quality assessment of Ramser site, Indus Delta, Sindh, Pakistan. Environ. Monit. Assess. 190 , 492 (2018).

Alves, R. I. S. et al. Water quality assessment of the Pardo River Basin, Brazil: a multivariate approach using limnological parameters, metal concentrations and indicator bacteria. Arch. Environ. Contam. Toxicol. 75 , 199–212 (2018).

Liang, B. et al. Distribution, sources, and water quality assessment of dissolved heavy metals in the Jiulongjiang River water, Southeast China. Int. J. Environ. Res. Public Health 15 , 2752 (2018).

Brazil. Brazilian National Environment Council ( CONAMA ) Resolution n. 357. Provides the classification of water bodies and environmental guidelines for their framework, as well as establishes the conditions and standards for effluents discharge 1–27 (2005).

Thomann, R. V. Time-series analyses of water-quality data. J. Sanit. Eng. Div. 93 , 1–24 (1967).

Araújo, L. M. N. de. Identification of precipitation and soil moisture hydrological patterns at Piabanha river basin. Ph.D. thesis. (Federal University of Rio de Janeiro, 2016).

Brazil. Brazilian National Sanitation Information System (SNIS). http://www.snis.gov.br/ (2020).

CEIVAP/PROFILL. Integrated plan for water resources in the watershed of the Paraíba do Sul river . http://sigaceivap.org.br:8080/publicacoesArquivos/ceivap/arq_pubMidia_AGVP_PS_PIRH_PP-06_REV03_FINAL.pdf (2020).

American Public Health Association. Standard Method for Examination of Water and Wastewater . (American Public Health Association, 2012).

McClelland, N. I. Water Quality Index Application in the Kansas River Basin (1974).

Brown, R.M., McClelland, N.I., Deininger, R.A., Landwehr, J. M. Validating the WQI . National meeting of American Society of Civil Engineers on water resources engineering (1973).

Noori, R., Berndtsson, R., Hosseinzadeh, M., Adamowski, J. F. & Abyaneh, M. R. A critical review on the application of the National Sanitation Foundation Water Quality Index. Environ. Pollut. 244 , 575–587 (2019).

Kachroud, M., Trolard, F., Kefi, M., Jebari, S. & Bourrié, G. Water quality indices: challenges and application limits in the literature. Water (Switzerland) 11 , 1–26 (2019).

CETESB, Companhia Ambiental do Estado de São Paulo. Qualidade das Águas Interiores no Estado de São Paulo - Apêndice D—Índices de Qualidade das Águas . https://cetesb.sp.gov.br/aguas-interiores/publicacoes-e-relatorios/ (2019).

Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24 , 417–441 (1933).

Helena, B. et al. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis. Water Res. https://doi.org/10.1016/S0043-1354(99)00225-0 (2000).

Vega, M., Pardo, R., Barrado, E. & Debán, L. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res. 32 , 3581–3592 (1998).

Sergeant, C. J., Starkey, E. N., Bartz, K. K., Wilson, M. H. & Mueter, F. J. A practitioner’s guide for exploring water quality patterns using principal components analysis and procrustes. Environ. Monit. Assess. 188 , 249 (2016).

Cerny, B. A. & Kaiser, H. F. A study of a measure of sampling adequacy for factor-analytic correlation matrices. Multivariate Behav. Res. https://doi.org/10.1207/s15327906mbr1201_3 (1977).

Kaiser, H. F. An index of factorial simplicity. Psychometrika https://doi.org/10.1007/BF02291575 (1974).

Arsham, H. & Lovric, M. Bartlett’s Test. In International encyclopedia of statistical science (ed. Lovric, M.) 87–88 (Springer, Berlin, 2011). https://doi.org/10.1007/978-3-642-04898-2_132 .

Mann, H. B. Nonparametric tests against trend. Econometrica 13 , 245 (1945).

Kendall, M. G. Rank correlation methods (Oxford University Press, Oxford, 1975).

Hirsch, R. M., Slack, J. R. & Smith, R. A. Techniques of trend analysis for monthly water quality data. Water Resour. Res. 18 , 107–121 (1982).

Whitfield, P. H. Identification and characterization of transient water quality events by Fourier analysis. Environ. Int. 21 , 571–575 (1995).

Harris, J., Loftis, J. C. & Montgomery, R. H. Statistical methods for characterizing ground-water quality. Ground Water 25 , 185–193 (1987).

Hipel, K. W. & McLeod, A. I. Time series modelling of water resources and environmental systems. Time Ser. Model. Water Resour. Environ. Syst. https://doi.org/10.1016/0022-1694(95)90010-1 (1994).


We thank the Piabanha Committee for financially supporting our research. We also thank Juliana Pereira Dias for helping with the statistical analysis, and Renata Demori Costa and Jamie Sweeney for the English review.

Author information

Authors and affiliations.

Federal University of Rio de Janeiro (UFRJ), Alberto Luiz Coimbra Institute for Graduate Studies and Engineering Research (COPPE), Centro Tecnológico, Cidade Universitária, Rio de Janeiro, RJ, Brazil

David de Andrade Costa, José Paulo Soares de Azevedo & Marco Aurélio dos Santos

Federal Fluminense Institute, São João da Barra Advanced Campus, BR 356, KM 181, São João da Barra, RJ, Brazil

David de Andrade Costa

Oswaldo Cruz Foundation (Fiocruz), National School of Public Health Sergio Arouca (ENSP), Rua Leopoldo Bulhões, 1.480, Manguinhos, Rio de Janeiro, RJ, Brazil

Rafaela dos Santos Facchetti Vinhaes Assumpção


D.A.C. compiled the manuscript, performed the analysis, and generated the figures and tables in the main text. J.P.S.A. contributed to the discussions and carefully reviewed the manuscript. M.A.S., R.S.F.V.A. and J.P.S.A. made substantial contributions to the conception and design of the research. All the authors reviewed the manuscript.

Corresponding authors

Correspondence to David de Andrade Costa or José Paulo Soares de Azevedo .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Table S1, Supplementary Table S2, Supplementary Table S3, Supplementary Table S4, Supplementary Table S5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

de Andrade Costa, D., Soares de Azevedo, J.P., dos Santos, M.A. et al. Water quality assessment based on multivariate statistics and water quality index of a strategic river in the Brazilian Atlantic Forest. Sci Rep 10 , 22038 (2020). https://doi.org/10.1038/s41598-020-78563-0

Received : 02 May 2020

Accepted : 13 November 2020

Published : 16 December 2020

DOI : https://doi.org/10.1038/s41598-020-78563-0

Multivariate Analysis of Variance

  • First Online: 01 January 2012

  • David Aaron Maroof 2  

Multivariate analysis of variance (MANOVA) is an omnibus procedure that allows for the contemporaneous analysis of more than one dependent variable. Dependent variables are the outcome variables, or criteria, of a research design. Performance on neuropsychological tests—memory scores, reaction time, processing speed, the number of words generated on tests of word fluency—can all serve as dependent variables. Interestingly, these dependent variables are often reversed and used as independent variables, or predictors, when interpreting the results of a significant MANOVA. And, scores on neuropsychological tests occasionally serve as independent variables in their own right. For example, one can empirically (e.g., median split) dichotomize performance on any one measure, say processing speed, and then compare persons who are “slow” and “fast” (the independent, or grouping variable) on a number of other dependent variables.


Author information

Authors and affiliations.

The Esplanade 8767, Orlando, FL, USA

David Aaron Maroof

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Maroof, D.A. (2012). Multivariate Analysis of Variance. In: Statistical Methods in Neuropsychology. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3417-7_9

DOI : https://doi.org/10.1007/978-1-4614-3417-7_9

Published : 07 March 2012

Publisher Name : Springer, Boston, MA

Print ISBN : 978-1-4614-3416-0

Online ISBN : 978-1-4614-3417-7

eBook Packages : Behavioral Science Behavioral Science and Psychology (R0)

Practical applications of multivariate statistics in exploration geochemistry

  • IVAU: Instituut voor Aardwetenschappen Utrecht
  • Bio-, hydro-, and environmental geochemistry
  • Geochemistry

Research output : Thesis › Doctoral thesis 1 (Research UU / Graduation UU)

  • statistics for geochemistry (Dutch keyword: statistiek voor de geochemie)


  • Multivariate Statistics Keyphrases 100%
  • Exploration Geochemistry Keyphrases 100%
  • Ore Deposits Keyphrases 50%
  • Factor Analysis Keyphrases 50%
  • Univariate Statistics Keyphrases 50%
  • Cluster Analysis Keyphrases 50%
  • Discriminant Function Analysis Keyphrases 50%
  • Mapping Analysis Keyphrases 50%

T1 - Practical applications of multivariate statistics in exploration geochemistry

AU - Vriend, S.P.

N1 - Geologica Ultraiectina ; 70

PY - 1990/9/26

Y1 - 1990/9/26

N2 - The search for new economic ore deposits is becoming increasingly difficult, and a sophisticated approach is required to locate them. In exploration geochemistry the use of univariate and multivariate statistics is often advocated. This series of studies shows how techniques such as factor analysis, discriminant function analysis, nonlinear mapping and cluster analysis can be of use to the exploration geochemist. Applications to geochemical rock, sediment and water surveys are presented.

KW - statistiek voor de geochemie

M3 - Doctoral thesis 1 (Research UU / Graduation UU)

SN - 90-71577-23-6

PB - Faculteit Aardwetenschappen

CY - Utrecht


  1. An Introduction To Multivariate Statistical Analysis PDF

  2. Understanding multivariate multiple regression and its application

  3. An Introduction to Multivariate Statisti

  4. Multivariate analysis

  5. (PDF) The Multivariate Order Statistics for Exponential and Weibull

  6. An Introduction to Multivariate Statistical Analysis by T.W. Anderson


  1. SPSS in Nepali

  2. 19 Snapshot of Multivariate Probability and Statistics: multivariate probability


  1. PDF How to interpret and report the results from multivariable analyses

    fused with multivariate analyses, which are used to assess the relationships of several predictors with two or more dependent variables or outcomes at the same time. In this article we will not review multivariate analyses. However, medical writers should be aware that the terms multivariate and multivariable are often used interchangeably.

  2. Topics In Multivariate Statistics

    This thesis includes the study of three independent research problems in multivariate statistics. The first part of the thesis studies additive principal components (APCs for short), a nonlinear ...

  3. (PDF) Multivariate Statistical Analysis

    Commonly used multivariate analysis methods fall into three categories: 1. Multivariate analysis of variance, multiple regression analysis and analysis of covariance, known as the linear model ...

  4. PDF Multivariate Data Analysis with Applications to Cancer

    Multivariate data is common in a wide range of settings. As data structures become increasingly complex, additional statistical tools are required to perform proper analyses. In this dissertation we develop and evaluate methods for the analysis of multivariate data generated from cancer trials. In the first chapter we

  5. PDF The Application of Multivariate Statistical Analysis and Batch ...


  6. PDF Statistical Models and Analysis of Univariate and Multivariate

    Part of the Other Statistics and Probability Commons, Statistical Models Commons, and the Survival Analysis Commons Recommended Citation Palayangoda, Lochana, "Statistical Models and Analysis of Univariate and Multivariate Degradation Data" (2020). Statistical Science Theses and Dissertations. 15.

  7. Multivariate Analysis: Overview

    Abstract. Multivariate analysis is appropriate whenever more than one variable is measured on each sample individual, and overall conclusions about the whole system are sought. Many different multivariate techniques now exist for addressing a variety of objectives. This brief review outlines, in broad terms, some of the more common objectives ...

  8. Advances in multivariate statistics and its applications

    Thesis advisor Segal, Mark, degree committee member. Thesis advisor Tibshirani, Robert, degree committee member. Thesis advisor ... (CCA) is one of the core approaches in multivariate statistics. It is a technique for measuring the association between two multivariate sets of variables, which has a wide variety of applications. ...

  9. SPSS Data Analysis for Univariate, Bivariate, and Multivariate Statistics

    Preface. The goals of this book are to present a very concise, easy‐to‐use introductory primer of a host of computational tools useful for making sense out of data, whether that data come from the social, behavioral, or natural sciences, and to get you started doing data analysis fast.

  10. PDF SEVENTH EDITION Using Multivariate Statistics

    Using Multivariate Statistics. Barbara G. Tabachnick, California State University, Northridge; Linda S. Fidell, California State University, Northridge. 330 Hudson Street, NY NY 10013. Portfolio Manager: Tanimaa Mehra. Content Producer: Kani Kapoor

  11. Topics in Multivariate Statistics with Dependent Data

    Dissertations. Topics in Multivariate Statistics with Dependent Data. This dissertation comprises four chapters. The first is an introduction to the topics of the dissertation and the remaining chapters contain the main results. Chapter 2 gives new results for consistency of maximum likelihood estimators with a focus on multivariate mixed models.

  12. "Multivariate Statistics" by Joshua A. Dailey

    This thesis provides an introduction to several topics in multivariate statistics. The topics investigated include the multivariate normal distribution, discriminant analysis, and the T^2-test. This thesis yields a reasonable blend of theory and practice. There is sufficient theory introduced to make the topics mathematically interesting as well as a blend of real-world examples in order ...

  13. Water quality assessment based on multivariate statistics and water

    Multivariate statistics is another widely used approach [29,30], mainly with Principal Components Analysis ... Ph.D. thesis. (Federal University of Rio de Janeiro, 2016). Brazil. Brazilian National ...


    By delving into multivariate statistics and exploring these various techniques, this thesis aims to provide readers with an introductory understanding of this important field of statistical analysis. In chapter 2 we will build the theory we need to handle multivariate observations, in chapter 3 we will be concerned with applying

  15. PDF Teaching How to Write about Multivariate Analysis

    There are several steps to teaching how to write about multivariate analysis in graduate coursework or for dissertation writers. First, assign readings that cover key principles about statistical research writing, such as Miller (2005), Treiman (2009), or other books or articles on writing or professional research practice.

  16. ScholarWorks@UMass Amherst

  17. Multivariate Analysis: Overview

    multivariate analysis. Multivariate methods have had a slightly curious genesis and development. The earliest work, dating from the end of the nineteenth century, was rooted in practical problems arising from social and educational research (See Educational Statistics, Educational Psychology: Measuring Change Over Time),

  18. Multivariate Analysis, Bayesian: Overview with Examples

    This article discusses Bayesian multivariate analysis. It focuses on higher-dimensional statistical inference based upon the posterior distribution, so inference involves multidimensional prior, posterior, and predictive probability distributions. The article describes credibility intervals and regions, and hypothesis testing.

  19. Multivariate Methods for Agricultural Research

    Multivariate statistical methods encompass the simultaneous analysis of all variables measured on each experimental or sampling unit. Many agronomic research systems studied are, by their very nature, multivariate; however, most analyses reported are univariate (i.e., analysis of one response at a time).

  20. Multivariate Analysis of Variance

    Introduction. Multivariate analysis of variance (MANOVA) is an omnibus procedure that allows for the contemporaneous analysis of more than one dependent variable. Dependent variables are the outcome variables, or criteria, of a research design. Performance on neuropsychological tests—memory scores, reaction time, processing speed, the number ...

  21. Practical applications of multivariate statistics in exploration

    A sophisticated approach is required to locate new ones. In exploration geochemistry the use of uni- and multivariate statistics is often advocated. In this series of studies it is shown how techniques such as factor analysis, discriminant function analysis, non linear mapping and cluster analysis can be of use to the exploration geochemist.

  22. PDF Order Statistics and Multivariate discrete phase-type distributions

    tribution work is mainly for multivariate phase-type distributions in discrete time and for the distribution of concomitants in continuous time. The focus of this thesis is to present theoretical results that help to unify the theory of this area. The thesis consists of three research works: representations of order statistics from

  23. Multivariate analysis on performance in statistics, self-efficacy and

    The study adopted the questionnaires on self-efficacy beliefs and attitude towards Statistics while it utilized a researcher-made questionnaire for performance in Statistics. Multivariate Analysis ...