Abstract:
To characterize the distribution of pollutants at a contaminated site, soil and groundwater samples must be collected by drilling and analyzed using standard procedures. The preliminary and detailed investigation stages yield a large volume of soil and groundwater pollution data. These data typically have large sample sizes, many monitoring indicators, and complex structures, so extracting valuable information from them has become an important research issue. Taking an organic contaminated site as an example, this study applies big data analytics based on the self-organizing map (SOM) and the k-means algorithm to explore the correlations among the organic pollution indicators in groundwater and soil. The results show that (1) big data analytics based on the self-organizing map can rapidly mine the complex, multi-dimensional monitoring data of a contaminated site and effectively extract key information. (2) The pollution indicators in groundwater cluster significantly, and indicators in the same cluster have similar spatial distributions. On this basis, a screening strategy that first classifies the indicators and then ranks them can be adopted at contaminated sites to reduce the number of pollution indicators tested and thereby lower the cost of site investigation. (3) The pollution indicators in soil and groundwater are also strongly correlated in space, mainly because of the slow seepage velocity of groundwater. This correlation between the spatial distributions of pollution indicators in soil and groundwater can help trace the pollution sources at contaminated sites.
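The general idea of combining a SOM with k-means to group indicators can be illustrated with a minimal sketch. This is not the study's implementation: the data are synthetic, and the indicator names (`A` to `D`), the 4x4 grid, the training schedule, and the choice to cluster standardized SOM component planes are all assumptions made for illustration. The sketch trains a small SOM on scaled samples, takes each indicator's component plane (its weight across all map nodes), and groups the planes with k-means so that indicators with similar patterns share a cluster.

```python
import math
import random

def min_max_scale(rows):
    """Scale every indicator (column) to [0, 1] so no unit dominates."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(r, lo, hi)]
            for r in rows]

def dist2(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def train_som(data, n_rows, n_cols, epochs=100, lr0=0.5, seed=0):
    """Train a rectangular SOM online; returns one weight vector per node."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    weights = [[rng.random() for _ in data[0]] for _ in grid]
    sigma0 = max(n_rows, n_cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # linear decay (an assumption)
        lr, sigma = lr0 * decay, max(sigma0 * decay, 0.5)
        for x in rng.sample(data, len(data)):
            # Best-matching unit: the node whose weights are closest to x.
            bmu = min(range(len(grid)), key=lambda i: dist2(weights[i], x))
            br, bc = grid[bmu]
            for i, (r, c) in enumerate(grid):
                # Gaussian neighbourhood pulls nearby nodes toward x.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
    return weights

def zscore(values):
    """Standardize a component plane so clustering compares patterns."""
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values)) or 1.0
    return [(v - m) / s for v in values]

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels

# Synthetic, hypothetical data: indicators A and B follow one hidden
# factor (u); C and D follow another (v). Not data from the study.
rng = random.Random(42)
names = ["A", "B", "C", "D"]
samples = []
for _ in range(60):
    u, v = rng.random(), rng.random()
    samples.append([u + rng.gauss(0, 0.02), 2 * u + rng.gauss(0, 0.02),
                    v + rng.gauss(0, 0.02), 0.5 * v + rng.gauss(0, 0.02)])

weights = train_som(min_max_scale(samples), n_rows=4, n_cols=4)
planes = [zscore(p) for p in zip(*weights)]   # one component plane per indicator
labels = kmeans(planes, k=2)
print(dict(zip(names, labels)))
```

In this toy setting, indicators driven by the same hidden factor should end up in the same cluster, which mirrors the screening idea of classifying indicators first and then testing only a representative subset. With real site data, the grid size, scaling, and number of clusters would need to be chosen and validated for the data set at hand.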