Correlation analysis of multiple monitoring indicators of contaminated site based on self-organizing map
Abstract: Characterizing the distribution of pollutants at a contaminated site requires drilling to collect soil and groundwater samples and testing each sample against the indicators specified in the relevant standards. Preliminary and detailed investigations therefore generate large volumes of soil and groundwater pollution data with many samples, many monitoring indicators, and a complex structure, and extracting valuable information from such data has become an active research topic. Taking an organic contaminated site as an example, this study applies a self-organizing map (SOM) neural network together with the k-means algorithm to examine the correlations among pollution indicators in groundwater and soil. The results show that: (1) SOM-based big data analysis can rapidly mine complex, multi-dimensional monitoring data from contaminated sites and effectively extract key information. (2) The pollution indicators detected in groundwater cluster distinctly, and indicators in the same cluster have similar spatial distributions. A screening strategy that first classifies the indicators and then ranks them can therefore reduce the number of indicators to be tested and lower site testing costs. (3) The pollution indicators detected in soil and groundwater are well correlated spatially, which is related to the slow groundwater seepage velocity at this site. This spatial correlation between soil and groundwater indicators helps trace the pollution sources at the site.
Key words: self-organizing map; contaminated site; multiple monitoring indicators; correlation analysis; soil; groundwater
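The SOM plus k-means workflow named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the data are synthetic, the grid size, learning-rate schedule, and the function names `train_som` and `kmeans` are assumptions, and clustering the indicators by their SOM component planes is only one plausible reading of how indicators are grouped.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular SOM online; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    weights = rng.normal(size=(rows * cols, n_features))
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
        x = data[rng.integers(n_samples)]     # one randomly drawn sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood weights
        weights += lr * h[:, None] * (x - weights)
    return weights

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Synthetic concentrations (assumption): indicators 0-2 follow one plume,
# indicators 3-5 follow another, each with small multiplicative noise.
rng = np.random.default_rng(42)
plume_a = rng.lognormal(size=200)
plume_b = rng.lognormal(size=200)
conc = np.column_stack(
    [plume_a * s for s in (1.0, 5.0, 20.0)] + [plume_b * s for s in (2.0, 8.0, 30.0)]
)
conc *= rng.lognormal(sigma=0.1, size=conc.shape)

# Log-transform and standardize so no indicator dominates by magnitude.
x = np.log10(conc)
x = (x - x.mean(axis=0)) / x.std(axis=0)

weights = train_som(x)
planes = weights.T              # one component plane (row) per indicator
labels = kmeans(planes, k=2)    # group indicators with similar planes
print(labels)
```

Indicators whose component planes look alike respond to the same samples in the same way, which is why they end up in the same cluster in this sketch.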
Figure 3. Structure of the self-organizing map neural network [26]
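Table 1. Statistical characteristics of pollutant data in groundwater

| Pollutant | Mean (μg·L−1) | Maximum (μg·L−1) | Standard deviation | Ratio of maximum to Class IV upper limit |
|---|---|---|---|---|
| o-Xylene | 8 702.00 | 1 010 000.00 | 80 780.00 | 1 010.00 |
| Chlorobenzene | 4 386.00 | 453 000.00 | 36 400.00 | 755.00 |
| Carbon tetrachloride | 363.30 | 33 900.00 | 2 691.00 | 678.00 |
| 1,2-Dichloroethane | 108.60 | 10 200.00 | 818.10 | 255.00 |
| m&p-Xylene | 763.20 | 109 000.00 | 8 463.00 | 109.00 |
| Trichloromethane | 411.30 | 28 900.00 | 2 614.00 | 96.33 |
| Manganese | 2 056.00 | 61 600.00 | 5 840.00 | 41.06 |
| 1,4-Dichlorobenzene | 93.02 | 4 200.00 | 463.80 | 7.00 |
| 1,2-Dichlorobenzene | 122.50 | 6 000.00 | 622.40 | 3.00 |
| 1,2,4-Trichlorobenzene | 2.32 | 206.00 | 18.35 | 1.14 |
| Trichloroethylene | 0.68 | 57.00 | 4.72 | 0.27 |
| Tetrachloroethylene | 3.65 | 288.00 | 23.52 | 0.96 |
| Arsenic | 9.07 | 268.00 | 26.53 | 5.36 |
| Ethylbenzene | 90.62 | 4 840.00 | 546.10 | 8.06 |
| Toluene | 144.30 | 5 150.00 | 684.60 | 3.67 |
| Benzene | 5.64 | 157.00 | 18.85 | 1.31 |
| Carbon disulfide | 1.16 | 102.00 | 8.79 | 1.02 |
| 2,4-Dichlorophenol | 1.25 | 88.30 | 9.33 | / |
| 2,6-Dichlorophenol | 0.17 | 20.10 | 1.59 | / |
| 1,2,3-Trichlorobenzene | 0.82 | 59.30 | 5.61 | 0.33 |
| 1,3-Dichlorobenzene | 17.90 | 911.00 | 95.78 | / |
| Bromobenzene | 1.23 | 30.50 | 4.07 | / |
| 2-Chlorotoluene | 7.21 | 239.00 | 29.90 | / |
| 1-Naphthylamine | 629.40 | 58 600.00 | 4 816.00 | / |
| 4-Chlorotoluene | 68.27 | 6 550.00 | 604.80 | / |
| Isopropylbenzene | 3.81 | 485.00 | 37.63 | / |
| 1,3,5-Trimethylbenzene | 0.06 | 4.80 | 0.51 | / |
| Acetone | 8.08 | 1 350.00 | 104.50 | / |
| 4-Methyl-2-pentanone | 0.68 | 62.00 | 6.24 | / |

Note: "/" indicates that the indicator is not a controlled item under the Standard for Groundwater Quality (GB/T 14848-2017).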
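Table 2. Clustering, ranking, and screening results for pollutants in groundwater

| Cluster | Pollutant | Ratio of maximum to Class IV upper limit |
|---|---|---|
| Cluster-1 | Chlorobenzene | 755.00 |
| Cluster-1 | 1,2-Dichloroethane | 255.00 |
| Cluster-1 | 2-Chlorotoluene | / |
| Cluster-1 | 1-Naphthylamine | / |
| Cluster-2 | 1,4-Dichlorobenzene | 7.00 |
| Cluster-2 | 1,2-Dichlorobenzene | 3.00 |
| Cluster-2 | 1,2,4-Trichlorobenzene | 1.14 |
| Cluster-2 | 2,4-Dichlorophenol | / |
| Cluster-2 | 2,6-Dichlorophenol | / |
| Cluster-2 | 1,2,3-Trichlorobenzene | 0.33 |
| Cluster-2 | 1,3-Dichlorobenzene | / |
| Cluster-2 | Bromobenzene | / |
| Cluster-3 | o-Xylene | 1 010.00 |
| Cluster-3 | m&p-Xylene | 109.00 |
| Cluster-3 | Trichloromethane | 96.33 |
| Cluster-3 | Manganese | 41.06 |
| Cluster-3 | Arsenic | 5.36 |
| Cluster-3 | Ethylbenzene | 8.06 |
| Cluster-3 | Toluene | 3.67 |
| Cluster-3 | Benzene | 1.31 |
| Cluster-3 | Carbon disulfide | 1.02 |
| Cluster-3 | 4-Chlorotoluene | / |
| Cluster-3 | Isopropylbenzene | / |
| Cluster-3 | 1,3,5-Trimethylbenzene | / |
| Cluster-3 | Acetone | / |
| Cluster-3 | 4-Methyl-2-pentanone | / |
| Cluster-4 | Carbon tetrachloride | 678.00 |
| Cluster-4 | Trichloroethylene | 0.27 |
| Cluster-4 | Tetrachloroethylene | 0.96 |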
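The classify-then-rank screening idea can be sketched with the Table 2 values. The selection rule here is an assumption, not the authors' stated procedure: within each cluster, indicators are ranked by the ratio of their maximum to the Class IV upper limit, uncontrolled indicators ("/") are skipped, and the top `top_n` are kept. The function name `screen` and the parameter `top_n` are hypothetical.

```python
# Exceedance ratios copied from Table 2; None marks "/" (not a controlled
# indicator under GB/T 14848-2017, so it cannot be ranked by this ratio).
clusters = {
    "Cluster-1": {"chlorobenzene": 755.00, "1,2-dichloroethane": 255.00,
                  "2-chlorotoluene": None, "1-naphthylamine": None},
    "Cluster-2": {"1,4-dichlorobenzene": 7.00, "1,2-dichlorobenzene": 3.00,
                  "1,2,4-trichlorobenzene": 1.14, "2,4-dichlorophenol": None,
                  "2,6-dichlorophenol": None, "1,2,3-trichlorobenzene": 0.33,
                  "1,3-dichlorobenzene": None, "bromobenzene": None},
    "Cluster-3": {"o-xylene": 1010.00, "m&p-xylene": 109.00,
                  "trichloromethane": 96.33, "manganese": 41.06,
                  "arsenic": 5.36, "ethylbenzene": 8.06, "toluene": 3.67,
                  "benzene": 1.31, "carbon disulfide": 1.02,
                  "4-chlorotoluene": None, "isopropylbenzene": None,
                  "1,3,5-trimethylbenzene": None, "acetone": None,
                  "4-methyl-2-pentanone": None},
    "Cluster-4": {"carbon tetrachloride": 678.00, "trichloroethylene": 0.27,
                  "tetrachloroethylene": 0.96},
}

def screen(clusters, top_n=1):
    """Keep the top_n highest-ratio controlled indicators of each cluster (assumed rule)."""
    selected = {}
    for name, members in clusters.items():
        rated = [(p, r) for p, r in members.items() if r is not None]
        rated.sort(key=lambda item: item[1], reverse=True)  # rank within the cluster
        selected[name] = [p for p, _ in rated[:top_n]]
    return selected

selected = screen(clusters, top_n=1)
n_all = sum(len(members) for members in clusters.values())
n_kept = sum(len(kept) for kept in selected.values())
for name, kept in selected.items():
    print(name, kept)
print(f"indicators to test: {n_kept} of {n_all}")
```

With `top_n=1` this keeps one representative per cluster; a larger `top_n` trades testing cost against coverage, and the right value is a site-specific judgment.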
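Table 3. Statistical characteristics of groundwater and soil data

| Pollutant | Groundwater mean (μg·L−1) | Groundwater maximum (μg·L−1) | Groundwater standard deviation | Soil mean (mg·kg−1) | Soil maximum (mg·kg−1) | Soil standard deviation |
|---|---|---|---|---|---|---|
| Xylene | 7 790.33 | 995 700.00 | 78 790.33 | 6.28 | 236.00 | 38.27 |
| Chlorobenzene | 5 433.29 | 453 000.00 | 32 256.93 | 208.70 | 7 890.00 | 1 279.75 |
| Carbon tetrachloride | 311.25 | 35 450.00 | 2 077.31 | 0.79 | 12.50 | 2.65 |
| Trichloromethane | 527.11 | 27 680.00 | 1 701.52 | 0.50 | 8.15 | 1.49 |
| 1,2-Dichloroethane | 548.30 | 10 200.00 | 902.40 | 0.01 | 0.34 | 0.06 |
| 1,2-Dichlorobenzene | 145.09 | 6 000.00 | 599.40 | 25.87 | 615.00 | 110.86 |
| Toluene | 214.00 | 4 970.00 | 409.60 | 1.73 | 48.20 | 8.05 |
| Benzene | 9.08 | 157.00 | 121.20 | 0.08 | 1.21 | 0.24 |
| Isopropylbenzene | 8.06 | 100.00 | 16.05 | 0.07 | 1.05 | 0.22 |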
[1] CHEN Mengfang. Review on heavy metal remediation technology of soil and groundwater at industrially contaminated site in China[J]. Bulletin of Chinese Academy of Sciences,2014,29(3):327 − 335. (in Chinese with English abstract)
[2] LYU Yonggao, CAI Wutian, YANG Li, et al. A numerical simulation study of the position optimization of a pilot-scale permeable reactive barrier: a case study of the hexavalent chromium contaminated site[J]. Hydrogeology & Engineering Geology,2020,47(5):189 − 195. (in Chinese with English abstract)
[3] GUO Qiongze, SHI Xiaoqing, WANG Huiting, et al. Numerical analysis of the influencing factors for estimating DNAPL residual by the partitioning interwell tracer tests[J]. Hydrogeology & Engineering Geology,2019,46(6):165 − 172. (in Chinese with English abstract)
[4] LUO Yongming. Contaminated site remediation in China: progresses, problems and prospects[J]. The Administration and Technique of Environmental Monitoring,2011,23(3):1 − 6. (in Chinese with English abstract) doi: 10.3969/j.issn.1006-2009.2011.03.002
[5] SHI Jianting. Statistical analysis and visualization for multidimensional environmental and geochemical data[D]. Guangzhou: Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, 2017. (in Chinese with English abstract)
[6] JIANG Guanghui, LI Hongchun, GUO Fang. Spatial variability of multi-tracers in groundwater contamination sites[J]. Hydrogeology & Engineering Geology,2017,44(2):137 − 143. (in Chinese with English abstract)
[7] WANG Yuling, WANG Meng, YAN Yan, et al. An ERT pollution area identification method based on clustering algorithm[J]. China Environmental Science,2019,39(3):1315 − 1322. (in Chinese with English abstract) doi: 10.3969/j.issn.1000-6923.2019.03.050
[8] AN Yonglong, HUANG Yong, SUN Zhao, et al. Source apportionment and risk assessment of PAHs in soil from a renewal area in the Tongzhou District of Beijing[J]. Hydrogeology & Engineering Geology,2017,44(5):112 − 120. (in Chinese with English abstract)
[9] CHEN Jie, SHI Weilin, ZHANG Yimei, et al. Pollution analysis and spatial distribution of health risk in electroplating abandoned site[J]. Environmental Engineering,2018,36(4):153 − 159. (in Chinese with English abstract)
[10] TAO H, LIAO X Y, ZHAO D, et al. Delineation of soil contaminant plumes at a co-contaminated site using BP neural networks and geostatistics[J]. Geoderma,2019,354:113878. doi: 10.1016/j.geoderma.2019.07.036
[11] OLAWOYIN R, NIETO A, GRAYSON R L, et al. Application of artificial neural network (ANN)-self-organizing map (SOM) for the categorization of water, soil and sediment quality in petrochemical regions[J]. Expert Systems with Applications,2013,40(9):3634 − 3648. doi: 10.1016/j.eswa.2012.12.069
[12] LEE K J, YUN S T, YU S, et al. The combined use of self-organizing map technique and fuzzy c-means clustering to evaluate urban groundwater quality in Seoul metropolitan City, South Korea[J]. Journal of Hydrology,2019,569:685 − 697. doi: 10.1016/j.jhydrol.2018.12.031
[13] ASTEL A, TSAKOVSKI S, BARBIERI P, et al. Comparison of self-organizing maps classification approach with cluster and principal components analysis for large environmental data sets[J]. Water Research,2007,41(19):4566 − 4578. doi: 10.1016/j.watres.2007.06.030
[14] LIN Haiming, DU Zifang. Some problems in comprehensive evaluation in the principal component analysis[J]. Statistical Research,2013,30(8):25 − 31. (in Chinese with English abstract) doi: 10.3969/j.issn.1002-4565.2013.08.004
[15] LISCHEID G. Non-linear visualization and analysis of large water quality data sets: a model-free basis for efficient monitoring and risk assessment[J]. Stochastic Environmental Research and Risk Assessment,2009,23(7):977 − 990. doi: 10.1007/s00477-008-0266-y
[16] KOHONEN T. Essentials of the self-organizing map[J]. Neural Networks,2013,37:52 − 65. doi: 10.1016/j.neunet.2012.09.018
[17] KOHONEN T. Exploration of very large databases by self-organizing maps[C]//Proceedings of International Conference on Neural Networks (ICNN'97). June 12-12, 1997, Houston, TX, USA. IEEE, 1997: PL1-PL6.
[18] KALTEH A M, HJORTH P, BERNDTSSON R. Review of the self-organizing map (SOM) approach in water resources: Analysis, modelling and application[J]. Environmental Modelling & Software,2008,23(7):835 − 845.
[19] CHEN I T, CHANG L C, CHANG F J. Exploring the spatio-temporal interrelation between groundwater and surface water by using the self-organizing maps[J]. Journal of Hydrology,2018,556:131 − 142. doi: 10.1016/j.jhydrol.2017.10.015
[20] UNDERWOOD K L, RIZZO D M, SCHROTH A W, et al. Evaluating spatial variability in sediment and phosphorus concentration-discharge relationships using Bayesian inference and self-organizing maps[J]. Water Resources Research,2017,53(12):10293 − 10316. doi: 10.1002/2017WR021353
[21] JAMPANI M, HUELSMANN S, LIEDL R, et al. Spatio-temporal distribution and chemical characterization of groundwater quality of a wastewater irrigated system: a case study[J]. Science of the Total Environment,2018,636:1089 − 1098. doi: 10.1016/j.scitotenv.2018.04.347
[22] DAI L J, WANG L Q, LI L F, et al. Multivariate geostatistical analysis and source identification of heavy metals in the sediment of Poyang Lake in China[J]. Science of the Total Environment,2018,621:1433 − 1444. doi: 10.1016/j.scitotenv.2017.10.085
[23] LIAO X Y, TAO H, GONG X G, et al. Exploring the database of a soil environmental survey using a geo-self-organizing map: a pilot study[J]. Journal of Geographical Sciences,2019,29(10):1610 − 1624. doi: 10.1007/s11442-019-1644-8
[24] MELO D S, GONTIJO E S J, FRASCARELI D, et al. Self-organizing maps for evaluation of biogeochemical processes and temporal variations in water quality of subtropical reservoirs[J]. Water Resources Research,2019,55(12):10268 − 10281. doi: 10.1029/2019WR025991
[25] KIM K H, YUN S T, YU S, et al. Geochemical pattern recognitions of deep thermal groundwater in South Korea using self-organizing map: Identified pathways of geochemical reaction and mixing[J]. Journal of Hydrology,2020,589:125202. doi: 10.1016/j.jhydrol.2020.125202
[26] ZHOU Zhihua. Machine learning[M]. Beijing: Tsinghua University Press, 2016. (in Chinese)
[27] LI T, SUN G H, YANG C P, et al. Using self-organizing map for coastal water quality classification: Towards a better understanding of patterns and processes[J]. Science of the Total Environment,2018,628/629:1446 − 1459. doi: 10.1016/j.scitotenv.2018.02.163
[28] CHOI B Y, YUN S T, KIM K H, et al. Hydrogeochemical interpretation of South Korean groundwater monitoring data using Self-Organizing Maps[J]. Journal of Geochemical Exploration,2014,137:73 − 84. doi: 10.1016/j.gexplo.2013.12.001
[29] Ministry of Environmental Protection of the People's Republic of China. Technical guidelines for environmental site investigation: HJ 25.1-2014[S]. Beijing: China Environment Science Press, 2014. (in Chinese)
[30] Ministry of Environmental Protection of the People's Republic of China. Technical guidelines for environmental site monitoring: HJ 25.2-2014[S]. Beijing: China Environment Science Press, 2014. (in Chinese)
[31] Ministry of Environmental Protection of the People's Republic of China. Technical specification for soil environmental monitoring: HJ/T 166-2004[S]. Beijing: China Environment Science Press, 2004. (in Chinese)
[32] Ministry of Environmental Protection of the People's Republic of China. Technical specifications for environmental monitoring of groundwater: HJ/T 164-2004[S]. Beijing: China Environment Science Press, 2004. (in Chinese)
[33] VESANTO J, ALHONIEMI E. Clustering of the self-organizing map[J]. IEEE Transactions on Neural Networks,2000,11(3):586 − 600. doi: 10.1109/72.846731
[34] LIKAS A, VLASSIS N, VERBEEK J J. The global k-means clustering algorithm[J]. Pattern Recognition,2003,36(2):451 − 461. doi: 10.1016/S0031-3203(02)00060-2