Effect of sample selection on the susceptibility assessment of geological hazards: A case study in Liulin County, Shanxi Province
Abstract: The rational selection of non-geological hazard samples is important for improving the accuracy of geological hazard susceptibility prediction. Taking Liulin County as a case study, appropriate influencing factors were selected and a random forest (RF) model was used for GIS-based susceptibility assessment. Twenty models were built by crossing five ratios of geological hazard to non-geological hazard points (1∶1, 1∶1.5, 1∶3, 1∶5 and 1∶10) with four distances from non-geological hazard points to known hazard points (100, 500, 800 and 1000 m). The results show that: (1) According to error metrics, confusion matrices and ROC curves, both the sample ratio and the distance from known hazard points have a considerable influence on the susceptibility assessment. As the sample ratio decreases (that is, as the share of non-geological hazard samples grows) and the distance from known hazard points increases, the mean absolute error (MAE) and root mean square error (RMSE) of the models generally decrease, while the accuracy (ACC) generally increases. The area under the ROC curve (AUC) exceeds 0.8 for every model, indicating good predictive performance. When the sample ratio is smaller than 1∶3, increasing the distance from known hazard points has little further effect on model error and accuracy, and the results tend to stabilize. Overall, a sample ratio of 1∶10 combined with a distance of 1000 m from known hazard points gives the most suitable model for the study area. (2) High and very high susceptibility areas are mainly distributed along both sides of roads and rivers in the central and northern parts of the county, which makes them the key areas for hazard prevention and mitigation in Liulin County. (3) Differences in sample selection change the susceptibility results mainly because they alter how the RF model extracts and evaluates data features during modeling, and because they change how representative the samples are. These findings are of practical value for local hazard prevention and mitigation work.
Key words: non-geological hazard; GIS; random forest; susceptibility; error; confusion matrix; ROC
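The sampling design described in the abstract (five sample ratios crossed with four distances, giving twenty sample sets) can be sketched as follows. This is a minimal illustration, not the authors' GIS workflow: it assumes the distance is a minimum separation between each non-geological hazard point and every known hazard point, and the hazard coordinates, the 20 km window and the `sample_non_hazard` helper are all hypothetical.

```python
import math
import random
from itertools import product

RATIOS = [1, 1.5, 3, 5, 10]              # non-hazard points per hazard point
MIN_DISTANCES_M = [100, 500, 800, 1000]  # assumed minimum separation, in metres


def sample_non_hazard(hazards, ratio, min_dist, extent, rng, max_tries=100_000):
    """Rejection-sample points at least min_dist away from every hazard point."""
    xmin, ymin, xmax, ymax = extent
    target = round(len(hazards) * ratio)
    samples, tries = [], 0
    while len(samples) < target:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("extent too small for the requested buffer")
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(p, h) >= min_dist for h in hazards):
            samples.append(p)
    return samples


rng = random.Random(42)
extent = (0.0, 0.0, 20_000.0, 20_000.0)  # hypothetical 20 km x 20 km window
hazards = [(rng.uniform(0, 20_000), rng.uniform(0, 20_000)) for _ in range(40)]

# One non-hazard sample set per (ratio, distance) combination.
datasets = {
    (ratio, dist): sample_non_hazard(hazards, ratio, dist, extent, rng)
    for ratio, dist in product(RATIOS, MIN_DISTANCES_M)
}
print(len(datasets))  # 20
```

Each of the twenty sample sets would then be combined with the hazard points to train and evaluate one random forest model.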
Table 1. MAE and RMSE values

| Distance from known hazard points | Error metric | 1∶1 | 1∶1.5 | 1∶3 | 1∶5 | 1∶10 |
|---|---|---|---|---|---|---|
| 100 m | MAE | 0.279 | 0.285 | 0.275 | 0.196 | 0.136 |
| 100 m | RMSE | 0.373 | 0.367 | 0.408 | 0.315 | 0.267 |
| 500 m | MAE | 0.332 | 0.304 | 0.270 | 0.205 | 0.130 |
| 500 m | RMSE | 0.410 | 0.401 | 0.393 | 0.322 | 0.260 |
| 800 m | MAE | 0.323 | 0.279 | 0.264 | 0.181 | 0.135 |
| 800 m | RMSE | 0.414 | 0.361 | 0.385 | 0.280 | 0.269 |
| 1000 m | MAE | 0.281 | 0.254 | 0.267 | 0.188 | 0.129 |
| 1000 m | RMSE | 0.368 | 0.337 | 0.388 | 0.302 | 0.258 |
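For readers unfamiliar with the two error metrics in Table 1, the sketch below gives their standard definitions. It assumes the errors are taken between 0/1 class labels and the model's predicted hazard probability, which the source does not state explicitly; the labels and probabilities are made-up values, not data from the study.

```python
import math


def mae(y_true, y_prob):
    """Mean absolute error between labels and predicted probabilities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


def rmse(y_true, y_prob):
    """Root mean square error between labels and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


y_true = [1, 0, 1, 0]          # made-up labels: 1 = hazard, 0 = non-hazard
y_prob = [0.9, 0.2, 0.6, 0.4]  # made-up model outputs

print(round(mae(y_true, y_prob), 3), round(rmse(y_true, y_prob), 3))
```

RMSE penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same data, which matches the pattern in every cell of Table 1.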
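Table 2. Summary table of confusion matrix (values are sample counts)

| Ratio of geological hazard to non-geological hazard samples | Distance from known hazard points | Predicted class | True: geological hazard | True: non-geological hazard |
|---|---|---|---|---|
| 1∶1 | 100 m | Geological hazard | 34 | 7 |
| 1∶1 | 100 m | Non-geological hazard | 11 | 39 |
| 1∶1 | 500 m | Geological hazard | 26 | 15 |
| 1∶1 | 500 m | Non-geological hazard | 11 | 39 |
| 1∶1 | 800 m | Geological hazard | 31 | 10 |
| 1∶1 | 800 m | Non-geological hazard | 13 | 37 |
| 1∶1 | 1000 m | Geological hazard | 35 | 6 |
| 1∶1 | 1000 m | Non-geological hazard | 10 | 40 |
| ...... | | | | |
| 1∶10 | 800 m | Geological hazard | 435 | 13 |
| 1∶10 | 800 m | Non-geological hazard | 38 | 13 |
| 1∶10 | 1000 m | Geological hazard | 442 | 6 |
| 1∶10 | 1000 m | Non-geological hazard | 41 | 10 |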
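Accuracy (ACC) is the share of samples on the diagonal of a confusion matrix. As a small check, the sketch below recomputes ACC from the four 1∶1 matrices in Table 2; the cell order and the `accuracy` helper are illustrative choices, and the results agree with the 1∶1 column of Table 3.

```python
# Cells of each 2x2 matrix in Table 2 (ratio 1:1), read row by row,
# keyed by the distance from known hazard points in metres.
matrices_1to1 = {
    100: (34, 7, 11, 39),
    500: (26, 15, 11, 39),
    800: (31, 10, 13, 37),
    1000: (35, 6, 10, 40),
}


def accuracy(cells):
    """Fraction of samples on the main diagonal of a 2x2 confusion matrix."""
    a, b, c, d = cells
    return (a + d) / (a + b + c + d)


acc_percent = {dist: round(100 * accuracy(m), 1) for dist, m in matrices_1to1.items()}
print(acc_percent)  # {100: 80.2, 500: 71.4, 800: 74.7, 1000: 82.4}
```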
Table 3. Summary table of ACC values (%)

| Distance from known hazard points | 1∶1 | 1∶1.5 | 1∶3 | 1∶5 | 1∶10 |
|---|---|---|---|---|---|
| 100 m | 80.2 | 79.8 | 78.0 | 86.0 | 90.1 |
| 500 m | 71.4 | 77.2 | 78.0 | 86.0 | 91.4 |
| 800 m | 74.7 | 80.7 | 79.7 | 87.9 | 89.8 |
| 1000 m | 82.4 | 82.5 | 80.8 | 89.3 | 92.3 |
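The abstract's choice of the 1∶10, 1000 m model can be cross-checked against the tabulated values. The sketch below is only an illustration of that comparison, not the authors' selection procedure: it copies the ACC values from Table 3 and the MAE and RMSE values from Table 1, then looks up the configuration with the highest accuracy and the lowest errors.

```python
ratios = ["1:1", "1:1.5", "1:3", "1:5", "1:10"]

# Rows keyed by distance from known hazard points (m); columns follow `ratios`.
acc = {
    100: [80.2, 79.8, 78.0, 86.0, 90.1],
    500: [71.4, 77.2, 78.0, 86.0, 91.4],
    800: [74.7, 80.7, 79.7, 87.9, 89.8],
    1000: [82.4, 82.5, 80.8, 89.3, 92.3],
}
mae = {
    100: [0.279, 0.285, 0.275, 0.196, 0.136],
    500: [0.332, 0.304, 0.270, 0.205, 0.130],
    800: [0.323, 0.279, 0.264, 0.181, 0.135],
    1000: [0.281, 0.254, 0.267, 0.188, 0.129],
}
rmse = {
    100: [0.373, 0.367, 0.408, 0.315, 0.267],
    500: [0.410, 0.401, 0.393, 0.322, 0.260],
    800: [0.414, 0.361, 0.385, 0.280, 0.269],
    1000: [0.368, 0.337, 0.388, 0.302, 0.258],
}


def flatten(table):
    """Yield (distance, ratio, value) triples from a distance-keyed table."""
    return [(d, r, v) for d, row in table.items() for r, v in zip(ratios, row)]


best_acc = max(flatten(acc), key=lambda t: t[2])
best_mae = min(flatten(mae), key=lambda t: t[2])
best_rmse = min(flatten(rmse), key=lambda t: t[2])

print(best_acc)   # (1000, '1:10', 92.3)
print(best_mae)   # (1000, '1:10', 0.129)
print(best_rmse)  # (1000, '1:10', 0.258)
```

All three metrics point to the same configuration, consistent with the conclusion stated in the abstract.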