Landslide susceptibility mapping model based on a coupled model of SMOTE-Tomek and CNN and its application: A case study in the Zigui-Badong section of the Three Gorges Reservoir area
Keywords:
- landslide
- landslide susceptibility assessment
- SMOTE-Tomek
- convolutional neural network
- imbalanced data
Abstract: China is severely affected by landslide disasters, which pose a great threat to lives and property in the affected areas. Landslide susceptibility assessment is an important tool for landslide risk prediction and is of great significance for disaster prevention and mitigation. However, traditional landslide susceptibility assessment suffers from an imbalance between landslide and non-landslide samples: building the training set essentially amounts to undersampling the non-landslide data, which discards important information features related to landslide events and reduces the reliability of the assessment. Taking the Zigui-Badong section of the Three Gorges Reservoir area as a case study, 14 evaluation factors, such as elevation and slope, were selected as landslide susceptibility assessment factors, and the data were divided into an original training set and a validation set. The synthetic minority oversampling technique combined with Tomek links (SMOTE-Tomek) was applied to the original training set to construct the input training set, which was then used to train a convolutional neural network (CNN), yielding the SMOTE-Tomek-CNN coupled model. For comparison, SMOTE-Tomek and the traditional random undersampling (RUS) method were each combined with the CNN and a support vector machine (SVM) to form three further coupled models: SMOTE-Tomek-SVM, RUS-CNN, and RUS-SVM. The results show that, among the four coupled models, the SMOTE-Tomek-CNN coupled model achieved the highest specific category accuracy and area under the ROC curve (AUC), at 73.60% and 0.965, respectively. This indicates that the method has better predictive ability than the traditional methods and can provide a reliable reference for landslide prediction in the study area.
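The resampling step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy 2-D points, the function names, the choice of `k`, and the decision to remove both members of every Tomek link are assumptions made for illustration only. SMOTE interpolates new minority (landslide) samples between existing minority neighbours, and the Tomek-link step then removes pairs of mutually nearest samples that carry different labels.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours
    and belong to different classes."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def smote_tomek(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class to parity with SMOTE, then drop both
    members of every Tomek link (an assumed cleaning policy)."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = len(X) - len(minority)
    new = smote(minority, n_majority - len(minority), k, rng)
    X_res = list(X) + new
    y_res = list(y) + [minority_label] * len(new)
    drop = {i for pair in tomek_links(X_res, y_res) for i in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


# Hypothetical toy data: 8 non-landslide cells (0) and 3 landslide cells (1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.4), (0.5, 0.2), (0.4, 0.6), (0.7, 0.1),
     (0.3, 0.3), (0.6, 0.5), (1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]
y = [0] * 8 + [1] * 3
X_res, y_res = smote_tomek(X, y, k=2)
print("class counts after resampling:", y_res.count(0), y_res.count(1))
```

In this sketch the two classes stay balanced after cleaning, because every removed Tomek link takes away exactly one sample from each class. For real raster-scale data a library implementation with an indexed neighbour search would be the more practical choice, since the brute-force search here is quadratic in the number of samples.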
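Table 1. Multicollinearity analysis of 14 factors

| Factor | TOL | VIF |
| --- | --- | --- |
| Elevation | 0.363 | 2.758 |
| Aspect | 0.971 | 1.030 |
| Slope | 0.118 | 8.454 |
| Slope length | 0.631 | 1.586 |
| Terrain surface texture | 0.887 | 1.274 |
| Terrain relief index | 0.114 | 8.734 |
| Distance to faults | 0.818 | 1.223 |
| Lithology | 0.776 | 1.289 |
| Distance to the Yangtze River | 0.388 | 2.578 |
| Topographic wetness index | 0.838 | 1.193 |
| Mean annual rainfall | 0.511 | 1.957 |
| Land use type | 0.862 | 1.160 |
| Normalized difference vegetation index | 0.750 | 1.334 |
| Distance to roads | 0.513 | 1.951 |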
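For readers unfamiliar with the two diagnostics in Table 1, the standard textbook relation between tolerance (TOL) and the variance inflation factor (VIF) is sketched below. The notation is an assumption, since this section does not define it: $R_j^2$ denotes the coefficient of determination obtained when factor $j$ is regressed on the remaining 13 factors.

```latex
\[
\mathrm{TOL}_j = 1 - R_j^2,
\qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{TOL}_j} = \frac{1}{1 - R_j^2}
\]
```

As a worked check, elevation has $\mathrm{TOL} = 0.363$, so $1/0.363 \approx 2.755$, which agrees with the tabulated VIF of 2.758 up to rounding of TOL. A commonly quoted rule of thumb flags strong multicollinearity when VIF exceeds 10 (equivalently, TOL below 0.1). That threshold is not stated in this section and is mentioned here only as general background. The largest VIF in Table 1 is 8.734.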
Table 2. The selected factors for the landslide susceptibility assessment
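| Factor | Classes |
| --- | --- |
| Elevation/m | <400; 400~800; 800~1200; 1200~1600; >1600 |
| Aspect | Flat; North; Northeast; East; Southeast; South; Southwest; West |
| Slope/(°) | 0~15; 15~30; 30~45; 45~60; 60~75; >75 |
| Slope length/m | 0~800; 800~1600; 1600~2400; 2400~3200; >3200 |
| Terrain surface texture | 0~0.14; 0.14~0.28; 0.28~0.42; 0.42~0.56; >0.56 |
| Terrain relief index | 0~35; 35~70; 70~105; 105~140; >140 |
| Distance to faults/m | 0~1500; 1500~3000; 3000~4500; 4500~6000; 6000~7500; >7500 |
| Lithology | Hard rock; Soft rock; Alternating soft and hard rock |
| Distance to the Yangtze River/m | 0~1000; 1000~2000; 2000~3000; 3000~4000; 4000~5000; >5000 |
| Topographic wetness index | <6; 6~9; 9~12; 12~15; 15~18 |
| Mean annual rainfall/mm | <990; 990~1020; 1020~1050; 1050~1080; 1080~1110; >1110 |
| Land use type | Water body; Forest; Artificial surface; Grassland; Agricultural land |
| Normalized difference vegetation index | <0.075; 0.075~0.15; 0.15~0.225; 0.225~0.3; 0.3~0.375; >0.375 |
| Distance to roads/m | 0~800; 800~1600; 1600~2400; 2400~3200; 3200~4000; >4000 |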
Table 3. Configuration of parameters for the CNN model
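| CNN-2D parameter | Value |
| --- | --- |
| Convolution kernel size | 3 × 3 |
| Max-pooling kernel | 2 × 2 |
| Activation function | ReLU |
| Loss function | Cross-entropy |
| Optimizer | Adam |
| Number of iterations | 20 |
| Batch size | 2000 |
| Learning rate | 0.001 |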
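To show how the kernel sizes in Table 3 act on a 2-D input, the sketch below stores the tabulated settings in a small configuration object and traces the feature-map side length through stacked convolution and pooling blocks. Several things here are assumptions, because the section does not report them: the input patch size (16), the number of blocks (2), unit stride with no padding for the convolutions, and non-overlapping pooling. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNNConfig:
    # Values as listed in Table 3.
    conv_kernel: int = 3          # 3 x 3 convolution kernel
    pool_kernel: int = 2          # 2 x 2 max-pooling kernel
    activation: str = "relu"
    loss: str = "cross_entropy"
    optimizer: str = "adam"
    iterations: int = 20
    batch_size: int = 2000
    learning_rate: float = 0.001


def conv_out(size, kernel, stride=1, padding=0):
    """Side length of a square feature map after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Side length after non-overlapping max pooling (stride equals kernel)."""
    return size // kernel


def trace_shapes(input_size, n_blocks, cfg=CNNConfig()):
    """Return the side length after each conv and pool layer, input first."""
    sizes = [input_size]
    for _ in range(n_blocks):
        sizes.append(conv_out(sizes[-1], cfg.conv_kernel))
        sizes.append(pool_out(sizes[-1], cfg.pool_kernel))
    return sizes


# Hypothetical 16 x 16 input patch and two conv + pool blocks.
shapes = trace_shapes(input_size=16, n_blocks=2)
print(shapes)
```

A trace like this is mainly useful for checking that a chosen input size survives the planned number of pooling stages before the fully connected layers.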
Table 4. Analysis of specific category accuracy
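| Susceptibility class (values in %) | RUS-CNN | SMOTE-Tomek-CNN | RUS-SVM | SMOTE-Tomek-SVM |
| --- | --- | --- | --- | --- |
| Very low | 1.28 | 0.60 | 0.76 | 0.46 |
| Low | 15.71 | 16.40 | 9.31 | 9.12 |
| Moderate | 27.24 | 29.15 | 23.91 | 24.33 |
| High | 41.09 | 45.26 | 38.57 | 44.18 |
| Very high | 64.14 | 73.60 | 56.73 | 61.17 |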
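The metric in Table 4 is not defined in this section. The sketch below assumes one plausible reading: for each susceptibility class, the percentage of cells assigned to that class that are mapped landslide cells. Under that assumption a good model shows low values in the very low class and high values in the very high class, which is the pattern in the table. The function name and the toy data are hypothetical.

```python
from collections import Counter

CLASSES = ["very low", "low", "moderate", "high", "very high"]


def specific_category_accuracy(predicted_class, is_landslide):
    """Percentage of cells in each predicted class that are landslide cells.
    Classes with no cells are omitted from the result."""
    total = Counter(predicted_class)
    hits = Counter(c for c, flag in zip(predicted_class, is_landslide) if flag)
    return {c: 100.0 * hits[c] / total[c] for c in CLASSES if total[c]}


# Hypothetical toy map: 4 very-low cells, 4 high cells, 5 very-high cells.
predicted = ["very low"] * 4 + ["high"] * 4 + ["very high"] * 5
landslide = [False] * 4 + [True, False, False, False] + [True] * 4 + [False]
result = specific_category_accuracy(predicted, landslide)
print(result)
```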
Table 5. Area under curve analysis
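| Test result variable | Area | Std. error① | Asymptotic Sig.② | Asymptotic 95% CI, lower bound | Asymptotic 95% CI, upper bound |
| --- | --- | --- | --- | --- | --- |
| RUS-CNN | 0.929 | 0.001 | 0.000 | 0.928 | 0.930 |
| SMOTE-Tomek-CNN | 0.965 | 0.000 | 0.000 | 0.964 | 0.965 |
| RUS-SVM | 0.942 | 0.000 | 0.000 | 0.941 | 0.943 |
| SMOTE-Tomek-SVM | 0.951 | 0.000 | 0.000 | 0.950 | 0.952 |

Note: ① Under the nonparametric assumption; ② Null hypothesis: true area = 0.5.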
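The area values in Table 5 can be read as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell, so an area of 0.5 (the null hypothesis in note ②) corresponds to random ranking. The sketch below computes the area through this pairwise (Mann-Whitney) formulation. It is a minimal illustration with hypothetical scores, not the statistical software used to produce the table.

```python
def auc_mann_whitney(scores, labels):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive sample scores higher; ties count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility scores for two non-landslide (0)
# and two landslide (1) cells.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc = auc_mann_whitney(scores, labels)
print(auc)
```

The double loop is quadratic in the number of samples, so for full raster data a rank-based computation would be the more practical choice.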