A comparison of tree-based ensemble algorithms on the major element content of clinopyroxene in mafic-ultramafic rocks
Abstract: Using the geochemical composition of rocks from different magmatic tectonic settings to understand how magmas form is an important application of petrogeochemistry. However, existing work has not yet made full use of rock geochemical compositions for tectonic setting discrimination. In this study, four decision-tree-based machine learning methods were applied to a dataset of 13 major elements of clinopyroxene in mafic-ultramafic rocks from global Cenozoic ocean island basalts (OIB), island arc basalts (IAB), and mid-ocean ridge basalts (MORB), both to discriminate the magmatic tectonic setting and to rank the main features. Comparing the four methods demonstrates that tree-based algorithms are effective for identifying tectonic settings from geochemical compositions, and brings out the strengths and weaknesses of each method: the decision tree is the easiest to interpret but has the lowest accuracy; the boosting algorithms AdaBoost and GBDT achieve the highest accuracy but are more complex to construct; and random forest, a bagging ensemble, is a good compromise between performance and interpretability. In addition, the feature importance rankings from the four algorithms indicate that Cr2O3, TFeO, TiO2, FeO and Al2O3 are the most important components for discriminating magmatic tectonic settings on this dataset.
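The feature ranking mentioned in the abstract can be illustrated with a minimal sketch based on scikit-learn [38]. This is not the authors' pipeline: the data are random placeholders, the labels are generated by a hypothetical rule, and only the impurity-based importance of a random forest is shown. The oxide names are taken from Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Oxide names follow Table 1; the values below are random placeholders,
# not the paper's dataset.
FEATURES = ["SiO2", "TiO2", "Al2O3", "Cr2O3", "Fe2O3", "TFeO", "FeO",
            "CaO", "MgO", "MnO", "NiO", "K2O", "Na2O"]


def rank_features(X, y, feature_names, seed=0):
    """Return (name, importance) pairs sorted from most to least important."""
    model = RandomForestClassifier(n_estimators=90, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


rng = np.random.default_rng(0)
X = rng.normal(size=(300, len(FEATURES)))
# Hypothetical labelling rule driven by the Cr2O3 and TiO2 columns only,
# so that the ranking has a known signal to recover.
y = np.where(X[:, 3] > 0.5, "MORB", np.where(X[:, 1] > 0.0, "OIB", "IAB"))

ranking = rank_features(X, y, FEATURES)
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favour features with many distinct values; permutation importance [39] is a common cross-check when the ranking is used for interpretation.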
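Figure 1. Several classic tectonic discrimination diagrams for basalts commonly used in the literature [15]
Figure 6. Confusion matrices based on major elements (class codes as in Table 1)
Figure 8. Scatter plots of the main components (class codes as in Table 1)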
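Table 1. Major element content of clinopyroxene in mafic-ultramafic rocks in the dataset (IAB: island arc basalt; OIB: ocean island basalt; MORB: mid-ocean ridge basalt; n: number of data points; mean and median in %)

| Major element | IAB n | IAB mean | IAB median | OIB n | OIB mean | OIB median | MORB n | MORB mean | MORB median |
|---|---|---|---|---|---|---|---|---|---|
| SiO2 | 329 | 52.09 | 52.30 | 198 | 48.95 | 49.92 | 795 | 51.87 | 51.73 |
| TiO2 | 324 | 0.36 | 0.20 | 198 | 1.72 | 1.24 | 784 | 0.16 | 0.10 |
| Al2O3 | 329 | 3.63 | 3.62 | 198 | 4.45 | 3.67 | 795 | 4.34 | 4.55 |
| Cr2O3 | 296 | 0.68 | 0.69 | 135 | 0.46 | 0.42 | 790 | 1.20 | 1.23 |
| Fe2O3 | 52 | 1.27 | 0.99 | 1 | 3.34 | 3.34 | 10 | 1.46 | 1.65 |
| TFeO | 254 | 3.60 | 2.81 | 184 | 6.13 | 6.78 | 225 | 2.55 | 2.49 |
| FeO | 75 | 3.28 | 3.11 | 14 | 5.99 | 5.85 | 570 | 2.80 | 2.69 |
| CaO | 329 | 22.49 | 22.56 | 198 | 22.22 | 22.33 | 795 | 22.16 | 22.32 |
| MgO | 329 | 16.37 | 16.52 | 198 | 14.38 | 15.34 | 795 | 17.07 | 17.18 |
| MnO | 307 | 0.10 | 0.10 | 192 | 0.12 | 0.13 | 789 | 0.09 | 0.09 |
| NiO | 171 | 0.05 | 0.03 | 12 | 0.03 | 0.01 | 601 | 0.05 | 0.05 |
| K2O | 213 | 0.01 | 0.00 | 70 | 0.01 | 0.00 | 180 | 0.01 | 0.01 |
| Na2O | 320 | 0.43 | 0.39 | 198 | 0.55 | 0.34 | 778 | 0.31 | 0.18 |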
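The data counts in Table 1 differ from oxide to oxide within the same setting, which suggests (as an inference, not a statement from the source) that not every analysis reports every oxide. The sketch below shows one way such per-setting counts, means and medians could be computed while skipping unreported values. The records are hypothetical and are not taken from the paper's dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (setting, {oxide: wt% value, or None when not reported}).
records = [
    ("IAB", {"SiO2": 52.3, "TiO2": 0.20, "FeO": None}),
    ("IAB", {"SiO2": 51.9, "TiO2": 0.52, "FeO": 3.1}),
    ("OIB", {"SiO2": 49.9, "TiO2": 1.24, "FeO": None}),
    ("MORB", {"SiO2": 51.7, "TiO2": None, "FeO": 2.7}),
]


def summarize(records):
    """Return {(setting, oxide): (count, mean, median)}, ignoring missing values."""
    values = defaultdict(list)
    for setting, oxides in records:
        for oxide, value in oxides.items():
            if value is not None:
                values[(setting, oxide)].append(value)
    return {key: (len(v), mean(v), median(v)) for key, v in values.items()}


summary = summarize(records)
for (setting, oxide), (n, avg, med) in sorted(summary.items()):
    print(f"{setting:5s} {oxide:6s} n={n} mean={avg:.2f} median={med:.2f}")
```

A setting and oxide pair with no reported values is simply absent from the result, which mirrors how the counts in Table 1 can fall below the number of samples in a class.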
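Table 2. Parameter settings of the four models for the clinopyroxene major element data from mafic-ultramafic rocks

| Parameter | Decision tree | Random forest | AdaBoost | GBDT |
|---|---|---|---|---|
| max features | 0.55 | 0.36 | 0.08 | 0.03 |
| max depth | 21 | 6 | 39 | 27 |
| min samples split | 2 | 5 | 3 | 4 |
| min samples leaf | 4 | 3 | 2 | 2 |
| n estimators | - | 90 | 210 | 710 |
| learning rate | - | - | 0.123 | 0.008 |
| subsample | - | - | - | 0.52 |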
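As an illustration, the Table 2 settings could be passed to scikit-learn [38] estimators roughly as follows. This is a sketch under two assumptions that the source does not confirm: that the parameter names in Table 2 map one-to-one onto the scikit-learn arguments of the same name, and that the tree-level settings listed for AdaBoost apply to its base decision tree. The function name `build_models` and the `seed` argument are introduced here for illustration only.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Instantiate the four classifiers with the Table 2 values (mapping assumed)."""
    tree = DecisionTreeClassifier(
        max_features=0.55, max_depth=21,
        min_samples_split=2, min_samples_leaf=4, random_state=seed)
    forest = RandomForestClassifier(
        n_estimators=90, max_features=0.36, max_depth=6,
        min_samples_split=5, min_samples_leaf=3, random_state=seed)
    # Assumption: the tree-level AdaBoost settings describe its base learner.
    base = DecisionTreeClassifier(
        max_features=0.08, max_depth=39,
        min_samples_split=3, min_samples_leaf=2, random_state=seed)
    # The base learner is passed positionally because its keyword name
    # differs between scikit-learn releases.
    ada = AdaBoostClassifier(
        base, n_estimators=210, learning_rate=0.123, random_state=seed)
    gbdt = GradientBoostingClassifier(
        n_estimators=710, learning_rate=0.008, subsample=0.52,
        max_features=0.03, max_depth=27,
        min_samples_split=4, min_samples_leaf=2, random_state=seed)
    return {"Decision tree": tree, "Random forest": forest,
            "AdaBoost": ada, "GBDT": gbdt}


models = build_models()
for name, model in models.items():
    print(name, type(model).__name__)
```

In scikit-learn a fractional `max_features` is interpreted as a share of the input columns, so with 13 oxides the values 0.08 and 0.03 both reduce to a single candidate feature per split.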
Table 3. Performance metrics on the major element data

| Metric | Decision tree | Random forest | AdaBoost | GBDT |
|---|---|---|---|---|
| MaP | 0.8393 (± 0.0357) | 0.9120 (± 0.0268) | 0.9219 (± 0.0179) | 0.9224 (± 0.0292) |
| MaR | 0.8416 (± 0.0347) | 0.8904 (± 0.0398) | 0.9023 (± 0.0345) | 0.9057 (± 0.0302) |
| MaF | 0.8389 (± 0.0294) | 0.8997 (± 0.0305) | 0.9108 (± 0.0241) | 0.9130 (± 0.0243) |
| Accuracy | 0.8715 (± 0.0199) | 0.9212 (± 0.0280) | 0.9302 (± 0.0193) | 0.9315 (± 0.0227) |
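The metrics in Table 3 can be related to a confusion matrix such as the one in Figure 6 with a short sketch. It assumes that MaP, MaR and MaF denote macro-averaged precision, recall and F1 (the unweighted mean of the per-class values), which this page does not define explicitly. The matrix below is hypothetical and does not reproduce the paper's results.

```python
def macro_metrics(confusion):
    """Compute macro precision, recall, F1 and accuracy.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(k))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    correct = sum(confusion[c][c] for c in range(k))
    return {
        "MaP": sum(precisions) / k,
        "MaR": sum(recalls) / k,
        "MaF": sum(f1s) / k,  # mean of per-class F1 (assumed definition)
        "Accuracy": correct / total,
    }


# Hypothetical confusion matrix; rows and columns ordered IAB, OIB, MORB.
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 9]]
metrics = macro_metrics(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because macro averaging weights each class equally, the smaller OIB and IAB classes in Table 1 influence MaP, MaR and MaF as much as the larger MORB class, whereas accuracy is dominated by the most frequent class.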
[1] Leterrier J, Maury R C, Thonon P, et al. Clinopyroxene composition as a method of identification of the magmatic affinities of paleo-volcanic series[J]. Earth and Planetary Science Letters, 1982, 59(1):139-154. doi: 10.1016/0012-821X(82)90122-4
[2] Asthana D. Relict clinopyroxenes from within-plate metadolerites of the Petroi Metabasalt, the New England Fold Belt, Australia[J]. Mineralogical Magazine, 1991, 55(381):549-561. doi: 10.1180/minmag.1991.055.381.08
[3] Nisbet E G, Pearce J A. Clinopyroxene composition in mafic lavas from different tectonic settings[J]. Contributions to Mineralogy and Petrology, 1977, 63(2):149-160. doi: 10.1007/BF00398776
[4] Helmy H M, El Mahallawi M M. Gabbro Akarem mafic-ultramafic complex, Eastern Desert, Egypt:A Late Precambrian analogue of Alaskan-type complexes[J]. Mineralogy and Petrology, 2003, 77(1):85-108.
[5] Khedr M Z, Arai S. Chemical variations of mineral inclusions in Neoproterozoic high-Cr chromitites from Egypt:Evidence of fluids during chromitite genesis[J]. Lithos, 2016, 240:309-326.
[6] Hanson R E, Roberts J M, Dickerson P W, et al. Cryogenian intraplate magmatism along the buried southern Laurentian margin:Evidence from volcanic clasts in Ordovician strata, Marathon uplift, west Texas[J]. Geology, 2016, 44(7):539-542. doi: 10.1130/G37889.1
[7] Menand T, Annen C, Blanquat M de S. Rates of magma transfer in the crust:Insights into magma reservoir recharge and pluton growth[J]. Geology, 2015, 43(3):199-202.
[8] Pearce J A, Cann J R. Tectonic setting of basic volcanic rocks determined using trace element analyses[J]. Earth and Planetary Science Letters, 1973, 19(2):290-300. doi: 10.1016/0012-821X(73)90129-5
[9] Glassley W. Geochemistry and tectonics of the Crescent volcanic rocks, Olympic Peninsula, Washington[J]. GSA Bulletin, 1974, 85(5):785-794. doi: 10.1130/0016-7606(1974)85<785:GATOTC>2.0.CO;2
[10] Pearce J A, Lippard S J, Roberts S. Characteristics and tectonic significance of supra-subduction zone ophiolites[J]. Geological Society, London, Special Publications, 1984, 16(1):77-94. doi: 10.1144/GSL.SP.1984.016.01.06
[11] Pearce J A. Trace element characteristics of lavas from destructive plate boundaries[J]. Andesites, 1982, 8:528-548.
[12] Shervais J W. Ti-V plots and the petrogenesis of modern and ophiolitic lavas[J]. Earth and Planetary Science Letters, 1982, 59(1):101-118. doi: 10.1016/0012-821X(82)90120-0
[13] Wood D A. The application of a Th-Hf-Ta diagram to problems of tectonomagmatic classification and to establishing the nature of crustal contamination of basaltic lavas of the British Tertiary volcanic Province[J]. Earth and Planetary Science Letters, 1980, 50(1):11-30. doi: 10.1016/0012-821X(80)90116-8
[14] Mullen E D. MnO/TiO2/P2O5:a minor element discriminant for basaltic rocks of oceanic environments and its implications for petrogenesis[J]. Earth and Planetary Science Letters, 1983, 62(1):53-62. doi: 10.1016/0012-821X(83)90070-5
[15] Zhang Q, Sun W, Zhao Y, et al. New discrimination diagrams for basalts based on big data research[J]. Big Earth Data, 2019, 3(1):45-55. doi: 10.1080/20964471.2019.1576262
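[16] Wang J R, Chen W F, Zhang Q, et al. Data mining of N-MORB and E-MORB:Discussion of basalt discrimination diagrams and the nature of the mantle source beneath mid-ocean ridges[J]. Acta Petrologica Sinica, 2017, 33(3):993-1005 (in Chinese). http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=ysxb98201703023
[17] Wang J R, Pan Z J, Zhang Q, et al. Data mining of continental intraplate basalts:Compositional diversity and behavior in discrimination diagrams[J]. Acta Petrologica Sinica, 2016, 32(7):1919-1933 (in Chinese). http://d.old.wanfangdata.com.cn/Periodical/ysxb98201607001
[18] Yang J, Wang J R, Zhang Q, et al. Data mining of global island arc basalts:Behavior in basalt discrimination diagrams and preliminary interpretation[J]. Geological Bulletin of China, 2016, 35(12):1937-1949 (in Chinese). doi: 10.3969/j.issn.1671-2552.2016.12.001 http://dzhtb.cgs.cn/gbc/ch/reader/view_abstract.aspx?file_no=20161201&flag=1
[19] Wang Y L, Zhang C J. Discrimination of the tectonic settings of basalts using the Th/Hf-Ta/Hf diagram[J]. Acta Petrologica Sinica, 2001, 17(3):413-421 (in Chinese).
[20] Li Y Q, Du X L, Jin W J, et al. A comparative study of olivine in mid-ocean ridge, ocean island and island arc basalts[J]. Chinese Journal of Geology, 2018, 53(4):1228-1239 (in Chinese). http://d.old.wanfangdata.com.cn/Periodical/dzkx201804005
[21] Han S, Li M C, Ren Q B, et al. Intelligent mining, discrimination and analysis of basalt tectonic settings based on big data methods[J]. Acta Petrologica Sinica, 2018, 34(11):3207-3216 (in Chinese). http://d.old.wanfangdata.com.cn/Periodical/ysxb98201811006
[22] Jiao S T, Zhou Y Z, Zhang Q, et al. Intelligent discrimination of tectonic settings using global gabbro big data from the GEOROC database[J]. Acta Petrologica Sinica, 2018, 34(11):3189-3194 (in Chinese). http://d.old.wanfangdata.com.cn/Periodical/ysxb98201811004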
[23] Vermeesch P. Tectonic discrimination of basalts with classification trees[J]. Geochimica et Cosmochimica Acta, 2006, 70(7):1839-1848. doi: 10.1016/j.gca.2005.12.016
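[24] Zhu L Q, Zhang C. Application of a spectral clustering-AdaBoost ensemble data mining algorithm to lithology identification[J]. China Sciencepaper, 2016, 11(5):545-550 (in Chinese). doi: 10.3969/j.issn.2095-2783.2016.05.014
[25] Han Q D, Zhang X T, Shen W. Lithology identification technique based on the gradient boosting decision tree (GBDT) algorithm[J]. Bulletin of Mineralogy, Petrology and Geochemistry, 2018, 37(6):1173-1180 (in Chinese). http://d.old.wanfangdata.com.cn/Periodical/kwysdqhxtb201806016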
[26] Lehnert K, Su Y, Langmuir C H, et al. A global geochemical database structure for rocks[EB/OL]. Geochemistry, Geophysics, Geosystems, 2000. [2019-04-10]. https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/1999GC000026
[27] Phan A V, Nguyen M L, Bui L T. Feature weighting and SVM parameters optimization based on genetic algorithms for classification problems[J]. Applied Intelligence, 2017, 46(2):455-469. doi: 10.1007/s10489-016-0843-6
[28] Luo X, Xu Y, Wang W, et al. Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy[J]. Journal of the Franklin Institute, 2018, 355(4):1945-1966. doi: 10.1016/j.jfranklin.2017.08.014
[29] Luo X, Sun J, Wang L, et al. Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy[J]. IEEE Transactions on Industrial Informatics, 2018, 14(11):4963-4971. doi: 10.1109/TII.2018.2854549
[30] Gorissen D, Couckuyt I, Demeester P, et al. A surrogate modeling and adaptive sampling toolbox for computer based design[J]. Journal of Machine Learning Research, 2010, 11(Jul):2051-2055.
[31] Yan R, Ma Z, Zhao Y, et al. A decision tree based data-driven diagnostic strategy for air handling units[J]. Energy and Buildings, 2016, 133:37-45. doi: 10.1016/j.enbuild.2016.09.039
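[32] Lu D B. Research and application of data mining algorithms based on decision trees[D]. Master's thesis, Wuhan University of Technology, 2008 (in Chinese).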
[33] Mantovani R G, Horváth T, Cerri R, et al. Hyper-parameter tuning of a decision tree induction algorithm[C]//2016 5th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2016: 37-42.
[34] Rodriguez-Galiano V, Sanchez-Castillo M, Chica-Olmo M, et al. Machine learning predictive models for mineral prospectivity:An evaluation of neural networks, random forest, regression trees and support vector machines[J]. Ore Geology Reviews, 2015, 71:804-818. doi: 10.1016/j.oregeorev.2015.01.001
[35] Hastie T, Rosset S, Zhu J, et al. Multi-class AdaBoost[J]. Statistics and Its Interface, 2009, 2(3):349-360. doi: 10.4310/SII.2009.v2.n3.a8
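[36] Du X L, Li Y Q, Jin W J, et al. Intelligent analysis of clinopyroxene in mafic-ultramafic magmatic rocks[J]. Chinese Journal of Geology, 2018, 53(4):1215-1227 (in Chinese). http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=dzkx201804004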
[37] Snoek J, Larochelle H, Adams R P. Practical bayesian optimization of machine learning algorithms[C]//Advances in neural information processing systems. 2012: 2951-2959.
[38] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn:Machine learning in Python[J]. Journal of Machine Learning Research, 2011, 12(Oct):2825-2830. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1309.0238
[39] Altmann A, Toloşi L, Sander O, et al. Permutation importance:a corrected feature importance measure[J]. Bioinformatics, 2010, 26(10):1340-1347. doi: 10.1093/bioinformatics/btq134