Handling imbalanced samples in landslide susceptibility evaluation

田尤; 高波; 殷红; 李元灵; 张佳佳; 陈龙; 李洪梁

doi:10.16030/j.cnki.issn.1000-3665.202307002

Abstract

In landslide susceptibility evaluation, the choice of scheme for handling sample imbalance can introduce substantial uncertainty into the results. To address this, selected counties (districts) of Changdu City in eastern Tibet were taken as the study area, and an imbalanced dataset of landslide and non-landslide samples was constructed. Three treatment schemes were applied: no treatment, downsampling, and oversampling with the synthetic minority oversampling technique (SMOTE). A landslide susceptibility evaluation model was built with logistic regression under each scheme. Based on the ROC curve, accuracy, precision, recall, miss rate, and other evaluation indicators, the composite index F₁′ score was used to verify the classification accuracy of the models. The results show that models built on balanced datasets (oversampling or downsampling) perform substantially better than the model built on untreated data, with the F₁′ score increasing by up to 53.17%. Between the two balancing schemes, oversampling raised the F₁′ score by 16.30% relative to downsampling, indicating that oversampling is effective for handling imbalanced sample data. The findings can serve as a reference for dataset preparation before landslide prediction and geological hazard prediction, and provide theoretical and technical support for further improving regional disaster prevention and mitigation.
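The two balancing schemes compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample coordinates are hypothetical two-feature points generated at random, the helper names `downsample` and `smote` are chosen here for illustration, and the SMOTE step is a simplified version that interpolates between a minority sample and one of its `k` nearest minority neighbours. The logistic regression model and the F₁′ score are not reproduced.

```python
import math
import random


def downsample(majority, minority, rng):
    """Randomly keep as many majority samples as there are minority samples."""
    kept = rng.sample(majority, len(minority))
    return kept, list(minority)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority samples, each interpolated between a
    random minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


rng = random.Random(0)
# Hypothetical samples (two scaled conditioning factors), not data from the study.
non_landslide = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
landslide = [(rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(20)]

# Scheme 2: balance by discarding majority (non-landslide) samples.
down_major, down_minor = downsample(non_landslide, landslide, rng)

# Scheme 3: balance by adding synthetic minority (landslide) samples.
n_missing = len(non_landslide) - len(landslide)
over_minor = landslide + smote(landslide, n_missing, k=5, rng=rng)

print(len(down_major), len(down_minor))
print(len(non_landslide), len(over_minor))
```

Downsampling discards most of the majority class, whereas oversampling keeps every original sample and fills the gap with interpolated points. That difference in retained information is one plausible reason the two schemes can yield different model performance.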
