ISSN 1000-3665 CN 11-2202/P
    Citation: TIAN You, GAO Bo, YIN Hong, et al. Handling imbalanced samples in landslide susceptibility evaluation[J]. Hydrogeology & Engineering Geology, 2024, 51(0): 1-11. DOI: 10.16030/j.cnki.issn.1000-3665.202307002

    Handling imbalanced samples in landslide susceptibility evaluation

    • Abstract: In landslide susceptibility assessment, the way sample imbalance is handled can introduce considerable uncertainty into the evaluation results. To address this issue, this study takes the Changdu area of eastern Tibet as the study area and builds a dataset with imbalanced landslide and non-landslide samples. Three treatment schemes were applied: no treatment, downsampling, and oversampling with the synthetic minority oversampling technique (SMOTE). A logistic regression model of landslide susceptibility was built for each scheme. Classification accuracy was verified with the ROC curve, accuracy, precision, recall, and missed detection rate, together with the composite F1′ score. The results show that models built on balanced data (downsampling or oversampling) perform considerably better than the model built on untreated data, with the F1′ score increasing by up to 53.17%. Between the two balancing schemes, oversampling raised the F1′ score by 16.30% relative to downsampling, indicating that oversampling is effective for handling imbalanced sample data. The results can serve as a reference for dataset preparation before landslide prediction and geological hazard prediction, and provide theoretical and technical support for further improving regional disaster prevention and mitigation.
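
The two balancing schemes and the composite score named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy coordinates, and the choice of k are hypothetical, landslide samples are assumed to be the minority class, and the standard F1 score is computed because the abstract does not define how F1′ differs from it.

```python
import math
import random


def undersample(majority, minority, rng):
    """Randomly keep as many majority samples as there are minority samples."""
    return rng.sample(majority, len(minority)), list(minority)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority samples (SMOTE-style interpolation).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x inside the minority class, excluding x
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def f1_score(y_true, y_pred):
    """Standard F1: harmonic mean of precision and recall for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical two-feature samples: 3 landslide vs 9 non-landslide cells
    landslide = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    non_landslide = [(float(i), float(j)) for i in range(3, 6) for j in range(3, 6)]

    # Scheme 2: downsample the majority class to the minority size
    kept, _ = undersample(non_landslide, landslide, rng)
    # Scheme 3: oversample the minority class up to the majority size
    extra = smote(landslide, len(non_landslide) - len(landslide), k=2, rng=rng)
    print(len(kept), len(landslide) + len(extra))
```

Downsampling discards majority samples, while the SMOTE-style step keeps every original sample and adds interpolated ones. In practice a maintained library implementation would normally be preferred over a hand-written one.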

       
