ISSN 1000-3665 CN 11-2202/P

    Research on a remote sensing extraction model for saline-alkali land information based on recursive feature elimination and Bayesian optimization

      Abstract: When machine learning models are used to extract saline-alkali land information, feature selection is complex and hyperparameter tuning is difficult, which lowers classification accuracy in practical applications. To extract saline-alkali land information in western Jilin accurately, this study combined remote sensing (RS) and GIS techniques and derived four types of features from Sentinel-1 and Sentinel-2 data: spectral features, soil indices, salinity indices, and radar features. Recursive feature elimination (RFE) combined with random forest (RF) was used for feature selection and feature importance ranking. Bayesian optimization was then used to tune the hyperparameters of three models, RF, support vector machine (SVM), and k-nearest neighbor (KNN), and the classification results were compared and analyzed. The results show that feature selection eliminated 11 features while maintaining classification accuracy, greatly reducing redundant information. The feature importance ranking indicates that the features with the greatest influence on model performance are, in order, the blue band (B2), the green band (B3), and the short-wave infrared band (B12). After Bayesian optimization, RF achieved the highest classification accuracy of the three models, with an overall accuracy, Kappa coefficient, user's accuracy, and recall of 0.884, 0.878, 0.907, and 0.889, respectively; it better eliminated or reduced the influence of noise on the classification results and showed good classification performance and stability. The RF model with feature selection and Bayesian optimization can extract saline-alkali land information in western Jilin efficiently and accurately. The results can provide a scientific basis and decision-making reference for sustainable agricultural development, saline-alkali land improvement, and ecological environment protection in western Jilin.
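
The four accuracy measures reported in the abstract (overall accuracy, Kappa coefficient, user's accuracy, and recall) are all derived from a classification confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code: the function name `accuracy_metrics`, the dictionary keys, and the two-class example matrix are hypothetical and chosen only for illustration, and the matrix is assumed to have reference classes as rows and predicted classes as columns.

```python
def accuracy_metrics(cm):
    """Compute standard accuracy measures from a square confusion matrix.

    Assumption: cm[i][j] is the number of samples whose reference class
    is i and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    row_totals = [sum(cm[k]) for k in range(n)]                        # reference counts
    col_totals = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predicted counts

    overall_accuracy = correct / total
    # Agreement expected by chance, used by Cohen's Kappa.
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    # User's accuracy: share of samples predicted as class k that truly are class k.
    users = [cm[k][k] / col_totals[k] if col_totals[k] else 0.0 for k in range(n)]
    # Recall (producer's accuracy): share of reference class k that was found.
    recall = [cm[k][k] / row_totals[k] if row_totals[k] else 0.0 for k in range(n)]

    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "users_accuracy": users,
        "recall": recall,
    }


# Hypothetical two-class example (class 0 = saline-alkali land, class 1 = other).
example_cm = [[45, 5],
              [10, 40]]
metrics = accuracy_metrics(example_cm)
```

The abstract gives a single value for user's accuracy and for recall, which suggests per-class values were summarized (for example, averaged or reported for one class); how that was done is not stated, so the sketch returns the per-class lists unchanged.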

       
