Feature Importance in Crop Classification Machine Learning Model

Mihkel Järveoja
Remotely sensed data, satellite data in particular, are already widely used in monitoring agricultural parcels, and this trend shows no sign of diminishing. A wide range of machine learning algorithms has significantly reduced the burden of interpreting bulky and often complex satellite data, contributing to the exploration of new use cases and services. In this study, a Random Forest classification model is used to separate 28 crop type classes in Estonia. The input data consisted of two seasons (2018, 2019) of Estonian agricultural parcels and features calculated from Sentinel-1 and Sentinel-2 satellite images, meteorological records and soil maps. The multiclass weighted F1 score achieved was 0.82 on the 2018 test set and 0.85 on the 2019 test set. Among the most important features were the Sentinel-1 VH and VV polarization backscatter intensities and the Sentinel-2 PSRI, NDVI and TC-vegetation indices. Sentinel-2 features were found to be more prominent in the early (May) and late (August) season, but their importance decreased significantly during mid-season (June, July). Sentinel-1 backscatter features were more important during mid-season. It was concluded that using both radar and optical satellite data ensures a better classification result than using either of them separately, since they complement each other.
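The multiclass weighted F1 score reported above averages the per-class F1 scores, weighting each class by its share of the true samples. The following is a minimal sketch of that metric in plain Python. It is not the thesis code, and the function name `weighted_f1` and the crop labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true samples.
        score += f1 * n_true / total
    return score


# Hypothetical labels for six parcels, not data from the study.
y_true = ["wheat", "wheat", "wheat", "barley", "barley", "rape"]
y_pred = ["wheat", "wheat", "barley", "barley", "barley", "rape"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class frequency, common crop types dominate the score, so a high weighted F1 can coexist with poor performance on rare classes.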
Graduation Thesis type
Master - Conversion Master in IT
Kaupo Voormansik, Tambet Matiisen