The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) ran a machine learning competition in which the task was to classify land cover.
Note: Unfortunately, I was unable to submit my predictions on time. I reached 96.34% accuracy, which would have placed 4th in the competition, so I describe my approach below.
Screenshot which shows the discussion over email:
The land cover labels were divided into 9 classes, making this a multi-class classification problem.
The picture below shows the class distribution.
There were 230 columns containing both negative and positive values that described the land cover.
Outlier Removal: Every column had a few rows with outliers, and those rows were removed. Boxplot showing the outliers in variable col1:
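The post does not say which rule was used to flag outliers. As a minimal sketch, assuming the usual boxplot convention (values beyond 1.5 times the interquartile range from the quartiles), row removal could look like this. The function name `remove_iqr_outliers`, the factor `k`, and the toy data are all illustrative assumptions.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any listed column falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # A row is kept only if every listed column lies inside its own fences.
    inside = ((df[columns] >= lower) & (df[columns] <= upper)).all(axis=1)
    return df[inside]


# Toy data (not the competition data): 'col1' has one obvious outlier.
demo = pd.DataFrame(
    {
        "col1": [1.0, 2.0, 2.5, 3.0, 100.0],
        "col2": [5.0, 6.0, 5.5, 6.5, 6.0],
    }
)
cleaned = remove_iqr_outliers(demo, ["col1", "col2"])
print(len(demo), "->", len(cleaned), "rows")
```

With 230 columns, dropping a row as soon as any single column is out of range can discard a large share of the data, so it is worth checking how many rows survive.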
Feature Engineering: I extracted several features, out of which I ended up using the following after feature selection:
- coord1_col_1_std - Standard deviation of col1 grouped by coord1.
- coord_diff_1 - coord1 minus coord2.
- coord_diff_2 - coord2 minus coord1.
- coords_combined - coord1 plus coord2.

Overall, I ended up using 13 features after feature selection.
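The four named features can be sketched with pandas as follows. This assumes the data sits in a DataFrame with columns named `coord1`, `coord2`, and `col1`. The helper name `add_coord_features` and the toy values are made up for illustration.

```python
import pandas as pd


def add_coord_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four coordinate-based features added."""
    out = df.copy()
    # Standard deviation of col1 within each coord1 group, broadcast back to rows.
    out["coord1_col_1_std"] = out.groupby("coord1")["col1"].transform("std")
    out["coord_diff_1"] = out["coord1"] - out["coord2"]
    out["coord_diff_2"] = out["coord2"] - out["coord1"]
    out["coords_combined"] = out["coord1"] + out["coord2"]
    return out


# Toy data (not the competition data).
demo = pd.DataFrame(
    {
        "coord1": [1, 1, 2, 2],
        "coord2": [3, 4, 3, 5],
        "col1": [1.0, 3.0, 2.0, 2.0],
    }
)
features = add_coord_features(demo)
print(features)
```

Note that pandas computes the sample standard deviation by default, so a coord1 group with a single row yields NaN and would need to be filled before modelling.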
Box-Cox Transformation for Skewed Variables:
Most of the variables were highly skewed.
I applied a Box-Cox transformation to variables whose skewness was above +0.25 or below -0.25.
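A rough sketch of this step is below, using `scipy.stats.boxcox`. Box-Cox only accepts strictly positive input, and the columns here contain negative values, so the sketch shifts each column before transforming. That shift is an assumption on my part for illustration, not something stated above. The function name `boxcox_skewed` and the toy data are also hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def boxcox_skewed(df: pd.DataFrame, threshold: float = 0.25):
    """Box-Cox transform every column whose absolute skewness exceeds threshold."""
    out = df.copy()
    lambdas = {}
    for col in out.columns:
        if abs(out[col].skew()) > threshold:
            # Assumption: shift so the minimum becomes 1, since Box-Cox
            # requires strictly positive values.
            shift = 1.0 - out[col].min()
            out[col], lambdas[col] = stats.boxcox(out[col].to_numpy() + shift)
    return out, lambdas


# Toy data: one right-skewed column with negative values, one symmetric column.
demo = pd.DataFrame(
    {
        "skewed": np.exp(np.linspace(0.0, 3.0, 500)) - 5.0,
        "symmetric": np.linspace(-1.0, 1.0, 500),
    }
)
transformed, lambdas = boxcox_skewed(demo)
print("skew before:", demo["skewed"].skew())
print("skew after: ", transformed["skewed"].skew())
```

The fitted lambda values are returned so that the same transformation can be applied to validation or test data.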
Standardize Data: I applied standard scaling to the data, so that each feature has zero mean and unit variance.
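For reference, standard scaling with scikit-learn's `StandardScaler` looks roughly like this. The arrays are toy values, and fitting the scaler on the training split only is a common practice that I am assuming here rather than something described above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy arrays standing in for the training and validation features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_valid = np.array([[2.0, 40.0]])

scaler = StandardScaler()
# Learn the per-column mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the validation data.
X_valid_scaled = scaler.transform(X_valid)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```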
Things that I tried which didn't improve the validation score:
- Polynomial features / feature interactions.
- Means, standard deviations, and medians (summary statistics) grouped by coordinates.
- Robust scaling before removing the outliers.
- Stacking multiple models.
- Max voting based on multiple models.
- Dimensionality reduction.
I tried several models, which resulted in the following local validation scores:

| Model | Local validation score |
|---|---|
| Passive Aggressive Classifier | 0.47 |
| Linear Discriminant Analysis | 0.67 |
| Decision Tree Classifier | 0.89 |
| Gradient Boosting Classifier | 0.89 |
| Random Forest Classifier | 0.93 |
| Extra Trees Classifier | 0.95 |
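A comparison like this can be sketched with scikit-learn as shown below. The data is synthetic (generated with `make_classification` to have 9 classes and 13 features), and the 3-fold cross-validation, accuracy metric, and hyperparameters are all assumptions for illustration. The scores it prints will not match the numbers above.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the competition data: 9 classes, 13 features.
X, y = make_classification(
    n_samples=450,
    n_features=13,
    n_informative=8,
    n_redundant=0,
    n_classes=9,
    n_clusters_per_class=1,
    random_state=0,
)

models = {
    "Passive Aggressive Classifier": PassiveAggressiveClassifier(random_state=0),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Gradient Boosting Classifier": GradientBoostingClassifier(
        n_estimators=20, random_state=0
    ),
    "Random Forest Classifier": RandomForestClassifier(
        n_estimators=50, random_state=0
    ),
    "Extra Trees Classifier": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over 3 stratified folds for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.2f}")
```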
The code is available on GitHub.