The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) hosted a machine learning competition in which the task was to classify land cover.
Note: Unfortunately, I was unable to submit my predictions on time. My approach reached 96.34% accuracy, which would have placed 4th in the competition, so I describe it below anyway.
Screenshot of the email discussion:
Each sample was labeled with one of 9 land cover classes, which makes this a multi-class classification problem.
The picture below shows the class distribution.
There were 230 columns containing both negative and positive values that describe the land cover.
Every column had a few rows with outliers, which were removed.
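The post does not say which rule was used to flag outliers. A minimal sketch, assuming the common 1.5 × IQR rule (the same rule boxplot whiskers are usually based on); the helper name `remove_iqr_outliers` and the toy values are hypothetical:

```python
import pandas as pd


def remove_iqr_outliers(df, columns, k=1.5):
    """Drop rows where any listed column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # A row is kept only if every listed column is inside its own bounds.
    inside = ((df[columns] >= lower) & (df[columns] <= upper)).all(axis=1)
    return df[inside]


# Toy data standing in for the competition columns (made-up values).
toy = pd.DataFrame({
    "col1": [0.1, -0.2, 0.3, 0.0, 25.0],
    "col2": [1.0, 1.1, 0.9, 1.2, 1.0],
})
cleaned = remove_iqr_outliers(toy, ["col1", "col2"])
print(cleaned)
```

With 230 columns, dropping a row as soon as any single column is out of bounds can discard a lot of data, so in practice the multiplier `k` or the set of checked columns may need tuning.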
Boxplot depicting outliers in variable col1:
I extracted several features. After feature selection, I kept the following:
- coord1_col_1_std - Standard deviation of col1 grouped by coord1.
- coord_diff_1 - coord1 minus coord2.
- coord_diff_2 - coord2 minus coord1.
- coords_combined - coord1 plus coord2.
Overall, I ended up using 13 features after feature selection.
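The four listed features can be sketched with pandas as follows. The column names `coord1`, `coord2`, and `col1` come from the post; the toy values are made up, and reading `coords_combined` as a numeric sum is an assumption:

```python
import pandas as pd

# Toy frame with made-up values; only the column names come from the post.
df = pd.DataFrame({
    "coord1": [1, 1, 2, 2],
    "coord2": [5, 7, 5, 9],
    "col1": [0.2, 0.6, -1.0, 1.0],
})

# Standard deviation of col1 within each coord1 group, broadcast back to every row.
df["coord1_col_1_std"] = df.groupby("coord1")["col1"].transform("std")

# Signed differences between the two coordinates, in both directions.
df["coord_diff_1"] = df["coord1"] - df["coord2"]
df["coord_diff_2"] = df["coord2"] - df["coord1"]

# Assumption: "combined" means the numeric sum of the coordinates.
df["coords_combined"] = df["coord1"] + df["coord2"]

print(df)
```

Using `transform` rather than `agg` keeps the result aligned with the original rows, so the group statistic can be used directly as a per-row feature.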
Box-Cox transformation for skewed variables:
Most of the variables were highly skewed.
I applied a Box-Cox transformation to every variable whose skewness was above +0.25 or below -0.25.
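Box-Cox only accepts strictly positive input, while the columns here contain negative values, and the post does not say how that was handled. A minimal sketch, assuming each skewed column is shifted so its minimum becomes 1 before `scipy.stats.boxcox` is applied; the helper name `boxcox_skewed` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox

SKEW_THRESHOLD = 0.25  # threshold stated in the post


def boxcox_skewed(df, threshold=SKEW_THRESHOLD):
    """Box-Cox transform every column whose absolute skewness exceeds threshold."""
    out = df.copy()
    skewness = df.skew()
    skewed_cols = list(skewness[skewness.abs() > threshold].index)
    for col in skewed_cols:
        # Assumption: shift the column so that its minimum is 1 (strictly positive).
        shifted = (df[col] - df[col].min() + 1.0).to_numpy()
        transformed, _fitted_lambda = boxcox(shifted)
        out[col] = transformed
    return out, skewed_cols


rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "right_skewed": rng.lognormal(size=500) - 1.0,  # skewed, includes negatives
    "symmetric": np.linspace(-1.0, 1.0, 500),       # skewness of zero
})
transformed, skewed_cols = boxcox_skewed(toy)
print(skewed_cols)
print(toy.skew(), transformed.skew(), sep="\n")
```

If shifting feels too ad hoc, the Yeo-Johnson transform (`sklearn.preprocessing.PowerTransformer`) handles negative values directly, though that is a different technique from the one named in the post.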
I then standardized the data with standard scaling, so that each feature has zero mean and unit variance.
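A minimal sketch of the scaling step with scikit-learn's `StandardScaler`. The train/validation split and the random data are assumptions for illustration; the point is that the scaler is fitted on the training data only and then reused:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 3))  # made-up features
X_valid = rng.normal(loc=5.0, scale=3.0, size=(20, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std on training data
X_valid_scaled = scaler.transform(X_valid)      # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Fitting the scaler on the full dataset would leak validation statistics into training, which can make local validation scores look slightly better than they are.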
Things I tried that didn't improve the validation score:
- Polynomial features / feature interactions.
- Means, standard deviations, and medians (summary statistics) grouped by coordinates.
- Robust Scaling before removing the outliers.
- Stacking multiple models.
- Max voting based on multiple models.
- Dimensionality reduction.
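For reference, max voting can be sketched with scikit-learn's `VotingClassifier` in hard-voting mode. The choice of base models, their settings, and the synthetic data are assumptions, not the setup used in the competition:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the competition data.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=3, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hard voting: each model predicts a class and the majority class wins.
voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)
valid_accuracy = voter.score(X_valid, y_valid)
print(f"validation accuracy: {valid_accuracy:.2f}")
```

Voting tends to help most when the base models make different kinds of errors; several tree ensembles trained on the same features are often too similar for the vote to add much.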
I tried several models which resulted in the following local validation scores:
| Model | Local validation score |
|---|---|
| Passive Aggressive Classifier | 0.47 |
| Linear Discriminant Analysis | 0.67 |
| Decision Tree Classifier | 0.89 |
| Gradient Boosting Classifier | 0.89 |
| Random Forest Classifier | 0.93 |
| Extra Trees Classifier | 0.95 |
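A comparison like this can be sketched as a loop over candidate models. The post does not say whether the scores come from a hold-out set or from cross-validation, so the 3-fold cross-validation here is an assumption, as are the synthetic 9-class data, the default hyperparameters, and the choice to include only four of the six models to keep the run short:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 9-class problem standing in for the land cover data.
X, y = make_classification(
    n_samples=900,
    n_features=20,
    n_informative=10,
    n_classes=9,
    n_clusters_per_class=1,
    random_state=0,
)

models = {
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Random Forest Classifier": RandomForestClassifier(random_state=0),
    "Extra Trees Classifier": ExtraTreesClassifier(random_state=0),
}

# Mean accuracy over 3 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.2f}")
```

The scores printed on synthetic data say nothing about the competition results; the sketch only shows the evaluation pattern.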
The code is available on GitHub.