Day 12 Part 1 Kaggle's 30 Days of ML

Frieren from Frieren: Beyond Journey's End.

This is step 2 of the Intermediate ML course. (Step 1 is a review.)


Dealing with missing values:

Most machine learning models cannot be trained on a data set that contains missing values or non-numerical values.

  1. Drop columns:
    Simply drop the columns that contain missing values. The risk is that a dropped column may carry important information, which hurts most when the data set is small.
    ex:
    # drop column "Price"; axis=1 means column
    clean_data = data.drop(['Price'], axis=1)
    # keep only the non-object (here, numerical) columns
    X = clean_data.select_dtypes(exclude=['object'])
  2. Impute missing values:
    Replace each missing value with a substitute value, for example:
    a. the mean of the data in the same column
    b. the median of the data in the same column
    c. the most frequent value in the same column (suitable for categorical variables)
  3. Extend the imputation:
    Impute as above, and also create a new column that marks where the data was originally missing.
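
The imputation options and the extra "was missing" marker column can be sketched with plain pandas. This is only a rough illustration, not the course's own code: the toy data frame, its column names (Rooms, Area, Type), and the `_was_missing` suffix are all made up for this example.

```python
import pandas as pd

# Toy data set (hypothetical values, for illustration only).
data = pd.DataFrame({
    "Rooms": [2.0, 3.0, None, 4.0],
    "Area": [50.0, None, 70.0, 120.0],
    "Type": ["house", None, "unit", "house"],
})

# Columns that contain at least one missing value.
cols_with_missing = [col for col in data.columns if data[col].isnull().any()]

# Extended imputation: record where values were missing BEFORE filling them in.
imputed = data.copy()
for col in cols_with_missing:
    imputed[col + "_was_missing"] = data[col].isnull()

# (a) fill with the mean of the column
imputed["Rooms"] = imputed["Rooms"].fillna(data["Rooms"].mean())
# (b) fill with the median of the column
imputed["Area"] = imputed["Area"].fillna(data["Area"].median())
# (c) fill with the most frequent value (works for categorical columns too)
imputed["Type"] = imputed["Type"].fillna(data["Type"].mode()[0])

print(imputed)
```

The marker columns are built before the fill step on purpose: once the gaps are filled, there is no way to tell which entries were imputed.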