Day 1: Kaggle's 30 Days of ML

Frieren from Frieren: Beyond Journey's End.

Created my Kaggle account and started with the Titanic tutorial.

Some basic knowledge:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

Use Shift+Enter to run the current cell and move on to a new one.

data_set_name.head()
# used to check first five rows
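A quick sketch of what head() gives back, using a toy table I made up (the column names are borrowed from the Titanic data, the rows are not real):

```python
import pandas as pd

# Toy rows for illustration only; the real notebook loads the Titanic CSV with pd.read_csv.
toy = pd.DataFrame({
    "PassengerId": range(1, 9),
    "Sex": ["male", "female", "female", "male", "male", "female", "male", "female"],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 0],
})

first_rows = toy.head()     # first five rows by default
first_three = toy.head(3)   # pass a number to change how many rows come back
print(first_rows)
```

The table has eight rows, so head() returns five of them and head(3) returns three.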

Then process the data set

# women here is a pandas Series (a single column of values), not a table
women = train_data.loc[train_data.Sex == 'female']["Survived"]

# fraction of women who survived: Survived is 0 or 1, so the sum counts survivors
sum(women) / len(women)
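To see why this works, here is the same filter on a tiny made-up table (toy_train and its rows are my own, not the real data). The comparison produces a True/False value per row, and .loc keeps only the True rows:

```python
import pandas as pd

# Made-up rows for illustration; not the real Titanic data.
toy_train = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "female", "female"],
    "Survived": [1, 0, 1, 1, 0, 1],
})

mask = toy_train.Sex == "female"          # boolean Series, one True/False per row
women = toy_train.loc[mask]["Survived"]   # Series of 0/1 values for the women only

rate_women = sum(women) / len(women)      # 3 survivors out of 4 women
print(rate_women)                         # 0.75
```

Because the column only holds 0 and 1, women.mean() gives the same number.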

Random forest model:

It combines the output of multiple decision trees to reach a single result.

This is an ensemble model: it merges predictions from multiple models.
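A small sketch of the "combining" part, on a made-up numeric dataset of my own. In scikit-learn the fitted trees are stored in estimators_, and the forest's class probabilities are the average of the individual trees' probabilities:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up dataset: two numeric features and a binary label.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

forest = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=1)
forest.fit(X, y)

# Ask every tree in the forest for its own class probabilities.
per_tree = np.array([tree.predict_proba(X) for tree in forest.estimators_])

# Average over the trees (axis 0) to get one probability per sample and class.
averaged = per_tree.mean(axis=0)

# The averaged probabilities should match what the forest itself reports.
print(np.allclose(averaged, forest.predict_proba(X)))
```

The predicted label is then simply the class with the highest averaged probability.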

Note: the model can only be trained on numeric features (int or float), so text columns such as Sex have to be converted first.

Target: the data we want to predict

Features: the data we use to make the prediction

Model arguments: n_estimators is the number of trees; max_depth is the maximum depth of each tree; random_state is the seed for the random number generator, so results are reproducible.

# one-hot encodes text (non-numeric) columns into numeric indicator columns
pd.get_dummies(train_data[features])
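A minimal look at what get_dummies does, on a three-row toy table I made up. Numeric columns pass through unchanged, and each text column is replaced by one indicator column per distinct value:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
})

encoded = pd.get_dummies(toy)
print(list(encoded.columns))   # ['Pclass', 'Sex_female', 'Sex_male']
print(encoded)
```

Depending on the pandas version, the indicator columns show up as True/False or as 1/0; either way the model can train on them.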

from sklearn.ensemble import RandomForestClassifier

# n_estimators: number of trees; max_depth: maximum depth of each tree; random_state: seed for reproducible results
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)

# X: the feature data the model learns from; target: the values we want to predict
model.fit(X, target)

# predictions is a NumPy array of 0/1 labels; it still has to go into a DataFrame for submission
predictions = model.predict(X_test)

# create output
output = pd.DataFrame({"PassengerId": test_data.PassengerId, "Survived": predictions})

output.to_csv('my_submission.csv', index=False)
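Putting the pieces together, here is a rough end-to-end sketch on made-up data. The two small tables stand in for the competition's train and test files, the feature list is shortened to two columns, and the reindex step is my own addition to keep the test columns lined up with the training columns. It is not the tutorial's exact notebook:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up rows standing in for the real train and test files.
train_data = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 1, 0, 0],
})
test_data = pd.DataFrame({
    "PassengerId": [7, 8, 9],
    "Pclass": [3, 2, 1],
    "Sex": ["male", "female", "male"],
})

features = ["Pclass", "Sex"]        # shortened feature list, chosen for this sketch
target = train_data["Survived"]

X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])
# Assumption: line the test columns up with the training columns in case a category is missing.
X_test = X_test.reindex(columns=X.columns, fill_value=0)

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X, target)
predictions = model.predict(X_test)

output = pd.DataFrame({"PassengerId": test_data.PassengerId, "Survived": predictions})
# With no filename, to_csv returns the CSV text; pass 'my_submission.csv' to write a file instead.
csv_text = output.to_csv(index=False)
print(csv_text)
```

The result has one row per test passenger, with a header line of PassengerId,Survived.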