Trying Python on Kaggle (3): sklearn
Using sklearn's random forest.
The previous prediction model was logistic regression, which is a linear model.
This time I'll try the random forest found in the ensemble module.
Along the way, I'll also note a few handy methods for preparing the data and evaluating the predictions.
    from sklearn import preprocessing
    from sklearn import ensemble
    import pandas as pd

The first things to import are the preprocessing module and the ensemble module.
sklearn.preprocessing
The preprocessing module takes raw data and reshapes it into a form suited to later analysis.
Let's load the practice csv and actually use it.
LabelEncoder() is a class that represents categorical data as numeric values (labels).
    train = pd.read_csv('train.csv')

    # Fill the NaN entries in Age with the mean age.
    train.Age = train.Age.fillna(train.Age.mean())

    # Represent Sex as numeric labels.
    le_sex = preprocessing.LabelEncoder()
    train.Sex = le_sex.fit_transform(train.Sex)
    print(train.Sex.head())
    >>
    0    1
    1    0
    2    0
    3    0
    4    1
    Name: Sex, dtype: int64
fit_transform() does the conversion. The reason for going to the trouble of creating an object like le_sex is so that inverse_transform() can be used later to convert the labels back.
    # [female, male] -> [0, 1]
    train.Sex = le_sex.fit_transform(train.Sex)

    # [0, 1] -> [female, male]
    train.Sex = le_sex.inverse_transform(train.Sex)
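To see the round trip without needing train.csv, here is a minimal sketch on a toy column. The `sex` Series and its values are made up for illustration; only `LabelEncoder`, `fit_transform`, `inverse_transform` and the `classes_` attribute come from sklearn.

```python
import pandas as pd
from sklearn import preprocessing

# Toy data standing in for train.Sex (hypothetical values).
sex = pd.Series(['male', 'female', 'female', 'male'])

le = preprocessing.LabelEncoder()
codes = le.fit_transform(sex)           # classes are sorted: female -> 0, male -> 1
print(list(le.classes_))                # ['female', 'male']
print(codes.tolist())                   # [1, 0, 0, 1]

# The fitted encoder remembers the mapping, so it can undo it.
restored = le.inverse_transform(codes)
print(restored.tolist())                # ['male', 'female', 'female', 'male']
```

The mapping lives in the fitted encoder object, which is why it needs to be kept around rather than created and thrown away.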
Process train.Embarked in the same way.
    le_embarked = preprocessing.LabelEncoder()
    train.Embarked = le_embarked.fit_transform(train.Embarked)
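One thing to watch for: the Titanic training file has a couple of missing Embarked values, and how LabelEncoder treats missing entries may depend on the sklearn version. A cautious sketch is to fill them before encoding. Filling with the most frequent value is my own illustrative choice here, not something the original code does, and the `embarked` Series below is toy data.

```python
import pandas as pd
from sklearn import preprocessing

# Toy column with one missing value (hypothetical data).
embarked = pd.Series(['S', 'C', None, 'Q', 'S'])

# Fill the gap with the most frequent port (an assumption for illustration).
most_common = embarked.mode()[0]
embarked = embarked.fillna(most_common)

le = preprocessing.LabelEncoder()
codes = le.fit_transform(embarked)      # classes sorted: C -> 0, Q -> 1, S -> 2
print(most_common)                      # S
print(codes.tolist())                   # [2, 0, 2, 1, 2]
```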
Now that the data is ready, build the prediction model.
ensemble.RandomForestClassifier()
Using the random forest is much the same as using logistic regression.
The flow is fit() -> predict().
    y = train['Survived']
    X = train[['Age', 'Sex', 'Pclass', 'SibSp', 'Parch', 'Fare', 'Embarked']]

    rf = ensemble.RandomForestClassifier()
    rf.fit(X, y)
    py = rf.predict(X)
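The same fit() -> predict() flow can be tried on synthetic data, which makes it easy to rerun. This is only a sketch: the random features, the labelling rule, and the `n_estimators=100` and `random_state=0` settings are my assumptions for reproducibility, whereas the code above uses the defaults.

```python
import numpy as np
from sklearn import ensemble

# Synthetic data: 200 rows, 3 features (hypothetical, not the Titanic set).
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)   # the label depends only on the first column

# random_state fixes the randomness of the forest so reruns match.
rf = ensemble.RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)
py = rf.predict(X)

print(py.shape)                   # (200,)
print(rf.feature_importances_)    # one importance value per column, summing to 1
```

After fitting, `feature_importances_` gives a rough idea of which columns the forest relied on, which can be handy when deciding which features to keep.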
sklearn.metrics
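Last time I used pd.crosstab() to evaluate the predictions, but sklearn also has a module for this kind of comparison. Note that py was predicted on the same rows the model was fit on, so the numbers below describe the fit to the training data rather than performance on unseen data.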
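    from sklearn.metrics import confusion_matrix
    # sklearn's signature is confusion_matrix(y_true, y_pred). With the
    # arguments in this order, rows are predictions and columns are true labels.
    confusion_matrix(py, y)
    >> array([[539,  19],
              [ 10, 323]])

    from sklearn.metrics import accuracy_score
    accuracy_score(py, y)
    >> 0.96745230078563416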
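To make the layout of the confusion matrix concrete, here is a small sketch with hand-written labels (the `y_true` and `y_pred` lists are made up). It uses sklearn's documented argument order, (y_true, y_pred), and shows that swapping the arguments transposes the matrix while the accuracy stays the same.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels, small enough to count by hand.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 0, 0]

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [2 2]]

# Swapping the arguments gives the transpose.
print(confusion_matrix(y_pred, y_true))
# [[3 2]
#  [1 2]]

# Accuracy is the fraction of matching labels, so the order does not matter.
acc = accuracy_score(y_true, y_pred)
print(acc)                        # 0.625

# pd.crosstab produces the same counts, with labelled axes.
ct = pd.crosstab(pd.Series(y_true, name='actual'),
                 pd.Series(y_pred, name='predicted'))
print(ct)
```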
nbviewer
-> http://nbviewer.ipython.org/6229646