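In this post, I walk through the steps for using Optuna, an open-source library, on Google Colaboratory, together with the actual results of each run.
The programs are the Chapter 2 sample code from the book "Optunaによるブラックボックス最適化" (Black-Box Optimization with Optuna).
I also explain how to make trials (training runs) reproducible, so that the same training result is obtained every time, and how to reproduce an Optuna study.
I hope you find this useful.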
References
Reference sites
・Optuna official site
www.preferred.jp
・Optuna documentation (manual)
optuna.readthedocs.io
・Sample code for the book "Optunaによるブラックボックス最適化"
github.com
Introduction
A list of my Optuna articles is available here, in case it is useful.
List of Optuna articles
Now, let's actually tune hyperparameters using the Chapter 2 sample code from the book "Optunaによるブラックボックス最適化".
Preparing the development environment
This section describes what is needed to set up an environment for running Optuna.
Steps
・Download the book's sample code, or fork it to your own GitHub account (my fork: https://github.com/dk0893/optuna-book)
・Clone the forked repository to Google Drive
・Move to the chapter2 directory and create the notebook that you will actually run there (e.g. ch2-exec.ipynb). Specifically, right-click in Google Drive and choose More → Google Colaboratory
I wrote about the Google Drive + Google Colaboratory + GitHub development environment in detail in another article, so refer to it as needed.
daisuke20240310.hatenablog.com
Running Optuna on Google Colaboratory
Running list_2_12_rf.py
The source code of list_2_12_rf.py (https://github.com/dk0893/optuna-book/blob/master/chapter2/list_2_12_rf.py) is shown below. To start with, this implementation runs training without using Optuna.
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
data = fetch_openml(name="adult")
X = pd.get_dummies(data["data"])
y = [1 if d == ">50K" else 0 for d in data["target"]]
clf = RandomForestClassifier(
    max_depth=8,
    min_samples_split=0.5,
)
score = cross_val_score(clf, X, y, cv=3)
accuracy = score.mean()
print(f"Accuracy: {accuracy}")
Downloading and preprocessing the dataset
The code uses an OpenML dataset called adult. OpenML is a platform for sharing machine learning datasets, along with the code that was run on them and the results. Let's look into OpenML a little.
Go to OpenML (https://www.openml.org/). Click Datasets in the left sidebar, type "adult" into Search, and press Enter to find the adult dataset. There appear to be four versions, v1 through v4.
list_2_12_rf.py downloads the data with data = fetch_openml(name="adult"). According to fetch_openml in the scikit-learn manual (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html), when no version is specified, as here, the oldest version that is still active is downloaded. In other words, v1 is downloaded.
Let's look at v1 of the adult dataset. The Data Detail view is displayed.
It says that adult is a UCI dataset, that v2 appears to be the original version, and that in v1 several features have been discretized.
Next, look at Analysis. It offers a simple (and arguably sufficient) analysis of the dataset.
The adult dataset is a 48842 × 15 table: 48842 samples and 15 columns. One of the columns is the ground-truth label, so 14 features are used for training. Of those 14 features, 2 are numeric and 12 are non-numeric (strings).
Here is a brief description of the adult dataset. It contains samples for 48842 people. Each sample has features such as the person's age, work class, highest level of education, race (White, Black, and so on), working hours, and native country. The ground-truth label is whether the person earns more than 50K dollars a year, and the task is a classification problem that predicts it.
Task lists the evaluation tasks (classification, clustering, and so on) that use the adult dataset.
X = pd.get_dummies(data["data"])
converts the non-numeric features among the 14 into numeric data (one-hot encoding), and as a result the number of features grows from 14 to 121.
y = [1 if d == ">50K" else 0 for d in data["target"]]
manually converts the ground-truth labels to 0 and 1 and stores them in a list.
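As a small illustration of the two preprocessing steps above, the following sketch applies pd.get_dummies and the same list comprehension to a tiny hand-made table. The column names and values are made up for the example and are not taken from the adult dataset.

```python
import pandas as pd

# Hypothetical toy data: one numeric column and one string column.
toy = pd.DataFrame({
    "age": [39, 50, 28],
    "workclass": ["Private", "State-gov", "Private"],
})
toy_target = [">50K", "<=50K", ">50K"]

# Non-numeric columns are expanded into one indicator column per category;
# numeric columns are passed through unchanged.
X_toy = pd.get_dummies(toy)

# Same label conversion as in list_2_12_rf.py.
y_toy = [1 if d == ">50K" else 0 for d in toy_target]

print(list(X_toy.columns))
print(y_toy)
```

One string column with two categories becomes two indicator columns, which is why the 12 non-numeric columns of adult expand into so many features.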
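Initializing the machine learning model
Here the code uses scikit-learn's RandomForestClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html). The max_depth argument is the maximum depth of each decision tree and is set to 8. The min_samples_split argument is the minimum number of samples required to split a node and is set to 0.5; because it is a float, scikit-learn interprets it as a fraction of the number of samples.
A random forest trains multiple decision trees (100 by default) and predicts with their ensemble (averaging and the like). It is robust against overfitting and is a very capable model.
A decision tree is a method that classifies with a tree structure, using a measure of how mixed the classes are, such as entropy or Gini impurity (several criteria exist; the default for RandomForestClassifier is Gini impurity). Entropy and Gini impurity both take a high value when a set of samples contains a mix of classes and a low value when there is little mixing. For example, with two classes, the Gini impurity is 0.5 when both classes have the same number of samples and 0 when only one class is present. The formula for Gini impurity is shown below.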
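$$G(t) = \sum_{k=1}^{K} p(k \mid t)\,\bigl(1 - p(k \mid t)\bigr) = 1 - \sum_{k=1}^{K} p(k \mid t)^2$$
Here $t$ is a node, $K$ is the number of classes, and $p(k \mid t)$ is the proportion of samples at node $t$ that belong to class $k$. $\sum_{k=1}^{K} p(k \mid t)$ is the sum of the probabilities over all classes, which equals 1, and this is what allows the formula to be rewritten into the second form.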
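The Gini impurity of a node can be computed in a few lines of Python. This is a minimal sketch using the standard form "1 minus the sum of squared class proportions"; the function name gini_impurity is made up for illustration and this is not how scikit-learn implements it internally.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 - sum_k p_k^2 for the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

balanced = gini_impurity([0, 0, 1, 1])  # two classes, same number of samples
pure = gini_impurity([1, 1, 1, 1])      # only one class present
print(balanced, pure)
```

The two calls correspond to the two-class examples in the text: an even mix and a node with a single class.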
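The concrete procedure for a decision tree is as follows. Start from the initial state in which all samples are in the parent node. Try splitting on every feature (for a numeric feature, by comparing against a threshold) and adopt the split that reduces the entropy or Gini impurity the most (at this point there is one parent node plus two child nodes). Then apply the same operation to every child node, and repeat until the entropy or Gini impurity becomes 0 (no other classes mixed in) or a stopping condition such as max_depth is reached.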
Evaluation by cross-validation
Here the code runs cross-validation using scikit-learn's cross_val_score (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html). The clf argument is the model instance, X is the training dataset, y is the ground-truth labels, and cv is the number of cross-validation folds.
Cross-validation is an evaluation technique that uses the given training data and labels to estimate generalization performance.
Specifically, the training data is split into the specified number of parts (call them dataset 0, 1, and 2). First, dataset 0 is used as validation data, the model is trained on datasets 1 and 2, and the result of inference on the validation data (classification accuracy) is kept. Next, dataset 1 is used as validation data and training and inference are repeated in the same way. Dataset 2 is handled likewise, so cross_val_score returns three results. The code then computes their mean and prints it as Accuracy (the fraction of correct answers).
Execution result
I ran it on Google Colaboratory.
The classification accuracy was about 0.76, exactly the same as the accuracy given in the book.
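The fold rotation used in 3-fold cross-validation can be sketched in plain Python. This is only an illustration of the idea with contiguous folds over sample indices; the helper name k_fold_indices is hypothetical, and scikit-learn's actual splitter (which stratifies by class for classifiers) is more elaborate.

```python
def k_fold_indices(n_samples, n_splits=3):
    """Yield (train_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(7, n_splits=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample appears in the validation part exactly once, and the model never sees its validation samples during training for that fold.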
Running list_2_14_optimize_rf.py
The source code of list_2_14_optimize_rf.py (https://github.com/dk0893/optuna-book/blob/master/chapter2/list_2_14_optimize_rf.py) is shown below (with some comments added to make it easier to explain). This implementation uses Optuna's black-box optimization.
import optuna
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
data = fetch_openml(name="adult")
X = pd.get_dummies(data["data"])
y = [1 if d == ">50K" else 0 for d in data["target"]]
def objective(trial):
    clf = RandomForestClassifier(
        max_depth=trial.suggest_int(
            "max_depth", 2, 32,
        ),
        min_samples_split=trial.suggest_float(
            "min_samples_split", 0, 1,
        ),
    )
    score = cross_val_score(clf, X, y, cv=3)
    accuracy = score.mean()
    return accuracy
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(f"Best objective value: {study.best_value}")
print(f"Best parameter: {study.best_params}")
Downloading and preprocessing the data
This is the same as in list_2_12_rf.py.
Defining the objective function
The body of the objective function closely resembles the model initialization in list_2_12_rf.py, but the way the RandomForestClassifier arguments are specified has changed.
First, the max_depth argument is given trial.suggest_int("max_depth", 2, 32,), which specifies the integers from 2 to 32 as the search range for max_depth.
Looking at suggest_int in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_int), the signature is suggest_int(name, low, high, *, step=1, log=False), so the call asks Optuna to search the range 2<=max_depth<=32.
As for the arguments: name is an arbitrary name (it is best to use the argument's own name), and low and high give the range the parameter can take. With step set to 1, every integer in the range is used; with 2 or more, the values low, low+step, low+2*step, ... are used. With log set to True, the range is transformed to the log domain, a value is sampled there, and it is converted back to the original domain before being used.
min_samples_split is given trial.suggest_float("min_samples_split", 0, 1,), which specifies the real numbers from 0 to 1 as the search range for min_samples_split.
Looking at suggest_float in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_float), the signature is suggest_float(name, low, high, *, step=None, log=False), which is almost the same as suggest_int().
The return value of the objective function is Accuracy (classification accuracy).
Creating the study object, running the optimization, and displaying the results
create_study creates a study object whose goal is maximization, and optimize actually runs the optimization. Finally, the code prints the results of the optimization: the best classification accuracy and the arguments that produced it.
Looking at create_study in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.create_study.html#optuna.study.create_study), the argument direction="maximize" specifies that the objective is to be maximized.
Looking at optimize in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna.study.Study.optimize), n_trials=100 specifies the number of trials (each trial trains and evaluates the random forest once by cross-validation).
Looking at the Study class in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna-study-study), Attributes lists best_value and best_params, each with a link to a detailed description. It also shows that other attributes besides these two are available.
Now let's run it.
Execution result
I ran it on Google Colaboratory.
[I 2024-03-18 16:48:07,908] A new study created in memory with name: no-name-0ee66a95-2f97-4355-a36b-c7dd40acc8c0
[I 2024-03-18 16:48:10,305] Trial 0 finished with value: 0.7607182349443268 and parameters: {'max_depth': 6, 'min_samples_split': 0.8390462700980509}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-18 16:48:13,301] Trial 1 finished with value: 0.762110464653306 and parameters: {'max_depth': 24, 'min_samples_split': 0.41546091404803887}. Best is trial 1 with value: 0.762110464653306.
[I 2024-03-18 16:48:15,060] Trial 2 finished with value: 0.7607182349443268 and parameters: {'max_depth': 29, 'min_samples_split': 0.969444032753063}. Best is trial 1 with value: 0.762110464653306.
[I 2024-03-18 16:48:16,843] Trial 3 finished with value: 0.7607182349443268 and parameters: {'max_depth': 7, 'min_samples_split': 0.9031391948680733}. Best is trial 1 with value: 0.762110464653306.
[I 2024-03-18 16:48:18,605] Trial 4 finished with value: 0.7607182349443268 and parameters: {'max_depth': 19, 'min_samples_split': 0.7612784794776261}. Best is trial 1 with value: 0.762110464653306.
... (middle omitted) ...
[I 2024-03-18 16:59:45,564] Trial 95 finished with value: 0.8463617280781461 and parameters: {'max_depth': 21, 'min_samples_split': 0.02643139290358594}. Best is trial 84 with value: 0.8554113563787417.
[I 2024-03-18 16:59:51,495] Trial 96 finished with value: 0.8298390555991441 and parameters: {'max_depth': 26, 'min_samples_split': 0.10423539092493249}. Best is trial 84 with value: 0.8554113563787417.
[I 2024-03-18 17:00:00,539] Trial 97 finished with value: 0.841591241978196 and parameters: {'max_depth': 22, 'min_samples_split': 0.045115133886761236}. Best is trial 84 with value: 0.8554113563787417.
[I 2024-03-18 17:00:08,465] Trial 98 finished with value: 0.8340567274646876 and parameters: {'max_depth': 17, 'min_samples_split': 0.07261103742480408}. Best is trial 84 with value: 0.8554113563787417.
[I 2024-03-18 17:00:22,431] Trial 99 finished with value: 0.8547561884211966 and parameters: {'max_depth': 19, 'min_samples_split': 0.0011583420552988678}. Best is trial 84 with value: 0.8554113563787417.
Best objective value: 0.8554113563787417
Best parameter: {'max_depth': 23, 'min_samples_split': 0.0017140477373159217}
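It took about 12 minutes to run the black-box optimization with 100 rounds of training and evaluation, and the classification accuracy reached about 0.85. This is slightly lower than the result in the book, but a large improvement over the accuracy of about 0.76 obtained without Optuna.
Reproducing an Optuna study
Running list_2_14_optimize_rf.py once more gives different results, as shown below.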
[I 2024-03-19 14:38:36,453] A new study created in memory with name: no-name-ba409f79-d91c-47c6-90ac-36f4c8e3be61
[I 2024-03-19 14:38:39,240] Trial 0 finished with value: 0.7607182349443268 and parameters: {'max_depth': 3, 'min_samples_split': 0.4896570976010266}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:38:51,909] Trial 1 finished with value: 0.8513369845044804 and parameters: {'max_depth': 16, 'min_samples_split': 0.006380320617366264}. Best is trial 1 with value: 0.8513369845044804.
[I 2024-03-19 14:38:53,625] Trial 2 finished with value: 0.7607182349443268 and parameters: {'max_depth': 8, 'min_samples_split': 0.8767220264426059}. Best is trial 1 with value: 0.8513369845044804.
[I 2024-03-19 14:39:05,614] Trial 3 finished with value: 0.8519512061938818 and parameters: {'max_depth': 30, 'min_samples_split': 0.013673573424747842}. Best is trial 3 with value: 0.8519512061938818.
[I 2024-03-19 14:39:11,120] Trial 4 finished with value: 0.827095533646114 and parameters: {'max_depth': 11, 'min_samples_split': 0.16592141865436927}. Best is trial 3 with value: 0.8519512061938818.
... (rest omitted) ...
To get exactly the same results, a random seed must be set. This is described in the FAQ of the official Optuna documentation (https://optuna.readthedocs.io/en/stable/faq.html#how-can-i-obtain-reproducible-optimization-results).
Let's try it. The changes to the source code are as follows (nothing else is changed).
Before
study = optuna.create_study(direction="maximize")
After
sampler = optuna.samplers.TPESampler(seed=0)
study = optuna.create_study(sampler=sampler, direction="maximize")
This time I set the random seed to 0, but any integer can be used.
Running the modified list_2_14_optimize_rf.py gives the following result.
[I 2024-03-19 14:46:10,076] A new study created in memory with name: no-name-4c0f6c2a-fe7c-4fd3-94b8-0c1752aba146
[I 2024-03-19 14:46:12,393] Trial 0 finished with value: 0.7607182349443268 and parameters: {'max_depth': 19, 'min_samples_split': 0.7151893663724195}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:46:15,851] Trial 1 finished with value: 0.7607182349443268 and parameters: {'max_depth': 20, 'min_samples_split': 0.5448831829968969}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:46:17,725] Trial 2 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.6458941130666561}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:46:19,587] Trial 3 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.8917730007820798}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:46:22,920] Trial 4 finished with value: 0.7698905410762791 and parameters: {'max_depth': 31, 'min_samples_split': 0.3834415188257777}. Best is trial 4 with value: 0.7698905410762791.
... (middle omitted) ...
[I 2024-03-19 14:57:09,656] Trial 95 finished with value: 0.8485934462026226 and parameters: {'max_depth': 18, 'min_samples_split': 0.021975518972960142}. Best is trial 82 with value: 0.8548585471747439.
[I 2024-03-19 14:57:25,576] Trial 96 finished with value: 0.8543466880116962 and parameters: {'max_depth': 21, 'min_samples_split': 0.00022872922730136322}. Best is trial 82 with value: 0.8548585471747439.
[I 2024-03-19 14:57:35,035] Trial 97 finished with value: 0.8389909634243521 and parameters: {'max_depth': 21, 'min_samples_split': 0.053218221319534575}. Best is trial 82 with value: 0.8548585471747439.
[I 2024-03-19 14:57:52,642] Trial 98 finished with value: 0.8537938637164729 and parameters: {'max_depth': 23, 'min_samples_split': 0.00018727384142899536}. Best is trial 82 with value: 0.8548585471747439.
[I 2024-03-19 14:57:58,945] Trial 99 finished with value: 0.8323778713634988 and parameters: {'max_depth': 24, 'min_samples_split': 0.08890826469424906}. Best is trial 82 with value: 0.8548585471747439.
Best objective value: 0.8548585471747439
Best parameter: {'max_depth': 18, 'min_samples_split': 0.00023055836175293556}
Run the same source code once more.
[I 2024-03-19 14:59:04,918] A new study created in memory with name: no-name-ba4c001c-7cee-41a4-a29b-007c2702430c
[I 2024-03-19 14:59:06,863] Trial 0 finished with value: 0.7607182349443268 and parameters: {'max_depth': 19, 'min_samples_split': 0.7151893663724195}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:59:09,684] Trial 1 finished with value: 0.7607182349443268 and parameters: {'max_depth': 20, 'min_samples_split': 0.5448831829968969}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:59:11,386] Trial 2 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.6458941130666561}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:59:14,774] Trial 3 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.8917730007820798}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 14:59:18,428] Trial 4 finished with value: 0.7698087944218402 and parameters: {'max_depth': 31, 'min_samples_split': 0.3834415188257777}. Best is trial 4 with value: 0.7698087944218402.
... (middle omitted) ...
[I 2024-03-19 15:11:43,799] Trial 95 finished with value: 0.8462798544058909 and parameters: {'max_depth': 29, 'min_samples_split': 0.027005809926542394}. Best is trial 71 with value: 0.856025578068143.
[I 2024-03-19 15:11:50,199] Trial 96 finished with value: 0.8296547653236431 and parameters: {'max_depth': 27, 'min_samples_split': 0.10525520375075349}. Best is trial 71 with value: 0.856025578068143.
[I 2024-03-19 15:11:53,330] Trial 97 finished with value: 0.7607182349443268 and parameters: {'max_depth': 28, 'min_samples_split': 0.5034699130716497}. Best is trial 71 with value: 0.856025578068143.
[I 2024-03-19 15:12:03,023] Trial 98 finished with value: 0.8413455442477001 and parameters: {'max_depth': 26, 'min_samples_split': 0.044533665120120094}. Best is trial 71 with value: 0.856025578068143.
[I 2024-03-19 15:12:11,240] Trial 99 finished with value: 0.8335653760197707 and parameters: {'max_depth': 24, 'min_samples_split': 0.08929905275628226}. Best is trial 71 with value: 0.856025578068143.
Best objective value: 0.856025578068143
Best parameter: {'max_depth': 28, 'min_samples_split': 0.001017849703871106}
The first few trials are reproduced, but the results diverge partway through.
The likely cause is that the random forest in the objective function also uses random numbers, and its random seed has not been set.
As a fix, I add a function that sets the random seeds and call it at the top of the objective function.
Added
import random
import numpy as np
def set_random_seed( seed ):
    random.seed( seed )
    np.random.seed( seed )
Before
def objective(trial):
    clf = RandomForestClassifier(
After
def objective(trial):
    set_random_seed( 0 )
    clf = RandomForestClassifier(
Run list_2_14_optimize_rf.py again.
[I 2024-03-19 15:23:18,914] A new study created in memory with name: no-name-5194e802-dc00-4b23-bec7-4f2eedade006
[I 2024-03-19 15:23:21,402] Trial 0 finished with value: 0.7607182349443268 and parameters: {'max_depth': 19, 'min_samples_split': 0.7151893663724195}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:23:24,625] Trial 1 finished with value: 0.7607182349443268 and parameters: {'max_depth': 20, 'min_samples_split': 0.5448831829968969}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:23:26,529] Trial 2 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.6458941130666561}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:23:28,338] Trial 3 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.8917730007820798}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:23:31,842] Trial 4 finished with value: 0.7693788064158434 and parameters: {'max_depth': 31, 'min_samples_split': 0.3834415188257777}. Best is trial 4 with value: 0.7693788064158434.
... (middle omitted) ...
[I 2024-03-19 15:36:32,049] Trial 95 finished with value: 0.841816464688217 and parameters: {'max_depth': 27, 'min_samples_split': 0.040705958355933866}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:36:40,530] Trial 96 finished with value: 0.8340363065211047 and parameters: {'max_depth': 26, 'min_samples_split': 0.07378979605757345}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:36:47,251] Trial 97 finished with value: 0.8297980829213555 and parameters: {'max_depth': 25, 'min_samples_split': 0.11078448646792491}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:36:57,877] Trial 98 finished with value: 0.8491872117355811 and parameters: {'max_depth': 25, 'min_samples_split': 0.018643125560272363}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:37:02,268] Trial 99 finished with value: 0.7722042573756228 and parameters: {'max_depth': 26, 'min_samples_split': 0.35128774524331274}. Best is trial 59 with value: 0.85592318661694.
Best objective value: 0.85592318661694
Best parameter: {'max_depth': 27, 'min_samples_split': 0.0010546408626361447}
Run the same source code once more.
[I 2024-03-19 15:39:07,592] A new study created in memory with name: no-name-bf81351c-ea09-4557-9b2e-8b8a34444c7a
[I 2024-03-19 15:39:09,480] Trial 0 finished with value: 0.7607182349443268 and parameters: {'max_depth': 19, 'min_samples_split': 0.7151893663724195}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:39:12,507] Trial 1 finished with value: 0.7607182349443268 and parameters: {'max_depth': 20, 'min_samples_split': 0.5448831829968969}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:39:14,251] Trial 2 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.6458941130666561}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:39:16,797] Trial 3 finished with value: 0.7607182349443268 and parameters: {'max_depth': 15, 'min_samples_split': 0.8917730007820798}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-19 15:39:20,703] Trial 4 finished with value: 0.7693788064158434 and parameters: {'max_depth': 31, 'min_samples_split': 0.3834415188257777}. Best is trial 4 with value: 0.7693788064158434.
... (middle omitted) ...
[I 2024-03-19 15:51:55,304] Trial 95 finished with value: 0.841816464688217 and parameters: {'max_depth': 27, 'min_samples_split': 0.040705958355933866}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:52:03,229] Trial 96 finished with value: 0.8340363065211047 and parameters: {'max_depth': 26, 'min_samples_split': 0.07378979605757345}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:52:09,515] Trial 97 finished with value: 0.8297980829213555 and parameters: {'max_depth': 25, 'min_samples_split': 0.11078448646792491}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:52:20,484] Trial 98 finished with value: 0.8491872117355811 and parameters: {'max_depth': 25, 'min_samples_split': 0.018643125560272363}. Best is trial 59 with value: 0.85592318661694.
[I 2024-03-19 15:52:24,925] Trial 99 finished with value: 0.7722042573756228 and parameters: {'max_depth': 26, 'min_samples_split': 0.35128774524331274}. Best is trial 59 with value: 0.85592318661694.
Best objective value: 0.85592318661694
Best parameter: {'max_depth': 27, 'min_samples_split': 0.0010546408626361447}
The same study was reproduced.
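As an alternative worth considering, scikit-learn estimators such as RandomForestClassifier accept a random_state argument, which fixes the model's randomness without touching the global NumPy seed. The sketch below is not what this article ran; it uses a small synthetic dataset from make_classification (an assumption, chosen for speed) and a hypothetical helper evaluate to check that two evaluations with the same random_state give identical scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Small synthetic dataset, used here only to keep the example fast.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

def evaluate(seed):
    # random_state fixes the bootstrap samples and feature choices of the forest.
    clf = RandomForestClassifier(n_estimators=20, max_depth=8, random_state=seed)
    return cross_val_score(clf, X_demo, y_demo, cv=3).mean()

score_a = evaluate(seed=0)
score_b = evaluate(seed=0)
print(score_a, score_b)
```

Inside an Optuna objective, the same idea would mean passing random_state to the classifier instead of calling a seed-setting function first.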
list_2_16_optimize_rf_gb_with_conditional_search_space.py
list_2_16_optimize_rf_gb_with_conditional_search_space.py (https://github.com/dk0893/optuna-book/blob/master/chapter2/list_2_16_optimize_rf_gb_with_conditional_search_space.py) implements a way to use multiple models.
Specifically, it uses two models, RandomForestClassifier plus scikit-learn's GradientBoostingClassifier. In addition, by specifying a study name and a storage when creating the study, it creates an SQLite database and records the results in it.
Creating the study object, running the optimization, and displaying the results
This time, to dig into the database side, I keep the model part as in list_2_14_optimize_rf.py and adopt only the part that uses the database. The changes from list_2_14_optimize_rf.py are shown below.
Before
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(f"Best objective value: {study.best_value}")
print(f"Best parameter: {study.best_params}")
After
study = optuna.create_study(
    study_name="ch2-rf-seed",
    storage="sqlite:///optuna.db",
    direction="maximize")
study.optimize(objective, n_trials=100)
print(f"Best objective value: {study.best_value}")
print(f"Best parameter: {study.best_params}")
See create_study in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.create_study.html#optuna.study.create_study).
The argument study_name="ch2-rf-seed" specifies the study name. If it is omitted, a unique name is generated automatically, but specifying one makes the study easier to identify later.
The argument storage="sqlite:///optuna.db" specifies the SQLite database file (the storage). If the database file does not exist yet, it is created in the current directory under the name optuna.db.
If the storage argument is not specified, the study is kept only in memory and is not saved anywhere.
In that case, if you want to train again with the best hyperparameters, you may be able to do so by reading the parameters from the print output and setting them by hand. However, if a floating-point parameter was printed with fewer digits than it actually has, the exact value cannot be set again and the training may not be reproducible.
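Whether a printed float can be typed back in exactly depends on how it was formatted. In Python 3, the default string form of a float (which is what an f-string without a format spec produces) is the shortest string that converts back to the same value, while a rounded format such as .4f does lose digits. A small check, using one of the min_samples_split values from the logs above:

```python
value = 0.0017140477373159217  # min_samples_split from one of the runs above

default_text = f"{value}"      # default formatting
rounded_text = f"{value:.4f}"  # formatting rounded to 4 decimal places

print(default_text, float(default_text) == value)
print(rounded_text, float(rounded_text) == value)
```

So copying values by hand works only as long as nothing along the way rounds them; reading them back from storage avoids the question entirely.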
A database in this sense does not become a very large file, so it is a good idea to always specify the storage argument and save the study results to storage.
Also, once a database file exists, Optuna Dashboard can be used.
Optuna Dashboard lets you analyze the studies you have run in various ways in the browser. I wrote a separate article on how to use it, so refer to it if you like.
daisuke20240310.hatenablog.com
Execution result
[I 2024-03-20 06:01:37,398] A new study created in RDB with name: ch2-rf-seed
[I 2024-03-20 06:01:40,190] Trial 0 finished with value: 0.7607182349443268 and parameters: {'max_depth': 18, 'min_samples_split': 0.5638938133595764}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-20 06:01:42,109] Trial 1 finished with value: 0.7607182349443268 and parameters: {'max_depth': 32, 'min_samples_split': 0.788055583212047}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-20 06:01:45,104] Trial 2 finished with value: 0.7607182349443268 and parameters: {'max_depth': 27, 'min_samples_split': 0.6024974755098238}. Best is trial 0 with value: 0.7607182349443268.
[I 2024-03-20 06:01:50,297] Trial 3 finished with value: 0.8142785254725554 and parameters: {'max_depth': 5, 'min_samples_split': 0.14468519929589163}. Best is trial 3 with value: 0.8142785254725554.
[I 2024-03-20 06:01:53,740] Trial 4 finished with value: 0.7720404672726398 and parameters: {'max_depth': 12, 'min_samples_split': 0.3523860798470476}. Best is trial 3 with value: 0.8142785254725554.
... (middle omitted) ...
[I 2024-03-20 06:13:55,242] Trial 95 finished with value: 0.8302075581788131 and parameters: {'max_depth': 19, 'min_samples_split': 0.09428362784266728}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:13:59,747] Trial 96 finished with value: 0.7684779570766304 and parameters: {'max_depth': 20, 'min_samples_split': 0.391099669378463}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:14:08,315] Trial 97 finished with value: 0.8446419319968242 and parameters: {'max_depth': 15, 'min_samples_split': 0.03303767220036553}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:14:16,474] Trial 98 finished with value: 0.8342410227705971 and parameters: {'max_depth': 25, 'min_samples_split': 0.0714018376653957}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:14:22,132] Trial 99 finished with value: 0.8282010992348194 and parameters: {'max_depth': 17, 'min_samples_split': 0.12281022399304492}. Best is trial 72 with value: 0.8554523013892831.
Best objective value: 0.8554523013892831
Best parameter: {'max_depth': 23, 'min_samples_split': 0.0009062357253867216}
Apart from adding the feature that saves the study to storage, nothing was changed from list_2_14_optimize_rf.py, so the result is similar to the earlier ones.
About the load_if_exists argument of create_study
I did not specify the load_if_exists argument this time, but it can become necessary when using a database file, so I explain it here.
The default for load_if_exists is False.
If create_study is run with load_if_exists=False and a study with the same name already exists in the storage, the following error is raised, which protects the study saved in the storage from being overwritten.
DuplicatedStudyError: Another study with name 'ch2-rf-seed' already exists. Please specify a different name, or reuse the existing one by setting `load_if_exists` (for Python API) or `--skip-if-exists` flag (for CLI).
On the other hand, if create_study is run with load_if_exists=True and then optimize is run, the trials continue from where the specified study left off (if optimize was first run with n_trials set to 100, execution continues from the 101st trial).
Run the following.
study = optuna.create_study(
    study_name="ch2-rf-seed",
    storage="sqlite:///optuna.db",
    direction="maximize",
    load_if_exists=True)
study.optimize(objective, n_trials=100)
print(f"Best objective value: {study.best_value}")
print(f"Best parameter: {study.best_params}")
Execution result
[I 2024-03-20 06:22:48,549] Using an existing study with name 'ch2-rf-seed' instead of creating a new one.
[I 2024-03-20 06:22:58,802] Trial 100 finished with value: 0.8464231581699796 and parameters: {'max_depth': 14, 'min_samples_split': 0.021362272594642306}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:23:10,816] Trial 101 finished with value: 0.8517054858265473 and parameters: {'max_depth': 18, 'min_samples_split': 0.009011066100582318}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:23:19,036] Trial 102 finished with value: 0.8403218397552283 and parameters: {'max_depth': 19, 'min_samples_split': 0.04818164005763597}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:23:29,129] Trial 103 finished with value: 0.8475492541136544 and parameters: {'max_depth': 20, 'min_samples_split': 0.02326090335375395}. Best is trial 72 with value: 0.8554523013892831.
[I 2024-03-20 06:23:42,828] Trial 104 finished with value: 0.8543467093909327 and parameters: {'max_depth': 18, 'min_samples_split': 0.0016971833803444691}. Best is trial 72 with value: 0.8554523013892831.
... (middle omitted) ...
[I 2024-03-20 06:39:40,899] Trial 195 finished with value: 0.8425330476463699 and parameters: {'max_depth': 28, 'min_samples_split': 0.03683044981939044}. Best is trial 178 with value: 0.8559231929049508.
[I 2024-03-20 06:39:51,507] Trial 196 finished with value: 0.8482044296168122 and parameters: {'max_depth': 27, 'min_samples_split': 0.019320595313816837}. Best is trial 178 with value: 0.8559231929049508.
[I 2024-03-20 06:40:08,113] Trial 197 finished with value: 0.8553703824433508 and parameters: {'max_depth': 25, 'min_samples_split': 0.000783018066616531}. Best is trial 178 with value: 0.8559231929049508.
[I 2024-03-20 06:40:15,712] Trial 198 finished with value: 0.838929610046249 and parameters: {'max_depth': 25, 'min_samples_split': 0.05273723875182029}. Best is trial 178 with value: 0.8559231929049508.
[I 2024-03-20 06:40:25,736] Trial 199 finished with value: 0.8434134320259252 and parameters: {'max_depth': 26, 'min_samples_split': 0.034967023097725466}. Best is trial 178 with value: 0.8559231929049508.
Best objective value: 0.8559231929049508
Best parameter: {'max_depth': 24, 'min_samples_split': 0.0006325520315008212}
I ran 100 more trials, but the accuracy improved only slightly. The likely reason is that the search space consists of only two arguments, one of which is discrete, so there are few choices.
Running list_2_19_load_study.py
The source code of list_2_19_load_study.py is shown below.
Some comments have been added to make it easier to explain. I also added a line that prints the number of the best trial, for use when reproducing the training.
The code loads the study from the database by specifying the storage and the study name.
import optuna
study = optuna.load_study(
    study_name="ch2-rf-seed",
    storage="sqlite:///optuna.db",
)
print(f"Best objective value: {study.best_value}")
print(f"Best parameter: {study.best_params}")
print(f"Best number: {study.best_trial.number}")
Loading the study
See load_study in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.load_study.html). As with create_study, the argument study_name="ch2-rf-seed" specifies the study name and storage="sqlite:///optuna.db" specifies the path to the database file.
Execution result
Best objective value: 0.8559231929049508
Best parameter: {'max_depth': 24, 'min_samples_split': 0.0006325520315008212}
Best number: 178
The same values as in the result of running the study were read back.
Reproducing the training with the best parameters found by Optuna
First, here is the source code.
tr = study.trials[study.best_trial.number]
set_random_seed( 0 )
clf = RandomForestClassifier(
    max_depth=tr.params['max_depth'],
    min_samples_split=tr.params['min_samples_split'],
)
score = cross_val_score(clf, X, y, cv=3)
accuracy = score.mean()
print(f"Accuracy: {accuracy}")
Here, as a method that works for referring to any trial, the code looks up the target trial with best_trial.number and uses that trial's params. If you just want the best parameters, you can get them from study.best_params.
See the Study class in the Optuna manual (https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html).
Among the Attributes is best_trial, which was also used in "Running list_2_19_load_study.py" and gives access to the trial with the best parameters. The following is the result of referring to best_trial.
FrozenTrial(number=178, state=TrialState.COMPLETE, values=[0.8559231929049508], datetime_start=datetime.datetime(2024, 3, 20, 6, 36, 4, 44871), datetime_complete=datetime.datetime(2024, 3, 20, 6, 36, 20, 969838), params={'max_depth': 24, 'min_samples_split': 0.0006325520315008212}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'max_depth': IntDistribution(high=32, log=False, low=2, step=1), 'min_samples_split': FloatDistribution(high=1.0, log=False, low=0.0, step=None)}, trial_id=389, value=None)
This shows that the trial number can be obtained from the number attribute of best_trial.
There is also trials, which returns all trials of the study as a list. Indexing trials with the number of the trial obtained from best_trial gives access to the best parameters.
Execution result
Accuracy: 0.8559231929049508
This shows that the training with the best parameters was reproduced.
That's all for this time. Thanks for reading!