Finding a well-fitting probability distribution
Introduction
When looking at data, you sometimes find yourself wondering which distribution fits it best. Does it follow a normal distribution, a log-normal distribution, or is a gamma distribution closer?
While looking for a way to check this, I found that it can be done with scipy.stats, and that there is also a library called fitter. This post describes the results of trying both.
Experiment
Using scipy
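The implementation is a lightly modified version of the answer to numpy - Fitting empirical distribution to theoretical ones with Scipy (Python)? - Stack Overflow. For the input data, it estimates the parameters of every probability distribution registered in scipy.stats by maximum likelihood. It then compares the sum of squared errors between each fitted PDF and the data's normalized histogram, and takes the distribution with the smallest error as the best fit.
scipy provides more than 80 probability distributions, but fitting all of them takes several minutes even for data with only about 100 rows. There is also little point in having an unfamiliar distribution come out on top, so it is more practical to narrow the candidates down in advance. Here, four distributions are compared: normal, uniform, gamma, and Rayleigh.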
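Written out as formulas, the procedure described above is roughly the following. The notation is introduced here for illustration: $z_1,\dots,z_n$ are the observations, $x_i$ and $y_i$ are the centers and normalized heights of the $B$ histogram bins, and $f_k(\cdot;\theta)$ is the PDF of candidate distribution $k$ (in scipy, $\theta$ is the shape parameters plus `loc` and `scale`).

$$
\hat\theta_k = \arg\max_{\theta} \sum_{j=1}^{n} \log f_k(z_j;\theta),
\qquad
\mathrm{SSE}_k = \sum_{i=1}^{B} \bigl(y_i - f_k(x_i;\hat\theta_k)\bigr)^2,
\qquad
k^\ast = \arg\min_k \mathrm{SSE}_k
$$

The parameters are chosen by likelihood, but the candidates are ranked by the histogram error. That is why the ranking can depend on $B$, the number of bins.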
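To see how many candidates "all distributions" actually means, one option is to scan `scipy.stats` for continuous distribution objects. This is a minimal sketch that assumes every continuous distribution is exposed as an instance of a `st.rv_continuous` subclass (as `st.norm` is). The exact count depends on the installed SciPy version.

```python
import scipy.stats as st

# Collect every continuous distribution object exposed by scipy.stats.
# Assumption: each one (e.g. st.norm) is an instance of an rv_continuous subclass.
continuous = sorted(
    name for name in dir(st)
    if isinstance(getattr(st, name), st.rv_continuous)
)
print(len(continuous))  # number of candidates; varies by SciPy version

# Hand-picked subset used in this article: normal, uniform, gamma, Rayleigh
candidates = [st.norm, st.uniform, st.gamma, st.rayleigh]
print([d.name for d in candidates])
```

Fitting the full list is what makes the exhaustive search take minutes. Passing a short list like `candidates` keeps it fast.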
```python
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt

# Load the sample data
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)

# Distributions to compare: normal, uniform, gamma, Rayleigh
distributions = [st.norm, st.uniform, st.gamma, st.rayleigh]

# Empirical density of the second column (sepal width) with 20 bins
y, x = np.histogram(data.iloc[:, 1], bins=20, density=True)
# np.histogram returns bin edges, so convert them to bin centers
x = (x + np.roll(x, -1))[:-1] / 2.0

result = []
for distribution in distributions:
    # Maximum likelihood estimate of the distribution's parameters
    params = distribution.fit(data.iloc[:, 1])

    # Separate shape parameters from loc and scale
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Fitted PDF at the bin centers and its sum of squared errors vs. the histogram
    pdf = distribution.pdf(x, *arg, loc=loc, scale=scale)
    sse = np.sum(np.power(y - pdf, 2.0))
    result.append((pdf, sse))

df = pd.DataFrame()
df["name"] = [d.name for d in distributions]
df["sumsquare_error"] = [r[1] for r in result]
print(df)
```
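The same loop can be wrapped in a reusable function that returns the candidates sorted by error. The helper name `rank_by_sse` is hypothetical. The check below uses synthetic normally distributed data (an assumption made for illustration, not the iris data), where the normal distribution should clearly beat the uniform one.

```python
import numpy as np
import scipy.stats as st


def rank_by_sse(sample, distributions, bins=20):
    """Hypothetical helper: fit each distribution by MLE and rank by histogram SSE."""
    # Empirical density evaluated at the bin centers
    y, edges = np.histogram(sample, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0

    scores = {}
    for dist in distributions:
        *shapes, loc, scale = dist.fit(sample)  # MLE parameters: (*shapes, loc, scale)
        pdf = dist.pdf(centers, *shapes, loc=loc, scale=scale)
        scores[dist.name] = float(np.sum((y - pdf) ** 2))

    # Smallest sum of squared errors first
    return sorted(scores.items(), key=lambda kv: kv[1])


# Synthetic check: 2,000 draws from N(5, 1) (assumed test data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=1.0, size=2000)
ranking = rank_by_sse(sample, [st.norm, st.uniform])
print(ranking)
```

Returning a sorted list makes the best candidate `ranking[0]`, which mirrors how the DataFrame above is read.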
Among the four candidates, the normal distribution has the smallest sum of squared errors, so it appears to fit best.
```python
# Overlay each fitted PDF on the normalized histogram
for i in range(len(result)):
    pd.Series(result[i][0], x).plot(label=distributions[i].name)
plt.hist(data.iloc[:, 1], density=True, alpha=0.4, bins=20)
plt.legend()
plt.show()
```
The plot confirms that the normal and gamma distributions fit the original data relatively well.
Using fitter
fitter is essentially the code above packaged as a library.
```python
from fitter import Fitter

f = Fitter(data.iloc[:, 1], distributions=['gamma', 'rayleigh', 'uniform', 'norm'], bins=20)
f.fit()
f.summary()
```
Because it does the same thing as the code above, the results match the earlier ones.
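One thing to keep in mind is that the best-fitting distribution can change with the number of bins. fitter's default is bins=100, but the iris data has only 150 samples. With 100 bins, each bin holds about 1.5 observations on average, so the histogram is too noisy to give a sensible result. For this reason, this experiment uses bins=20.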
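To check the bin sensitivity on your own data, the fits can be done once and only the histogram recomputed for several bin counts. This sketch uses a synthetic stand-in (an assumption): 150 normal draws rounded to 0.1, chosen to mimic the size and measurement resolution of the iris columns. It is not the iris data, so the winners it prints say nothing about the results above.

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in: 150 draws (the iris sample size), rounded to 0.1 like iris.
rng = np.random.default_rng(42)
sample = np.round(rng.normal(loc=3.0, scale=0.45, size=150), 1)

candidates = [st.norm, st.uniform, st.gamma, st.rayleigh]
# MLE fits do not depend on the histogram, so fit each distribution once
fits = {d.name: d.fit(sample) for d in candidates}

best_by_bins = {}
for bins in (10, 20, 50, 100):
    y, edges = np.histogram(sample, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0
    sse = {}
    for d in candidates:
        *shapes, loc, scale = fits[d.name]
        pdf = d.pdf(centers, *shapes, loc=loc, scale=scale)
        sse[d.name] = np.sum((y - pdf) ** 2)
    best_by_bins[bins] = min(sse, key=sse.get)
    # bins, average observations per bin, best distribution for this bin count
    print(bins, len(sample) / bins, best_by_bins[bins])
```

Only the SSE target changes between iterations, so any change in the winner comes from the binning, not from the parameter estimates.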
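Summary
This post looked at ways to find a probability distribution that fits arbitrary data well. This may be useful when building statistical models that require an assumed probability distribution, or when trying to infer the data-generating process from the distribution.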