To be blunt, this feels long overdue, but I recently realized that I have hardly ever done anomaly detection or change detection in earnest myself. What prompted this was this dataset from the UCI Machine Learning Repository, which I use from time to time to test various methods.
As its description says, this dataset collects the inputs from the various sensors of an urban waste water treatment plant and summarizes them on a daily basis. The typical task for this kind of plant dataset is exactly anomaly detection: in short, we want to detect the dates on which some fault occurred, even if only after the fact.
I have touched on anomaly detection itself on this blog before, if only briefly. That was the post introducing the {AnomalyDetection} package.
Back then I only gave a short introduction to the generalized ESD test that {AnomalyDetection} relies on, and did not really get into the underlying theory or other similar methods.
Even earlier, I also wrote about using a Markov switching model for anomaly detection with the {MSwM} package.
That seemed to work well enough in its own right, but I also came away with the impression that it can only be used on univariate time series. In other words, that approach as-is cannot handle multivariate anomaly detection. Left with a slight sense of indigestion, I let the topic sit for a while.
However, after I happened to work with the Water Treatment Plant Dataset mentioned above several times, I decided this would not do, and so I bought the following book.
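Introduction to Anomaly Detection Using Machine Learning: A Practical Guide with R (入門 機械学習による異常検知 Rによる実践ガイド)
- Author: 井手剛 (Tsuyoshi Idé)
- Publisher: コロナ社 (Corona Publishing)
- Release date: 2015/02/19
- Format: Book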
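As those of you in the machine learning community will know, this is the anomaly detection book written by @Idesan*1. From the next post onward I will study with this book as my basis; in this post I want to write up a summary of where I currently stand, that is, "how am I doing anomaly detection right now?"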
About the Water Treatment Plant Dataset
Let me reproduce the dataset description here.
Data Set Information:
This dataset comes from the daily measures of sensors in an urban waste water treatment plant. The objective is to classify the operational state of the plant in order to predict faults through the state variables of the plant at each of the stages of the treatment process. This domain has been stated as an ill-structured domain.
Attribute Information:
All attributes are numeric and continuous
N. Attrib.
1 Q-E (input flow to plant)
2 ZN-E (input Zinc to plant)
3 PH-E (input pH to plant)
4 DBO-E (input Biological demand of oxygen to plant)
5 DQO-E (input chemical demand of oxygen to plant)
6 SS-E (input suspended solids to plant)
7 SSV-E (input volatile suspended solids to plant)
8 SED-E (input sediments to plant)
9 COND-E (input conductivity to plant)
10 PH-P (input pH to primary settler)
11 DBO-P (input Biological demand of oxygen to primary settler)
12 SS-P (input suspended solids to primary settler)
13 SSV-P (input volatile suspended solids to primary settler)
14 SED-P (input sediments to primary settler)
15 COND-P (input conductivity to primary settler)
16 PH-D (input pH to secondary settler)
17 DBO-D (input Biological demand of oxygen to secondary settler)
18 DQO-D (input chemical demand of oxygen to secondary settler)
19 SS-D (input suspended solids to secondary settler)
20 SSV-D (input volatile suspended solids to secondary settler)
21 SED-D (input sediments to secondary settler)
22 COND-D (input conductivity to secondary settler)
23 PH-S (output pH)
24 DBO-S (output Biological demand of oxygen)
25 DQO-S (output chemical demand of oxygen)
26 SS-S (output suspended solids)
27 SSV-S (output volatile suspended solids)
28 SED-S (output sediments)
29 COND-S (output conductivity)
30 RD-DBO-P (performance input Biological demand of oxygen in primary settler)
31 RD-SS-P (performance input suspended solids to primary settler)
32 RD-SED-P (performance input sediments to primary settler)
33 RD-DBO-S (performance input Biological demand of oxygen to secondary settler)
34 RD-DQO-S (performance input chemical demand of oxygen to secondary settler)
35 RD-DBO-G (global performance input Biological demand of oxygen)
36 RD-DQO-G (global performance input chemical demand of oxygen)
37 RD-SS-G (global performance input suspended solids)
38 RD-SED-G (global performance input sediments)
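The dataset has the date in its first column, but as the feature list shows, it basically comes without training labels, so it can be regarded as a dataset for unsupervised learning*2. That said, it has quite a lot of missing values, which makes it a bit of a pain to handle as-is. Since this post is not about studying missing value imputation, I simply prepared a version of the dataset with the NA rows removed, and I am posting it below.
From here on I will work with this dataset.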
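For reference, dropping incomplete rows takes only a couple of lines in R. The following is a minimal sketch on a made-up four-row excerpt, not the exact procedure used to produce the file for this post. It assumes that missing readings are encoded as `?` in the raw file; the values and the column names `Q_E`, `ZN_E` and `PH_E` are placeholders for illustration.

```r
# Toy stand-in for the raw file: the first column is a date label, the rest
# are sensor readings, and '?' marks a missing value (an assumption about
# the raw format; the numbers are made up).
raw_text <- "D-1/3/90,44101,1.50,7.8
D-2/3/90,39024,?,7.7
D-4/3/90,32229,5.00,7.6
D-5/3/90,?,3.50,?"

raw <- read.csv(text = raw_text, header = FALSE, na.strings = "?",
                col.names = c("date", "Q_E", "ZN_E", "PH_E"))

# Keep only the days on which every sensor reported a value
complete <- na.omit(raw)
print(complete)
```

The price of this shortcut is that every day with even one missing reading is discarded, so an anomaly that coincides with a sensor dropout can no longer be detected.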
Getting a rough idea with Ward's method, then smoking out anomalies with K-means
What I can currently do with this dataset is to cluster it and simply look for the cluster with the smallest sample size. Put differently, this is the naive approach to anomaly detection that treats "the cluster with the smallest sample size = outliers".
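The idea does not depend on any particular clustering algorithm: given a vector of cluster labels, it only needs the cluster sizes. As a minimal sketch, with a hypothetical helper `smallest_cluster_rows()` and made-up labels:

```r
# Hypothetical helper: return the row indices that fall in the smallest
# cluster (or clusters, if several tie for the minimum size)
smallest_cluster_rows <- function(labels) {
  sizes <- table(labels)
  smallest <- names(sizes)[sizes == min(sizes)]
  which(as.character(labels) %in% smallest)
}

# Toy labels: clusters 1 and 2 have four members each,
# cluster 3 has only two members, at positions 4 and 9
labels <- c(1, 1, 2, 3, 2, 1, 2, 1, 3, 2)
flagged <- smallest_cluster_rows(labels)
print(flagged)
```

Using `%in%` rather than `==` keeps the helper well-behaved when more than one cluster shares the minimum size.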
That said, I suspect that jumping straight into something like K-means would give results that are hard to make sense of, so I will first cluster with Ward's method and visualize the result to get a rough idea.
> d <- read.csv('watertreatment_mod.csv')
> d.dist <- dist(d[,-1])
> d.hcl <- hclust(d.dist, method='ward.D2')
> plot(d.hcl, labels=d[,1])
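A dendrogram with several hundred leaves is hard to read by eye, so it can help to also look at the group sizes numerically with `cutree()`. The following is a minimal sketch on synthetic data, not on the plant dataset: the toy matrix, the seed and the choice of `k = 3` are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so that the toy data is reproducible

# Toy data: two dense groups of 50 points each (rows 1-100),
# plus 3 points far away from both (rows 101-103)
toy <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2),
  cbind(rnorm(3, mean = 50, sd = 0.5), rnorm(3, mean = -50, sd = 0.5))
)

hc <- hclust(dist(toy), method = "ward.D2")

# Cut the tree into k groups and count the members of each group
groups <- cutree(hc, k = 3)
sizes <- table(groups)
print(sizes)

# Rows belonging to the smallest group
small <- which(groups == as.integer(names(which.min(sizes))))
print(small)
```

On real data the appropriate `k` is not known in advance, so tabulating the sizes for a few values of `k` alongside the plot is one way to check whether a small group is stable.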
In the dendrogram, a few clusters look small in sample size and also somewhat isolated. It would be nice if K-means could pin these down cleanly, so for now I will try K = 4, ..., 10 one after another.
> for (i in 4:10){
+   km <- kmeans(d[,-1], centers=i)
+   print(table(km$cluster))
+ }

  1   2   3   4
176  61  39 104

  1   2   3   4   5
 16  70 156  88  50

  1   2   3   4   5   6
 31  16 119  75  87  52

  1   2   3   4   5   6   7
  4  40 130  48  65  81  12

 1  2  3  4  5  6  7  8
84 64 16  3 40 63 67 43

 1  2  3  4  5  6  7  8  9
 7 45  3 65 53 44 64 16 83

 1  2  3  4  5  6  7  8  9 10
53 27 54  3 68 33 32 45 50 15
For K > 7, for some reason a cluster containing only 3 samples shows up every time. Identifying those samples gives:
> for (i in 8:10){
+   km <- kmeans(d[,-1], centers=i)
+   cls <- which(table(km$cluster)==3)
+   print(d$date[km$cluster==cls])
+ }
[1] D-16/9/90 D-2/8/90 D-11/8/91
527 Levels: D-1/1/90 D-1/1/91 D-1/10/90 D-1/10/91 D-1/11/90 D-1/2/90 D-1/2/91 D-1/3/90 D-1/3/91 D-1/4/90 ... D-9/9/90
[1] D-16/9/90 D-2/8/90 D-11/8/91
527 Levels: D-1/1/90 D-1/1/91 D-1/10/90 D-1/10/91 D-1/11/90 D-1/2/90 D-1/2/91 D-1/3/90 D-1/3/91 D-1/4/90 ... D-9/9/90
[1] D-16/9/90 D-2/8/90 D-11/8/91
527 Levels: D-1/1/90 D-1/1/91 D-1/10/90 D-1/10/91 D-1/11/90 D-1/2/90 D-1/2/91 D-1/3/90 D-1/3/91 D-1/4/90 ... D-9/9/90
So it appears that something anomalous happened on three dates: August 2, 1990; September 16, 1990; and August 11, 1991. This is about as far as my current skill set takes me.
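One caveat with this procedure is that `kmeans()` starts from randomly chosen centers, so rerunning the loop can give different cluster sizes, and a lookup such as `which(table(km$cluster)==3)` returns nothing or several labels whenever there is not exactly one cluster of size 3. A slightly more defensive variant, sketched here on synthetic data rather than on the plant dataset, fixes the seed, uses several random starts via `nstart`, and selects the smallest cluster by size instead of by a hard-coded count. The toy matrix, the seed, `nstart = 25` and the range of K are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed; fixes both the toy data and the k-means starts

# Toy data: two dense groups of 50 points each (rows 1-100),
# plus 3 planted outliers far away from both (rows 101-103)
toy <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2),
  cbind(rnorm(3, mean = 50, sd = 0.5), rnorm(3, mean = -50, sd = 0.5))
)

flagged <- list()
for (k in 3:5) {
  # nstart = 25 keeps the best of 25 random initializations
  km <- kmeans(toy, centers = k, nstart = 25)
  sizes <- table(km$cluster)
  # %in% copes with ties for the smallest size
  smallest <- as.integer(names(sizes)[sizes == min(sizes)])
  flagged[[as.character(k)]] <- which(km$cluster %in% smallest)
  cat("K =", k, "rows in smallest cluster:", flagged[[as.character(k)]], "\n")
}
```

If the same rows are flagged across several values of K, as happened with the three dates above, that is at least some evidence that the small cluster is not an artifact of one particular run.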
In closing
In this post I took stock of where I stand and set a baseline: this is roughly the kind of output I can produce with what I currently know. In the following posts I will work through @Idesan's book and learn, hands-on, what kinds of anomaly detection methods are out there.