Hadoopæ¬2çãè²·ã£ããã®ä¸é±éå¾ã«Deals of the day ã§åé¡ã»ã¼ã«ãããã¦æ»ã«ãããªã£ãã®ã§è
¹ããã«æ¸ã
ã¯ããã«
- ããã«æ¸ãã¦ããã®ã¯å ¨é¨åèãªã³ã¯ã»æç®ããã²ã£ã±ã£ã¦ããã ãã§ãã»ã¨ãã©å ¨é¨æ¤è¨¼ãã¦ãªããééããããã°ãªãã¹ãæ©ãã«æ´æ°ããããåªåã¯ããããéµåã¿ã«ãã¦ä½ãèµ·ãã¦ãèªå·±è²¬ä»»ã§ã
- Hive ã®ã¯ã¨ãªãã¥ã¼ãã³ã°ã«é¢ããã¡ã¢æ¸ãã§ããã以ä¸ã®ãã¨ã¯ãæ¸ãã¦ããªãã
- Hadoopèªä½ã®ãã¥ã¼ãã³ã°
- Hive ã®ã¯ã¨ãªãã¥ã¼ãã³ã°ä»¥å¤ã®è©±
- ä¾ãã°ãå§ç¸®ãã¡ã¤ã«ã Hive ä¸ã§æ±ãã«ã¯ã©ããããã¨ã
JOIN
ä¸çªå·¦ã®ãã¼ãã«ã«æã大ããªãã¼ãã«ãæã£ã¦ãã
- ä¸çªå·¦ã®ãã¼ãã«ãMRã§ããå ¥åãã¼ã¿ã¨ãã¦æµãããã¤ã³ãã¼ãã¼ãã«ã®ãã¼ã¿ã¯ã¡ã¢ãªã«ä¿æãããã
åä¸ JOIN ãã¼
- é常㯠1 JOIN = 1 MR ã¸ã§ãã ããåä¸ã® JOIN ãã¼ã使ã£ã¦ããéãã©ãã ããã¼ãã«ã¤ãªãã§ã 1 MR ã¸ã§ãã
- OUTER JOIN ã§ãåãããã«ãªãã
Map Join
- çæ¹ã®ãã¼ãã«ãããã·ã¥ãã¼ãã«ä¸ã«èªã¿è¾¼ãã§ãã㦠Map ã ã㧠JOIN ãã¦ãã¾ã
- ã³ã¡ã³ãã¿ãããªæ§æã ãããã§ããããã
- å½ç¶ã¡ã¢ãªã«è¼ãã ããã£ãã·ã¥ã§ãã
INSERT INTO TABLE pv_users
SELECT /*+ MAPJOIN(pv) */ pv.pageid, u.age
FROM page_view pv JOIN user u
ON (pv.userid = u.userid);
hive.join.emit.interval | 1000 |
hive.mapjoin.size.key | 10000 |
hive.mapjoin.cache.numrows | 10000 |
GROUP BY
Map ãµã¤ãé¨åéç´ã¨ãã¼ã¿ã®ãã¼ããã©ã³ã·ã³ã°
ååã§ãªãã¨ãªãæ³åã¯ã¤ããããã©ã¡ã¼ã¿ã®ç´°ããæå³ã¯ä¸æ
hive.map.aggr | true |
hive.groupby.skewindata | false |
hive.groupby.mapaggr.checkinterval | 100000 |
hive.map.aggr.hash.percentmemory | 0.5 |
hive.map.aggr.hash.min.reduction | 0.5 |
ãã«ã GROUP BY
- åä¸ãã¼ãã«ãã GROUP BY ã2åå©ã㨠MR ã4åå®è¡ãããã¨ã«ãªãããããããã°æåã®1ååãæ¸ã3åã®MRã§æ¸ãã
- nåGROUP BY ã ã£ãã2nåã®MRã®ã¨ãããn+1åã§æ¸ãã¨ãããã¨ã
- å¯è½ãªéãå©ç¨ãããã¨ãã¨æ¸ãã¦ãã
FROM pv_users
INSERT OVERWRITE TABLE pv_gender_sum
SELECT gender, count(DISTINCT userid), count(userid)
GROUP BY gender
INSERT OVERWRITE TABLE pv_age_sum
SELECT age, count(DISTINCT userid)
GROUP BY age
ã½ã¼ã
ORDER BY
- Reducer ã 1ã¤ããåããªããã¨ã«æ³¨æã
- è¦ããã«ä½ãèããã«ãã©ã½ã¼ãã¿ãããªãã¨ãããã¨æ»ã¬ã
SORT BY
- ç°ãªã Reduce ã¿ã¹ã¯ãã¨ã«ã½ã¼ãã
- å ¨ä½ã®ã½ã¼ãããã¦ããããããªããã¨ã«æ³¨æã
DISTRIBUTE BY
- ããã§æå®ããã«ã©ã ãåãå¤ã®è¡ã¯åã reducer ã«è¡ããã¨ãä¿è¨¼ããã
CLUSTER BY
- SORT BY 㨠DISTRIBUTED BY ã§åä¸ã«ã©ã ãæå®ããã®ã¨åãã
ãã®ä»
å°ãããã¡ã¤ã«ã®ãã¼ã¸
hive.merge.mapfiles | true |
hive.merge.mapredfiles | false |
hive.merge.size.per.task | 256*1000*1000 |
åèãªã³ã¯ã»æç®
- 2009å¹´ã®Hive User Meeting ã§çºè¡¨ããã Facebook ã®è³æ
- Hadoop Definitive Guide 2nd Edition
- ä½è : Tom White,Doug Cutting
- åºç社/ã¡ã¼ã«ã¼: Oreilly & Associates Inc
- çºå£²æ¥: 2010/10/14
- ã¡ãã£ã¢: ãã¼ãã¼ããã¯
- è³¼å ¥: 2人 ã¯ãªãã¯: 149å
- ãã®ååãå«ãããã° (14件) ãè¦ã