Python + Hive on AWS EMR
Poor Man's Log Aggregation
PyCon JP 2014
 
Akira Chiku
(achiku)
 
 
Name: Akira Chiku
Twitter: @_achiku
GitHub: @achiku
 
馯㄂拦莸	LJSB$IJLV'JSFד嗚稊
 
Engineer @kanmu
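Goal

I want to hear what all of you have to say (that part is for me)

Ø Share a case study that focuses on "why this architecture?"
Ø Share ways to use Python + EMR that aim "beyond the introductory level"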
Kanmu's Business


Ø Analyzing payment data in partnership with card companies
Ø Delivering coupons tied to payment cards
Ø Card-Linked Offer (CLO)
Card-Linked Offer
 
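Participants in the diagram: card company, Kanmu, cardmember, shop
• "Shop with a card you entered for a store's coupon, and you get points."
• "We want to offer this kind of coupon to this kind of customer."
• "Wouldn't customers with this kind of purchasing pattern be a better fit?"
• "These were the results, so next time let's cut the segment this way."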
Quick Survey

Ø Who here works on log analysis?
Ø Who uses Hadoop?
Ø Who uses Hive?
Ø Who uses EMR?
Sharing a case study that focuses on "why this architecture?"
Our premises
The title of this talk:
Poor Man's Log Aggregation
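Poor Man's
Ø Based on our current situation:
Ø Ways to reach our goals with few resources (people, time, experience)
Ø Ways to reach our goals as fast as possible
Ø Ways to avoid unnecessary spending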
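Kanmu Engineer Team


maki (CEO/Engineer)

_ideyuta (Designer)

moqada (Engineer)

_achiku (Engineer)

Areas covered by the team: CEO duties and development; design and front-end; smartphone apps and log analysis; front-end and back-end; infrastructure and smartphone apps; back-end and infrastructure; analytics platform, log analysis, and the partner contact point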
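Requirements
Ø Aggregate a fairly large volume of data without stress
• Average … GB/day, max … GB/day (uncompressed)
• … GB/month (uncompressed)
• Expected to grow along with the service
• We also want to run ad-hoc queries over a month's worth of data
Ø Build up in-house know-how for processing large-scale data
• Build that know-how while pacing ourselves
• Some data is sensitive and hard to hand to outside parties
Ø Keep operating costs and initial investment low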
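Not Requirements
Ø For now, there is little need for the analysis to be real-time
Ø The ad-hoc analysis environment does not need a high service level
• Scheduled batch jobs must not fail, but
• the ad-hoc environment does not have to be available at all times
• Usage is limited to people inside the company
Ø That said, any of the above could well become a requirement later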
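Amazon Elastic MapReduce
AWS EMR
Ø One of AWS's many services
Ø Hadoop and software from the Hadoop ecosystem are available by default
Ø Launch clusters, run jobs, and monitor them through the API
Ø Takes care of monitoring and similar chores for you
Ø S3 can be used in place of HDFS
Ø Changing the number of nodes in a cluster is easy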
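Architecture
Diagram: a management server and two coupon delivery servers
• Fluentd on each delivery server collects the logs
• Fluentd's S3 plugin stores the collected logs on S3
• Hive on EMR cleans up and aggregates the logs
• The aggregated values are stored in RDS and visualized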
Data Analysis Flow (by tagomoris)


Collect → Parse → Cleanup → Store → Process → Visualize
Source: http://www.slideshare.net/tagomoris/handling-not-so-big-data
Poor Man's Data Analysis Flow

Collect → Parse → Cleanup → Store → Process → Visualize
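Collect
Ø Ship logs from the coupon delivery servers with Fluentd and its S3 plugin
Ø The shipped logs carry the user's action packed into the query string
• Don't send complex JSON; put the information in the query string instead
• Convert everything to JSON at aggregation time in Hive
• e.g. https://example.com/beacon?sub=…&obj=coupon&action=click&cid=…
Ø No Fluentd aggregator server
• For now, there is little need for real-time aggregation
• A redundant setup would also have to be designed, which adds complexity
• We would rather rely on S3's reliability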
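The talk converts the query string to JSON inside Hive with a UDF, which is not shown. This is a minimal sketch of the same transformation in plain Python, using only the standard library. The parameter values are hypothetical and it is not the authors' UDF. All values stay strings, and for a repeated key the last value wins.

```python
import json
from urllib.parse import parse_qsl, urlsplit


def beacon_to_json(url):
    """Convert a beacon URL's query string into a JSON object string.

    Values stay strings (no type inference); for repeated keys the last
    value wins; blank values are kept as empty strings.
    """
    query = urlsplit(url).query
    params = dict(parse_qsl(query, keep_blank_values=True))
    return json.dumps(params, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    line = "https://example.com/beacon?sub=123&obj=coupon&action=click&cid=456"
    print(beacon_to_json(line))
    # {"action": "click", "cid": "456", "obj": "coupon", "sub": "123"}
```

Keeping the beacon as a flat query string makes the sender trivial. Nested structure, if it is ever needed, becomes the converter's job.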
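Store
Ø For now, just ship everything to S3
Ø Keep separate S3 buckets for production and staging
• Access control can be set per bucket
• e.g. example.com-production-log
Ø Split keys by server role
• Safe even when servers with other roles are added
• e.g. example.com-production-log/api
Ø Split keys by date
• So that Hive partitions can be used
• e.g. example.com-production-log/api/dt=…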
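This is a small sketch of the layout rules above. It assumes one bucket per environment, one key prefix per server role, and a Hive-style `dt=YYYYMMDD` partition per day. The `log_location` helper and the exact bucket-naming pattern are hypothetical. The `dt=20140818` date format matches the S3DistCp example later in the deck.

```python
from datetime import date


def log_location(env, role, day, domain="example.com"):
    """Return (bucket, key_prefix) for one day's logs of one server role.

    Hypothetical naming scheme: one bucket per environment, one key prefix
    per server role, and a Hive-style dt=YYYYMMDD partition per day.
    """
    bucket = "{}-{}-log".format(domain, env)
    key_prefix = "{}/dt={:%Y%m%d}/".format(role, day)
    return bucket, key_prefix


if __name__ == "__main__":
    bucket, prefix = log_location("production", "api", date(2014, 8, 18))
    print("s3://{}/{}".format(bucket, prefix))
    # s3://example.com-production-log/api/dt=20140818/
```

Because the date sits in a `dt=` path segment, a Hive table partitioned by `dt` can point each partition at one prefix. A query for a single day then reads only that day's objects.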
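Process
Ø The tried-and-true nightly batch
• Launch an EMR cluster with Hadoop + Hive from the management server
• Because we don't run a Fluentd aggregator server, the logs arrive as many small fragmented files, so compress and concatenate them (Hadoop is bad at handling many small files)
• Convert the query strings recorded in the logs to JSON with a UDF
• Aggregate along the dimensions we care about and store the results
• Run every Process step above directly against S3, without landing the data on HDFS
• Load the final aggregates into RDS
Ø Fast daytime queries
• Launch an EMR cluster with Hadoop + Hive + Presto from the management server
• Presto reads Hive's metastore (table definitions)
• All the data lives on S3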
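In the talk, the "aggregate along the dimensions we care about" step runs in Hive. As a rough sketch only, here is the same GROUP BY / COUNT(*) idea in plain Python. It assumes records have already been parsed into dicts like the JSON produced from the query string. The field names are hypothetical.

```python
from collections import Counter


def count_by(records, *dims):
    """Count records grouped by the given dimension keys.

    Similar in spirit to: SELECT dims..., COUNT(*) FROM logs GROUP BY dims...
    A record missing a dimension is grouped under None for that key.
    """
    return Counter(tuple(r.get(d) for d in dims) for r in records)


if __name__ == "__main__":
    records = [
        {"dt": "20140818", "obj": "coupon", "action": "click", "cid": "1"},
        {"dt": "20140818", "obj": "coupon", "action": "click", "cid": "2"},
        {"dt": "20140818", "obj": "coupon", "action": "view", "cid": "1"},
    ]
    for key, n in sorted(count_by(records, "dt", "action").items()):
        print(key, n)
    # ('20140818', 'click') 2
    # ('20140818', 'view') 1
```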
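Visualize
Ø Load the data aggregated on EMR into MySQL
Ø Visualize the values with a service running on the management server
• Every team member checks the numbers by looking at the same values
Ø Trial deployment of Elasticsearch + Kibana on a spare in-house server
• Handy when you want to poke at the data while deciding which dimensions to analyze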
YAGNI
But make sure we can respond when something does become necessary,
and don't paint ourselves into a corner
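References
Ø AWS: Amazon EMR Best Practices
• Reading this shows you which EMR setup fits your own context. It may also work well as an introduction to Hadoop.
Ø mixi's analytics environment and using a JSON parser with Apache Hive
• Gave me the idea of storing data as JSON and treating it like a table through views. Also gave me the perspective of communication cost among the people involved in log aggregation.
Ø Batch Processing and Stream Processing by SQL
• This talk convinced me to use an MPP-style engine for our analytics environment. We compared Impala and Presto and adopted Presto, which can query S3 directly. (Future versions of Impala will reportedly be able to query S3 directly too, so we plan to re-evaluate then.)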
Sharing ways to use Python + EMR that aim "beyond the introductory level"
(log aggregation)
Roadmap: use these tools to do the following on EMR
awscli
• Create Cluster
• execute HiveQL
• execute s3distcp
• Config Your EMR: Bootstrap Presto, Metastore Config
Python Script
• Create Cluster
• execute HiveQL
• JobFlow Mgmt
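awscli
Ø From version … onward, the EMR commands are out of Preview and can be used as a stable API
Ø Switched from Ruby's Elastic MapReduce script, the de facto tool until now
• Easy to install with pip
• We were already using awscli, so this unifies our tooling
• Development on GitHub is active, and we can send PRs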
We Love Python
$ mkvirtualenv pycon-emr-dev
(pycon-emr-dev)$ pip install awscli
(pycon-emr-dev)$ mkdir ~/.awscli
(pycon-emr-dev)$ cat <<-EOF > ~/.awscli/config
[profile development]
aws_access_key_id=development_access_key
aws_secret_access_key=development_secret_key
region=ap-northeast-1
EOF
(pycon-emr-dev)$ cat <<-EOF >> $VIRTUAL_ENV/bin/activate
export AWS_CONFIG_FILE=~/.awscli/config
export AWS_DEFAULT_PROFILE=development
source aws_zsh_completer.sh
EOF
$ aws emr create-cluster --ami-version 3.1.1 \
    --name 'PyConJP 2014 (AMI 3.1.1 Hive)' \
    --tags Name=pycon-jp-emr environment=development \
    --ec2-attributes KeyName=yourkey \
    --log-uri 's3://yourbucket/jobflow_logs/' \
    --no-auto-terminate \
    --visible-to-all-users \
    --instance-groups file://./normal-instance-setup.json \
    --applications file://./app-hive.json
normal-instance-setup.json
[
  {
    "Name": "emr-master",
    "InstanceGroupType": "MASTER",
    "InstanceCount": 1,
    "InstanceType": "m1.medium"
  },
  {
    "Name": "emr-core",
    "InstanceGroupType": "CORE",
    "InstanceCount": 2,
    "InstanceType": "m1.medium"
  }
]

app-hive.json
[
  {
    "Name": "HIVE"
  }
]
Result
{
  "ClusterId": "j-8xxxxxxxxx"
}
$ aws emr add-steps --cluster-id j-8xxxxxxxxx \
    --steps file://./hive-sample-step-1.json
hive-sample-step-1.json
[
  {
    "Args": [
      "-f", "s3n://yourbucket/hive-script/sample01.hql",
      "-d", "BUCKET_NAME=yourbucket",
      "-d", "TARGET_DATE=20140818"
    ],
    "ActionOnFailure": "CONTINUE",
    "Name": "Hive Sample Program 01",
    "Type": "HIVE"
  },
  {
    "Args": [
      "-f", "s3n://yourbucket/hive-script/sample02.hql",
      "-d", "BUCKET_NAME=yourbucket",
      "-d", "TARGET_DATE=20140818"
    ],
    "ActionOnFailure": "CONTINUE",
    "Name": "Hive Sample Program 02",
    "Type": "HIVE"
  }
]
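The step file above hard-codes `TARGET_DATE=20140818`, so it would need editing before every run. This is a minimal sketch of a hypothetical helper that is not part of the talk. It regenerates the same step structure for any date and bucket using only the standard library. Its output could be redirected to a file and passed to `aws emr add-steps --steps file://...`.

```python
import json


def hive_steps(bucket, target_date, scripts):
    """Build an `aws emr add-steps --steps` JSON document for Hive scripts.

    Each script runs with -f from s3n://<bucket>/hive-script/ and receives
    BUCKET_NAME and TARGET_DATE as Hive variables via -d, mirroring the
    hand-written step file. Step names follow the same "NN" numbering.
    """
    steps = []
    for i, script in enumerate(scripts, start=1):
        steps.append({
            "Args": [
                "-f", "s3n://{}/hive-script/{}".format(bucket, script),
                "-d", "BUCKET_NAME={}".format(bucket),
                "-d", "TARGET_DATE={}".format(target_date),
            ],
            "ActionOnFailure": "CONTINUE",
            "Name": "Hive Sample Program {:02d}".format(i),
            "Type": "HIVE",
        })
    return json.dumps(steps, indent=2)


if __name__ == "__main__":
    print(hive_steps("yourbucket", "20140818", ["sample01.hql", "sample02.hql"]))
```

Usage, with a hypothetical script name: `python make_hive_steps.py > hive-sample-step-1.json`, then run the same `aws emr add-steps` command as before.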
$ aws emr add-steps --cluster-id j-8xxxxxxxxx \
    --steps file://./s3distcp-sample-step.json
s3distcp-sample-step.json
[
  {
    "Name": "s3distcp Sample",
    "ActionOnFailure": "CONTINUE",
    "Jar": "/home/hadoop/lib/emr-s3distcp-1.0.jar",
    "Type": "CUSTOM_JAR",
    "Args": [
      "--src", "s3n://yourbucket/access_log/dt=20140818",
      "--dest", "s3n://yourbucket/compressed_log/dt=20140818",
      "--groupBy", ".*(nginx_access_log-).*",
      "--targetSize", "100",
      "--outputCodec", "gzip"
    ]
  }
]
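S3DistCp's `--groupBy` concatenates files whose regex capture group has the same value. `--targetSize` (in MiB) sets the approximate size of the resulting files. The sketch below only approximates that planning locally, to show why the step above turns many small Fluentd chunks into a few larger files. It is not S3DistCp's actual algorithm. The greedy packing and the skipping of non-matching files are simplifying assumptions, and the file names and sizes in the test are made up.

```python
import re
from collections import defaultdict

MIB = 1024 * 1024


def plan_groups(files, group_by, target_size_mib):
    """Approximate how --groupBy/--targetSize bundle small files.

    files: iterable of (path, size_in_bytes).
    Paths that fully match group_by are bucketed by the concatenated
    capture groups; each bucket is split greedily into parts of at most
    target_size_mib (one oversized file still gets its own part).
    Non-matching paths are skipped.
    """
    pattern = re.compile(group_by)
    buckets = defaultdict(list)
    for path, size in sorted(files):
        m = pattern.fullmatch(path)
        if m:
            buckets["".join(m.groups())].append((path, size))

    limit = target_size_mib * MIB
    plan = {}
    for key, members in buckets.items():
        parts, current, current_size = [], [], 0
        for path, size in members:
            if current and current_size + size > limit:
                parts.append(current)
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            parts.append(current)
        plan[key] = parts
    return plan
```

With three 40 MiB chunks and a 100 MiB target, the first two are bundled together and the third starts a new part.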
$ aws emr create-cluster --ami-version 3.1.1 \
    --name 'PyConJP 2014 (AMI 3.1.1 Hive)' \
    --tags Name=pycon-jp-emr environment=development \
    --ec2-attributes KeyName=yourkey \
    --log-uri 's3://yourbucket/jobflow_logs/' \
    --no-auto-terminate \
    --visible-to-all-users \
    --instance-groups file://./normal-instance-setup.json \
    --applications file://./app-hive-with-config.json
app-hive-with-config.json
[
  {
    "Args": [
      "--hive-site=s3://yourbucket/libs/config/hive-site.xml"
    ],
    "Name": "HIVE"
  }
]
hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.optimize.s3.query</name>
    <value>true</value>
    <description>Optimize query on S3</description>
  </property>
</configuration>
$ aws emr create-cluster --ami-version 3.1.1 \
    --name 'PyConJP 2014 (AMI 3.1.1 Hive + Presto)' \
    --tags Name=pycon-jp-emr environment=development \
    --ec2-attributes KeyName=yourkey \
    --log-uri 's3://yourbucket/jobflow_logs/' \
    --no-auto-terminate \
    --visible-to-all-users \
    --instance-groups file://./normal-instance-setup.json \
    --bootstrap-actions file://./bootstrap-presto.json \
    --applications file://./app-hive-with-config.json
bootstrap-presto.json
[
  {
    "Name": "Install/Setup Presto",
    "Path": "s3://yourbucket/libs/setup-presto.rb",
    "Args": [
      "--task_memory", "1GB",
      "--log-level", "DEBUG",
      "--version", "0.75",
      "--presto-repo-url", "http://central.maven.org/maven2/com/facebook/presto/",
      "--sink-buffer-size", "1GB",
      "--query-max-age", "1h",
      "--jvm-config", "-server -Xmx2G -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -XX:PermSize=150M -XX:MaxPermSize=150M -XX:ReservedCodeCacheSize=150M -Dhive.config.resources=/home/hadoop/conf/core-site.xml,/home/hadoop/conf/hdfs-site.xml"
    ]
  }
]
Ø TFUVQQSFTUPSC㹋䡾כ	IUUQTHJUIVCDPN 
BXTMBCTFNSCPPUTUSBQBDUJPOTCMPCNBTUFS 
QSFTUPJOTUBMM
 
Ø 84ָ㹋꿀涸ח⳿׃ג׷1SFTUP׾.3חⰅ׸׷捀 
ך#PPUTUSBQأؙٔفز 
Ø .*PSדכ⹛ְ׋ֽוծ.*דכ 
⹛ַזַ׏׋	)JWF)JWF
 
Ø 5ISJGU4FSWJDFךه٦زָ殯ז׷׏שְ
Ø .FUBTUPSFהכ)JWFךذ٦ـٕ㹀纏瘝ך䞔㜠׾⥂ 
 
㶷׃גֶֻ㜥䨽ךֿה 
Ø 植㖈㢳ֻכ.Z42-ָⵃ欽ׁ׸גְ׷ 
Ø ⡦׮鏣㹀׃זְה.3ך؎ٝأةٝأך.Z42-ח 
⥂㶷ׁ׸׷ 
Ø .FUBTUPSF׾.3㢩鿇ך%#ח鏣㹀׃גֶֻֿהדծ 
.3甧׍♳־׷ꥷח%%-׾ⱄ䏝崧ׁזֻג׮葺ֻ 
ז׷ 
Ø %#⩎ך4FDVSJUZ(SPVQ׾⥜姻ׅ׷䗳銲֮׶
hive-site.xml (referenced from app-hive-with-config.json)
<configuration>
  <property>
    <name>hive.optimize.s3.query</name>
    <value>true</value>
    <description>Optimize query on S3</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>Password to use against metastore database</description>
  </property>
</configuration>
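Hand-editing hive-site.xml is error-prone. For example, a JDBC URL that contains `&` must be escaped as `&amp;`. This is a minimal sketch that renders the same `<configuration>`/`<property>` layout from a Python list, so escaping is handled by the library. It assumes Python 3.9+ for `ET.indent`. The property values are the placeholders from the slide, not real settings.

```python
import xml.etree.ElementTree as ET


def hive_site_xml(properties):
    """Render Hadoop-style <configuration> XML, as used by hive-site.xml.

    properties: iterable of (name, value, description) tuples, kept in order.
    Special characters such as & and < are escaped by ElementTree.
    """
    root = ET.Element("configuration")
    for name, value, description in properties:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        ET.SubElement(prop, "description").text = description
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    metastore = [
        ("javax.jdo.option.ConnectionURL",
         "jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true",
         "JDBC connect string for a JDBC metastore"),
        ("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver",
         "Driver class name for a JDBC metastore"),
    ]
    print(hive_site_xml(metastore))
```

Generating the file also makes it easier to keep the metastore password out of version control and inject it at build time. This is a design suggestion, not something the talk describes.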
Ø 1ZUIPOغحثⳢ椚ⰻד.3׾饯⹛׃׋ְ✲׮֮׷ 
Ø ׮׃ֻכ$FMFSZך5BTLה׃ג饯⹛׃׋ְהַ 
Ø ׉ְֲ׏׋㜥さחכ1ZUIPOך⚥ַ׵.3׾⢪ֲ✲ 
 
׮〳腉 
Ø CPUPFNS׾ⵃ欽ׅ׷ 
Ø BXTDMJⰻַ׵⤑ⵃז6UJMJUZ׾《׏גֹג⢪ֲך׮ 
֮׶ַ׮
# -*- coding: utf-8 -*-
from datetime import datetime

from boto.emr import connect_to_region
from boto.emr.step import InstallHiveStep


def setup_emr():
    # need to export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    # as environment variables.
    conn = connect_to_region('ap-northeast-1')
    install_step = InstallHiveStep(hive_versions='0.11.0.2')
    jobid = conn.run_jobflow(
        name='Create EMR [{}]'.format(datetime.today().strftime('%Y%m%d')),
        log_uri='s3://yourbucket/jobflow_logs/',
        ec2_keyname='your_key',
        master_instance_type='m1.medium',
        slave_instance_type='m1.medium',
        num_instances=3,
        action_on_failure='TERMINATE_JOB_FLOW',
        keep_alive=True,
        enable_debugging=False,
        hadoop_version='2.4.0',
        steps=[install_step],
        bootstrap_actions=[],
        instance_groups=None,
        additional_info=None,
        ami_version='3.1.1',
        api_params=None,
        visible_to_all_users=True,
        job_flow_role=None)
    return jobid


if __name__ == '__main__':
    jobflow_id = setup_emr()
    print('JobFlowID: {} started.'.format(jobflow_id))
Ø 84ךؙٖرٝءٍٕכا٦أⰻחⰅ׸זְ✲ 
• 橆㞮㢌侧חⰅ׸׷׮װ׭׋倯ָ葺ְ 
• ٗ٦ٕؕوءٝדذأز׃׋ְ㜥さכ䊺׬搀׃ַ 
• .3׾甧׍♳־׷$ח➰♷ׅ׷*.3PMFדⵖ䖴
(The script got long, so this is only an excerpt.)
# Excerpt: conn, install_step, target_date, bucket_name and hive_version are
# defined earlier in the script; HiveStep is imported from boto.emr.step.
jobid = conn.run_jobflow(
    name='Create EMR and Exec hiveql [{}]'.format(target_date),
    log_uri='s3://{}/jobflow_logs/'.format(bucket_name),
    ec2_keyname='your_key',
    master_instance_type='m1.medium',
    slave_instance_type='m1.medium',
    num_instances=3,
    action_on_failure='TERMINATE_JOB_FLOW',
    keep_alive=True,
    enable_debugging=False,
    hadoop_version='2.4.0',
    steps=[install_step],
    bootstrap_actions=[],
    instance_groups=None,
    additional_info=None,
    ami_version='3.1.1',
    api_params=None,
    visible_to_all_users=True,
    job_flow_role=None)

# One Hive step per query file, each receiving the date and bucket as -d variables.
query_files = ['sample01.hql', 'sample02.hql']
hql_steps = []
for query_file in query_files:
    hql_step = HiveStep(
        name='Executing Query [{}]'.format(query_file),
        hive_file='s3n://{0}/hive-script/{1}'.format(bucket_name, query_file),
        hive_versions=hive_version,
        hive_args=['-dTARGET_DATE={0}'.format(target_date),
                   '-dBUCKET_NAME={0}'.format(bucket_name)])
    hql_steps.append(hql_step)
conn.add_jobflow_steps(jobid, hql_steps)
Ø غحثⳢ椚ח⣛㶷ꟼ⤘׾⡲׶׋ְ 
• ָ穄׻׏׋׵#ה$ず儗ח㹋遤ׅ׷ծ瘝 
• ה#ָ穄׻׏׋׵$׾㹋遤ׅ׷ծ瘝 
Ø 饯⹛儗꟦ך盖椚׾׮׏ה䩛鯪ח遤ְ׋ְ
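• https://github.com/spotify/luigi
• A pipeline-management framework written in Python
• Has a mechanism that makes MapReduce via Hadoop Streaming easy to write
• Dependencies are resolved in Python code alone
• Dependency visualization (runs as a separate service)
• The visualization tool lacks finer features such as authentication
• Supports running HiveQL
• Supports running Pig
• Supports S3 operations
• Overkill for us right now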
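• The admin UI is built with Django
• celery and celerybeat run on the same server
• django-celery is used to schedule specific tasks to be queued at specific times
• celerybeat puts the scheduled tasks on the queue at the right time, and the celery worker picks them up and runs them
• Celery and Django can work together without django-celery, but we still use it because its scheduler feature is handy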
References
Ø https://github.com/aws/aws-cli
• Upstream information and source
Ø https://github.com/boto/boto
• Upstream information and source
Kanmu
We're hiring!
Even just a casual chat is welcome
https://www.wantedly.com/projects/…

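Ø Tried out Slideless
Ø Held an internal review session
Ø My first time giving a technical talk
Ø A good opportunity to sum up what I have been doing at work
Ø I want to know what other people are thinking about while they work
Ø I want to know why other companies chose the architectures they have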