How to Dev&Ops Internal PaaS
taichi nakashima
June 29, 2015
Talked at
http://www.zusaar.com/event/9057007
Transcript
HOW TO DEV&OPS INTERNAL PAAS
TAICHI NAKASHIMA @deeeet @tcnksm
INTERNAL PAAS? = PaaS for Rakuten engineers
ONLY FOR TEST? = No. It receives production requests
WHY PAAS? = Fast app experimentation and iteration with PROD-grade
WHY PAAS? = You don’t need to prepare servers by yourself
WHY PAAS? = You don’t need to provision servers by yourself
WHY PAAS? = You don’t need to prepare DBs by yourself
WHY PAAS? = You can scale your app with *one command*
WHY PAAS? = You can focus on development, not deployment
WHY INTERNAL PAAS? = Easy to connect with other internal services
WHY INTERNAL PAAS? = Instant support when something happens
WHY INTERNAL PAAS? (From an organizational point of view) = You can reduce tooling duplicated across teams
HOW LARGE? How many requests? Servers? Languages?
16,000 req/sec: all application requests
2,500 instances: 1,400 (PROD) + 700 (STG) + 400 (DEV)
4,300 VMs: 2,800 (PROD) + 1,200 (STG) + 300 (DEV)
+300 VMs/month: growth forecast
4 supported languages: Ruby, Node.js, Java, PHP
3 DB services: Redis, MongoDB, Clustrix
100 Redis clusters: 230 instances
40 components: components (roles) needed to run the PaaS
320 Chef recipes: `ls cookbooks/*/recipes | wc -l`
8 engineers: Dev & Ops, from 7 countries
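The VM counts and the growth forecast above lend themselves to a quick back-of-the-envelope projection. The following is only a sketch: it assumes growth stays linear at the quoted +300 VMs per month, and the function name `projected_vms` is made up for illustration.

```python
# Figures quoted on the slides.
CURRENT_VMS = {"PROD": 2800, "STG": 1200, "DEV": 300}
GROWTH_PER_MONTH = 300  # the "+300 VMs/mon." forecast


def projected_vms(months: int) -> int:
    """Total VM count after `months`, assuming linear growth (an assumption)."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return sum(CURRENT_VMS.values()) + GROWTH_PER_MONTH * months


if __name__ == "__main__":
    for months in (0, 6, 12):
        print(months, projected_vms(months))
```

Under that assumption the fleet would grow from 4,300 VMs today to 7,900 VMs a year later.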
HOW TO DEV&OPS INTERNAL PAAS
Components: Router, API, Health Check, Messaging, DBs, Apps
DEV FLOW / RELEASE FLOW
DEV FLOW: Create ticket on JIRA -> Write code -> Write Chef cookbook -> Test on LAB -> Create PR (Git-Flow) -> Review
RELEASE FLOW: Assign release manager -> Collect all JIRA tickets -> Write internal blog -> Canary release -> Release
1 release per week: DEV (2 days), STG (2 days), PROD (3 days)
HOW TO RELEASE? = Chef + Capistrano
RELEASE 1 SERVER
Service-out -> Run Chef solo -> Run Serverspec -> Service-in
Service-out (/etc/service-out): Stop Load-Balancing, Disable Health Check, Stop monit
Service-in (/etc/service-in): Start monit, Enable Health Check, Start Load-Balancing
Every server has the same startup/stop scripts = the workflow is the same = automation is easy
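The single-server workflow above (service-out, Chef solo, Serverspec, service-in) can be sketched as a fixed sequence of steps. This is a minimal illustration, not the team's actual tooling: `release_one_server` is a hypothetical name, and the policy of leaving a host out of service when a step fails is an assumption.

```python
from typing import Callable, List, Tuple

Step = Callable[[], None]


def release_one_server(steps: List[Tuple[str, Step]]) -> List[str]:
    """Run release steps in order; return the names of the steps that ran.

    Any exception stops the sequence, so a host whose Chef run or Serverspec
    check fails is never put back into service (an assumed policy).
    """
    completed: List[str] = []
    for name, step in steps:
        step()
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Stand-ins for the real commands; in practice each step might wrap
    # something like subprocess.run(["/etc/service-out"], check=True).
    demo = [(n, lambda n=n: print("running", n))
            for n in ("service-out", "chef-solo", "serverspec", "service-in")]
    print(release_one_server(demo))
```

Because every server exposes the same scripts, the same step list can be reused unchanged for any role.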
RELEASE X SERVERS
cap service-out: Service-out X servers
cap setup-role: Run Chef solo and Run Serverspec on X servers
cap service-in: Service-in X servers
VMLIST (Role A): 170.20.20.21.RoleA 170.20.20.22.RoleA 170.20.20.23.RoleA 170.20.20.24.RoleA 170.20.20.25.RoleA 170.20.20.26.RoleA 170.20.20.27.RoleA
VMLIST (Role B): 170.20.20.31.RoleB 170.20.20.32.RoleB 170.20.20.33.RoleB 170.20.20.34.RoleB 170.20.20.35.RoleB 170.20.20.36.RoleB 170.20.20.37.RoleB
Operation: cap service-out -> Role A (parallel execution)
Operation: cap setup-role -> Role A (parallel execution)
Operation: cap service-in -> Role A (parallel execution)
Operation: cap service-out -> Role B (parallel execution)
Start from Canary: cap service-out -> 170.20.20.21.RoleA only
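The multi-server rollout above (pick a role from VMLIST, start from a canary, then run in parallel) can be sketched roughly as follows. The entry format is taken from the slides; the helper names `group_by_role` and `run_on_role`, the choice of the first host as canary, and the use of threads are all assumptions made for illustration, not the actual Capistrano tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def group_by_role(vmlist: List[str]) -> Dict[str, List[str]]:
    """Group entries such as '170.20.20.21.RoleA' by their trailing role name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in vmlist:
        _address, role = entry.rsplit(".", 1)
        groups[role].append(entry)
    return dict(groups)


def run_on_role(vmlist: List[str], role: str,
                operation: Callable[[str], str]) -> List[str]:
    """Apply `operation` to the canary alone, then to the remaining hosts in parallel."""
    hosts = group_by_role(vmlist).get(role, [])
    if not hosts:
        return []
    canary, rest = hosts[0], hosts[1:]  # first host as canary: an assumption
    results = [operation(canary)]       # an exception here stops the rollout
    if rest:
        with ThreadPoolExecutor(max_workers=len(rest)) as pool:
            results.extend(pool.map(operation, rest))  # map preserves host order
    return results
```

Running the canary outside the thread pool means a failure on that one host aborts the rollout before any other server of the role is touched.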
HOW TO DEV&OPS INTERNAL PAAS
LOGGING MONITORING ALERT HANDLING SUPPORT IAAS
700 GB/day of logs: all logs produced in the PaaS
LOGGING IN PAAS? = Application logs + Component logs
APPLICATION LOG? = The PaaS should give users a way to debug
Instant logs: real time
Midterm logs: 1-2 weeks
Longterm logs: up to 6 months
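The three log tiers above imply a simple lookup: given how old a log line is, which tier should still hold it? The sketch below is illustrative only. The cut-offs (14 days for midterm, 180 days for longterm) are assumptions read off the slide's "1-2 weeks" and "6 month" figures, and it also assumes that logs reach the longterm store without delay.

```python
from datetime import timedelta
from typing import List

MIDTERM_RETENTION = timedelta(weeks=2)    # upper end of "1-2 weeks" (assumed cut-off)
LONGTERM_RETENTION = timedelta(days=180)  # roughly 6 months (assumed cut-off)


def tiers_holding(age: timedelta) -> List[str]:
    """Return the log tiers expected to still hold a log line of the given age."""
    if age < timedelta(0):
        raise ValueError("age must be non-negative")
    tiers: List[str] = []
    if age <= MIDTERM_RETENTION:
        tiers.append("midterm")
    if age <= LONGTERM_RETENTION:
        tiers.append("longterm")
    return tiers
```

Instant logs are a real-time stream rather than a store, so they are deliberately left out of the lookup.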
Instant log: Router, API, Health Check, Messaging, DBs, Apps
Apps: Instant log | Log Server: Midterm log | Object Storage: Longterm log
Midterm log (Log Server) -> Hadoop (BigData team): Analytics
Midterm log (Log Server) -> Splunk: Dashboard
COMPONENT LOG? = Logs we use to debug the PaaS itself
Component logs: Log Server + Object Storage (we can debug CF here)
Object Storage: GlusterFS, LeoFS
LOGGING METRICS ALERT HANDLING SUPPORT IAAS
OpenTSDB, Pandora FMS
LOGGING METRICS ALERT HANDLING SUPPORT IAAS
1 week on call, 24 hours a day: primary & sub admin
2,500 ✉/day max. Need to fix…
LOGGING METRICS ALERT HANDLING SUPPORT IAAS
JIRA, HipChat: instant support is one of the *good* points of an Internal PaaS
LOGGING METRICS ALERT HANDLING SUPPORT IAAS
IAAS Operating PaaS also means operating IaaS
vSphere
HOW TO BOOT SERVERS? = Internal tool similar to Terraform
rvc.yml:
  RoleA:
    cpu: 2
    mem: 8192
Operation -> vSphere (Role A), one command per VM:
  rvc create -c rvc.yml 170.20.21.RoleA
  rvc create -c rvc.yml 170.20.22.RoleA
  rvc create -c rvc.yml 170.20.23.RoleA
VMLIST: 170.20.20.21.RoleA 170.20.20.22.RoleA 170.20.20.23.RoleA
Operation: cap setup-role -> Role A (servers in VMLIST)
Easy to boot & set up servers = as long as there are *physical resources*
FUTURE? = We are moving to *version 2*
BE GOPHER CloudFoundry moves from Ruby to Golang
NO FORK Everything goes to upstream
BE OPEN Building tools as OSS
NO MORE ✉ OVERLOAD Planning to use PagerDuty + Riemann
Current: Log Server + Object Storage (GlusterFS, LeoFS)
Planned: Kafka + Object Storage (LeoFS)
MORE FLEXIBLE LOG STACK Planning to use Apache Kafka
NEW METRICS STACK Planning to use InfluxDB + Grafana
CONTAINER Planning to support Docker
MORE HA Planning to introduce Chaos Monkey
NEW IAAS Migrating to OpenStack
NEW IAAS Planning to go Hybrid Cloud
WE HAVE MANY CHALLENGES
WE ARE HIRING http://corp.rakuten.co.jp/careers/experienced/
@deeeet