Scheduling on Large Clusters
Based on Google’s Omega Paper
Sameer Tiwari
Hadoop Architect, Pivotal Inc.
stiwari@gopivotal.com, @sameertech
Scheduling on Large Clusters
● Goals
o High Utilization
o Honor User Defined constraints
o Maintain High Efficiency
● Issues
o Unpredictable load
o Varying types of load
o Increasing load and cluster size
Types of Schedulers
● Monolithic
o Single Resource Manager and Scheduler
o Google Borg
● Two Level
o Single Resource Manager and multiple schedulers
o Mesos, Hadoop-on-Demand (HOD project)
● Shared state
o Multiple schedulers with access to all resources
o Google Omega
Monolithic Schedulers
● Stable, been around since the 1990s
● Issues
o Head of line blocking
o Scalability is limited
o Popular with HPC community
▪ Maui -> Moab(R), Platform LSF (IBM)
o Multi-path scheduling addresses some of these problems (see the sketch below)
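
To make the head-of-line blocking point concrete, here is a minimal sketch (not from the deck; the job names, costs, and the two-path split are illustrative assumptions) of a single scheduling path where one expensive placement decision delays every job behind it, versus a multi-path variant that routes cheap batch jobs around it:

```python
# Sketch: head-of-line blocking in a single scheduling path, and a multi-path
# variant that routes quick jobs around a slow placement decision.
# "cost" stands for the time the scheduler spends placing a job.

from collections import deque

jobs = [("service-job", 50), ("batch-1", 1), ("batch-2", 1), ("batch-3", 1)]

def single_path(jobs):
    """One FIFO path: every job waits behind the expensive decision at the head."""
    clock, waits = 0, {}
    for name, cost in jobs:
        clock += cost          # scheduler busy placing this job
        waits[name] = clock    # time until this job is finally placed
    return waits

def multi_path(jobs):
    """Two independent paths: cheap batch jobs are not stuck behind the service job."""
    paths = {"service": deque(), "batch": deque()}
    for name, cost in jobs:
        paths["service" if cost > 10 else "batch"].append((name, cost))
    waits = {}
    for path in paths.values():
        clock = 0
        for name, cost in path:
            clock += cost
            waits[name] = clock
    return waits

print("single path:", single_path(jobs))   # batch jobs are placed at t=51, 52, 53
print("multi path: ", multi_path(jobs))    # batch jobs are placed at t=1, 2, 3
```
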
Statically Partitioned Schedulers
● Common with Hadoop deployments
o Assumes full control of resources
o Dedicated or statically partitioned clusters
● Issues
o Low utilization (see the sketch below)
o Data fragmentation
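
A small sketch of the low-utilization problem, with made-up numbers: jobs queued in a full partition cannot use free capacity in another partition, so the cluster as a whole sits partly idle while work waits.

```python
# Sketch: two statically partitioned sub-clusters. Jobs queued in one
# partition cannot use idle capacity in the other, so cluster-wide
# utilization stays low while work waits. Numbers are illustrative.

partitions = {
    "analytics": {"capacity": 100, "used": 95, "queued_jobs": 7},
    "etl":       {"capacity": 100, "used": 25, "queued_jobs": 0},
}

total_capacity = sum(p["capacity"] for p in partitions.values())
total_used = sum(p["used"] for p in partitions.values())
print(f"cluster-wide utilization: {100 * total_used // total_capacity}%")  # 60%

for name, p in partitions.items():
    if p["queued_jobs"] and p["used"] >= 0.9 * p["capacity"]:
        idle = total_capacity - total_used
        print(f"{name}: {p['queued_jobs']} jobs queued while {idle} units sit idle cluster-wide")
```
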
Dynamic Schedulers
● AKA: Two-level Schedulers
o Resource Manager dynamically partitions a cluster
o Resources presented to partitions as “offers”
o Partitions request resources as needed
o e.g. Mesos and Hadoop on Demand (HOD)
● Issues
o Pessimistic locking is used during allocation
o Not suitable for “long running” jobs
o Gang scheduling (e.g. MPI jobs) can cause deadlocks (see the sketch below)
o Each scheduler has no idea about any other scheduler
▪ Preemption is tricky
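
A rough sketch of the offer model and the gang-scheduling problem above: the resource manager hands each scheduler an exclusive offer (pessimistic, so no other scheduler sees those resources), and a gang scheduler hoards partial offers until its whole gang fits. The class names and numbers here are invented for illustration, not Mesos APIs.

```python
# Sketch: two-level (offer-based) scheduling with pessimistic allocation.
# A gang scheduler hoards whatever it is offered until its full gang fits,
# so two of them can deadlock on a cluster that could satisfy either alone.

class GangScheduler:
    """Needs `gang_size` CPUs at once before it can launch (e.g. an MPI job)."""
    def __init__(self, name, gang_size):
        self.name, self.gang_size, self.held = name, gang_size, 0

    def consider(self, offered):
        # Accept (and hold on to) as much of the offer as the gang still needs.
        take = min(offered, self.gang_size - self.held)
        self.held += take
        status = ("launched" if self.held == self.gang_size
                  else f"holding {self.held}/{self.gang_size}, waiting")
        print(f"{self.name}: accepted {take} of {offered} offered -> {status}")
        return take   # accepted CPUs are locked away from every other scheduler

free_cpus = 10
a = GangScheduler("sched-A", gang_size=8)
b = GangScheduler("sched-B", gang_size=8)

free_cpus -= a.consider(5)   # RM offers half the free pool to A, exclusively
free_cpus -= b.consider(5)   # ...and the rest to B

print(f"free CPUs left: {free_cpus}")
# Both schedulers now hold 5/8 CPUs, nothing is free, neither ever launches
# or releases, and neither knows what the other is doing: a deadlock.
```
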
Trivia
● What type of scheduler is Hadoop YARN?
o A per-job App Master sends its requests to the single RM
o But the App Master provides a job-management service, not scheduling
o Effectively, it's a Monolithic Scheduler (see the sketch below)
Shared State Schedulers
● No external Resource Manager
● Each scheduler has full access to cluster
● A copy of the cluster state is at each scheduler
● Optimistic concurrency control
o Updates are made atomically in a transaction
o Only one commit will succeed
o Failed transactions will try again
● Gang scheduling will not result in resource hoarding (see the sketch below)
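
A minimal sketch of the mechanism described above, using assumed names and a simple version counter rather than Omega's actual implementation: each scheduler plans against its own copy of the shared cluster state, then commits the whole placement as one atomic, all-or-nothing transaction; a commit that raced with someone else's fails, and the scheduler retries with fresh state. Because a gang's claims all go into a single commit, nothing is held while waiting.

```python
# Sketch: shared-state scheduling with optimistic concurrency control.
# Every scheduler sees the whole cluster, plans on a local copy, then commits
# atomically against a versioned shared state; only one of two conflicting
# commits succeeds and the loser retries. (Illustrative, not Omega's code.)

import copy

class CellState:
    """Shared cluster state: machine -> free CPUs, plus a version counter."""
    def __init__(self, machines):
        self.machines, self.version = dict(machines), 0

    def snapshot(self):
        return copy.deepcopy(self.machines), self.version

    def commit(self, claims, seen_version):
        """All-or-nothing transaction over every claimed machine."""
        if seen_version != self.version:
            return False                                  # someone committed first
        if any(self.machines[m] < cpus for m, cpus in claims.items()):
            return False                                  # state no longer fits
        for m, cpus in claims.items():
            self.machines[m] -= cpus
        self.version += 1
        return True

def schedule(cell, job, cpus_needed):
    """One scheduler's loop: plan on a local copy, commit, retry on conflict."""
    while True:
        local, version = cell.snapshot()
        target = next((m for m, free in local.items() if free >= cpus_needed), None)
        if target is None:
            print(f"{job}: no capacity right now")
            return
        if cell.commit({target: cpus_needed}, version):
            print(f"{job}: placed on {target}")
            return
        # Optimism was wrong: conflict detected post-facto, redo with fresh state.

cell = CellState({"m1": 4, "m2": 4})
schedule(cell, "batch-job", 3)     # lands on m1
schedule(cell, "service-job", 3)   # sees the updated state, lands on m2
```
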
Shared State Schedulers
● Each scheduler is free to choose a policy
● Requires a common understanding of
o Resources
o Precedence
● Relies on post-facto enforcement
● Results in high utilization and efficiency
Questions?

Editor's Notes

  1. Users can ask for colocation, or for a particular rack or machine. Efficiency means fast allocation.
  2. Works well with small jobs (<< cluster resources) and short-lived jobs that give up resources frequently
  3. Works well with small jobs (<< cluster resources) and short-lived jobs that give up resources frequently
  4. Addresses two issues of the two-level scheduler approach: limited parallelism due to pessimistic concurrency control, and restricted visibility of resources in a scheduler framework; there is no head-of-line blocking. Potential cost: redoing work when the optimistic concurrency assumptions are incorrect. Resource hoarding is not possible with an all-or-nothing resource allocation. To prevent starvation: incremental transactions, i.e. accept all but the conflicting transactions (a sketch follows).
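
A tiny sketch of the incremental-transaction idea from note 4, using an assumed machine-to-free-CPUs map rather than anything Omega-specific: the shared state accepts every claim that still fits and hands back only the conflicting ones, so the scheduler redoes just that part instead of the whole placement.

```python
# Sketch: an "incremental transaction". Instead of failing the whole commit,
# accept every non-conflicting claim and return only the conflicting ones,
# so a scheduler retries just that part (which helps prevent starvation).

def incremental_commit(free_cpus, claims):
    """free_cpus: shared machine -> free CPUs map; claims: machine -> CPUs wanted."""
    rejected = {}
    for machine, cpus in claims.items():
        if free_cpus.get(machine, 0) >= cpus:
            free_cpus[machine] -= cpus     # this claim is accepted immediately
        else:
            rejected[machine] = cpus       # only this claim has to be retried
    return rejected

cell = {"m1": 2, "m2": 4}
still_needed = incremental_commit(cell, {"m1": 3, "m2": 2})
print(cell)          # {'m1': 2, 'm2': 2}  - m2's claim went through
print(still_needed)  # {'m1': 3}           - only the conflicting claim is redone
```
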