2. Scheduling on Large Clusters
● Goals
o High Utilization
o Honor user-defined constraints
o Maintain High Efficiency
● Issues
o Unpredictable load
o Varying types of load
o Increasing load and cluster size
3. Types of Schedulers
● Monolithic
o Single Resource Manager and Scheduler
o Google Borg
● Two Level
o Single Resource Manager and multiple schedulers
o Mesos, Hadoop-on-Demand (HOD project)
● Shared state
o Multiple schedulers with access to all resources
o Google Omega
4. Monolithic Schedulers
● Stable; in use since the 1990s
● Issues
o Head-of-line blocking
o Limited scalability
o Popular with the HPC community
Maui -> Moab(R), Platform LSF (IBM)
o Multi-path scheduling addresses some of these problems
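Head-of-line blocking can be illustrated with a toy model: every job waits for the scheduling decisions of the jobs queued ahead of it on the same path. This is a minimal sketch, not any real scheduler's code. The function name, the job names, the decision times, and the 10-second threshold are all hypothetical, and it assumes the paths run in parallel (e.g. in separate threads).

```python
def decision_finish_times(jobs, path_of=lambda job: "single"):
    """Return {job name: time at which its scheduling decision completes}.

    jobs    : list of (name, decision_seconds) in arrival order, all queued at t=0
    path_of : maps a job to a scheduling path; each path is a FIFO queue that is
              served sequentially, and paths are assumed to run in parallel
    """
    path_clock = {}  # path id -> time at which that path becomes free
    finish = {}
    for name, cost in jobs:
        path = path_of((name, cost))
        path_clock[path] = path_clock.get(path, 0.0) + cost
        finish[name] = path_clock[path]
    return finish


# Hypothetical workload: one slow-to-place batch job ahead of two quick service jobs.
jobs = [("batch-big", 60.0), ("service-a", 1.0), ("service-b", 1.0)]

# Single path: the service jobs wait behind the 60 s batch decision.
single_path = decision_finish_times(jobs)

# Two paths: slow decisions (> 10 s, an arbitrary threshold) get their own queue.
multi_path = decision_finish_times(
    jobs, path_of=lambda job: "slow" if job[1] > 10 else "fast"
)

print(single_path)
print(multi_path)
```

In the single-path run the two service jobs are decided only after the batch job; with a separate fast path they no longer wait for it.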
5. Statically Partitioned Schedulers
● Common with Hadoop deployments
o Assumes full control of resources
o Dedicated or statically partitioned clusters
● Issues
o Low utilization
o Data fragmentation
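The low-utilization issue can be shown with simple arithmetic. This is a sketch with made-up numbers (two 50-CPU partitions, demands of 80 and 10 CPUs); the function names are hypothetical.

```python
def utilization_static(demands, partition_size):
    """Each workload is confined to its own fixed-size partition."""
    used = sum(min(demand, partition_size) for demand in demands)
    return used / (partition_size * len(demands))


def utilization_shared(demands, cluster_size):
    """All workloads draw from one shared pool of the same total size."""
    return min(sum(demands), cluster_size) / cluster_size


demands = [80, 10]  # hypothetical CPU demand of two workloads

static = utilization_static(demands, partition_size=50)
shared = utilization_shared(demands, cluster_size=100)

print(static, shared)
```

With static partitions the busy workload is capped at its 50 CPUs while the other partition sits mostly idle; the shared pool lets the busy workload use the slack.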
6. Dynamic Schedulers
● AKA: Two-level schedulers
o Resource Manager dynamically partitions a cluster
o Resources presented to partitions as “offers”
o Partitions request resources as needed
o e.g. Mesos and Hadoop on Demand (HOD)
● Issues
o Pessimistic locking is used during allocation
o Not suitable for long-running jobs
o Gang scheduling (e.g. MPI jobs) can cause deadlocks
o No scheduler knows about any other scheduler
Preemption is tricky
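The offer mechanism and its pessimistic locking can be sketched as follows. This is a toy model, not the Mesos API: the class, its methods, and the framework names are hypothetical, and it assumes the simplest policy of one outstanding offer at a time.

```python
class ResourceManager:
    """Toy two-level resource manager: at most one outstanding offer."""

    def __init__(self, total_cpus):
        self.free = total_cpus
        self.holder = None  # framework holding the current offer (acts as the lock)

    def make_offer(self, framework):
        """Offer all free CPUs to one framework; others see nothing meanwhile."""
        if self.holder is not None:
            return 0  # resources are locked by the outstanding offer
        self.holder = framework
        return self.free

    def respond(self, framework, accepted_cpus):
        """The offer holder accepts part of the offer; the rest is released."""
        if framework != self.holder:
            raise RuntimeError("framework does not hold the offer")
        if not 0 <= accepted_cpus <= self.free:
            raise ValueError("cannot accept more than was offered")
        self.free -= accepted_cpus
        self.holder = None


rm = ResourceManager(total_cpus=10)
offer_a = rm.make_offer("framework-a")        # framework-a sees the whole cluster
offer_b = rm.make_offer("framework-b")        # locked out while the offer is pending
rm.respond("framework-a", accepted_cpus=6)
offer_b_later = rm.make_offer("framework-b")  # sees only what is left

print(offer_a, offer_b, offer_b_later)
```

While framework-a holds the offer, framework-b can neither see nor claim anything, which is the limited parallelism and restricted visibility listed under Issues. A gang job in framework-b that needs 8 CPUs at once can only wait and hold on to whatever smaller offers it receives.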
7. Trivia
● What type of scheduler is Hadoop YARN?
o A per-job App Master requests resources from a single RM
o But the App Master provides a job-management service, not scheduling
o Effectively, it's a monolithic scheduler
8. Shared State Schedulers
● No external Resource Manager
● Each scheduler has full access to the cluster
● Each scheduler holds a copy of the cluster state
● Optimistic concurrency control
o Updates are made atomically in a transaction
o Of conflicting transactions, only one commit will succeed
o Failed transactions are retried
● Gang scheduling does not result in resource hoarding
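The optimistic concurrency control on this slide can be sketched as a snapshot, commit, retry loop. This is a minimal single-process model, not Omega's implementation: the class and function names are hypothetical, and conflict detection via a per-machine version counter is an assumption made for illustration.

```python
class CellState:
    """Toy shared cluster state with per-machine version counters."""

    def __init__(self, machines):
        self.free = dict(machines)               # machine -> free CPUs
        self.version = {m: 0 for m in machines}  # bumped on every change

    def snapshot(self):
        """Private copy of the state that a scheduler plans against."""
        return dict(self.free), dict(self.version)

    def commit(self, claims, seen_version):
        """Apply claims {machine: cpus} atomically, or reject them all."""
        for machine, cpus in claims.items():
            stale = self.version[machine] != seen_version[machine]
            if stale or self.free[machine] < cpus:
                return False  # conflict: another scheduler got there first
        for machine, cpus in claims.items():
            self.free[machine] -= cpus
            self.version[machine] += 1
        return True


def schedule(cell, cpus_needed, max_retries=3):
    """Pick the first machine that fits; re-snapshot and retry on conflict."""
    for _ in range(max_retries):
        free, seen = cell.snapshot()
        candidates = [m for m in sorted(free) if free[m] >= cpus_needed]
        if not candidates:
            return None
        if cell.commit({candidates[0]: cpus_needed}, seen):
            return candidates[0]
    return None


cell = CellState({"m1": 4, "m2": 4})

# Two schedulers plan against the same snapshot and both pick m1.
_, seen_a = cell.snapshot()
_, seen_b = cell.snapshot()
first = cell.commit({"m1": 4}, seen_a)   # succeeds
second = cell.commit({"m1": 4}, seen_b)  # fails: m1 changed since the snapshot
retry = schedule(cell, 4)                # the loser retries on fresh state

print(first, second, retry)
```

Only one of the two conflicting commits succeeds; the losing scheduler takes a new snapshot and lands on the other machine.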
9. Shared State Schedulers
● Each scheduler is free to choose its own policy
● Requires a common understanding of
o Resources
o Precedence
● Relies on post-facto enforcement
● Results in high utilization and efficiency
Users can ask for colocation, or for a particular rack or machine
Efficiency here means fast allocation
Works well with small jobs (<< cluster resources) and short-lived jobs that give up resources frequently
* The shared-state approach addresses two issues of the two-level scheduler approach
- limited parallelism due to pessimistic concurrency control
- restricted visibility of resources in a scheduler framework
* No head-of-line blocking
* Potential cost: work must be redone when the optimistic concurrency assumptions turn out to be incorrect
* Resource hoarding is not possible with all-or-nothing resource allocation
* To prevent starvation, use incremental transactions: accept all but the conflicting changes
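The difference between all-or-nothing and incremental transactions in the last two notes can be sketched as one commit function with two modes. This is an illustrative model only: the function name and signature and the machine names are hypothetical.

```python
def commit(free, claims, all_or_nothing):
    """Try to apply claims against free capacity.

    free           : dict machine -> free CPUs (updated in place on success)
    claims         : list of (machine, cpus)
    all_or_nothing : True  -> reject the whole transaction on any conflict
                     False -> incremental: accept all but the conflicting claims
    Returns (accepted, rejected) lists of claims.
    """
    remaining = dict(free)
    accepted, rejected = [], []
    for machine, cpus in claims:
        if remaining.get(machine, 0) >= cpus:
            remaining[machine] -= cpus
            accepted.append((machine, cpus))
        else:
            rejected.append((machine, cpus))
    if all_or_nothing and rejected:
        return [], list(claims)  # nothing is held, so nothing is hoarded
    free.update(remaining)
    return accepted, rejected


claims = [("m1", 2), ("m2", 2)]  # m2 has no capacity left in both runs

# Gang job: either every task is placed or none is.
free_gang = {"m1": 2, "m2": 0}
gang_result = commit(free_gang, claims, all_or_nothing=True)

# Ordinary job: keep the task that fits, retry only the conflicting one.
free_incr = {"m1": 2, "m2": 0}
incr_result = commit(free_incr, claims, all_or_nothing=False)

print(gang_result, free_gang)
print(incr_result, free_incr)
```

In the all-or-nothing run the state is left untouched, so a gang job never sits on a partial allocation. In the incremental run the non-conflicting claim goes through, so the job makes progress instead of starving.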