A Kubernetes-based scheduler built for AI manages everything from large distributed compute workloads to smaller inference jobs, ensuring that every workload gets the compute resources it needs at the right time.
Multiple queues and a sophisticated fairness algorithm automatically queue, preempt, restart and run workloads based on pre-defined policies, priorities and resource availability.
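To make the queueing behavior concrete, here is a minimal sketch of priority-ordered admission: pending workloads are popped in priority order and admitted while capacity remains, with the rest left queued. The `Workload` class, `schedule` function, and GPU counts are hypothetical illustrations, not the scheduler's actual API or fairness algorithm.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Workload:
    # heapq is a min-heap, so store the negated priority as the sort key
    # to pop the highest-priority workload first.
    sort_key: int = field(init=False, repr=False)
    name: str = field(compare=False)
    priority: int = field(compare=False)
    gpus: int = field(compare=False)

    def __post_init__(self):
        self.sort_key = -self.priority

def schedule(pending, capacity):
    """Admit pending workloads in priority order while GPUs remain."""
    heap = list(pending)
    heapq.heapify(heap)
    running, queued, free = [], [], capacity
    while heap:
        w = heapq.heappop(heap)
        if w.gpus <= free:       # admit: the workload fits in free capacity
            running.append(w)
            free -= w.gpus
        else:                    # otherwise it stays queued for later
            queued.append(w)
    return running, queued
```

A real scheduler layers preemption and per-queue fairness policies on top of this basic ordering, but the core loop, order by policy and admit against available resources, is the same idea.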
Ensure that resources are available to data scientists, projects, or departments when needed, enabling researchers to run experiments without thinking about the underlying infrastructure at all.
Allow workloads to run to completion and release their resources when they finish, making the system far more efficient.
Allow workloads to be launched together, start together, recover from failures together, and end together.
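This launch-together, fail-together behavior is commonly called gang (all-or-nothing) scheduling. The sketch below illustrates the principle under assumed names: `GangScheduler`, `submit`, and `on_member_failure` are hypothetical, not the product's API. A group is admitted only if every member fits at once, and one member's failure tears down the whole gang so it can restart together.

```python
class GangScheduler:
    def __init__(self, capacity):
        self.free = capacity
        self.running = {}   # group name -> list of member GPU demands

    def submit(self, name, members):
        """Admit all members of a group together, or none (all-or-nothing)."""
        need = sum(members)
        if need > self.free:
            return False            # queue the whole group, never a partial gang
        self.free -= need
        self.running[name] = members
        return True

    def on_member_failure(self, name):
        """One failed member releases the whole gang so it restarts as a unit."""
        members = self.running.pop(name)
        self.free += sum(members)
        return members              # caller resubmits the full group
```

All-or-nothing admission avoids the deadlock where two distributed jobs each hold half the cluster and both wait forever for the rest.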
Exceed set quotas when spare capacity is available, allowing workloads to grow and shrink automatically based on resource availability.
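The over-quota idea can be sketched in a few lines: a project is granted its guaranteed quota first, then borrows from idle cluster capacity, shrinking back when that capacity is reclaimed. The `allocate` function and its parameters are illustrative assumptions, not the scheduler's actual quota model.

```python
def allocate(quota, demand, idle_spare):
    """Grant guaranteed quota first, then borrow from idle cluster capacity.

    quota      -- GPUs guaranteed to this project
    demand     -- GPUs the project's workloads currently want
    idle_spare -- GPUs sitting idle elsewhere in the cluster
    """
    guaranteed = min(demand, quota)
    borrowed = min(demand - guaranteed, idle_spare)
    return guaranteed + borrowed
```

When owners of the borrowed capacity return, the allocation shrinks back toward `quota`, which is what lets workloads grow and shrink automatically with availability.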