The project’s main objective is cost optimization: it automatically locates the availability zone, region, and cloud provider that offer the desired resources at the lowest price.
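The selection step can be pictured as filtering a price catalogue down to the options that satisfy the request and then taking the cheapest one. This is a minimal sketch of that idea only. The `Offer` type, the `cheapest_offer` helper, the catalogue entries, and the prices are all hypothetical placeholders, not SkyPilot's internal data model or real cloud pricing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """One purchasable option: a provider/region/zone selling accelerators at an hourly price.

    Hypothetical structure for illustration; not SkyPilot's internal representation.
    """
    cloud: str
    region: str
    zone: str
    accelerator: str
    count: int
    price_per_hour: float  # made-up placeholder prices, not real cloud rates


def cheapest_offer(offers: list[Offer], accelerator: str, count: int) -> Optional[Offer]:
    """Return the lowest-priced offer that satisfies the request, or None if nothing fits."""
    feasible = [o for o in offers if o.accelerator == accelerator and o.count >= count]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.price_per_hour)


# Invented catalogue spanning three providers.
CATALOG = [
    Offer("aws", "us-east-1", "us-east-1a", "V100", 1, 3.00),
    Offer("gcp", "us-central1", "us-central1-a", "V100", 1, 2.50),
    Offer("azure", "eastus", "1", "V100", 1, 3.10),
    Offer("gcp", "us-central1", "us-central1-a", "A100", 1, 4.00),
]

best = cheapest_offer(CATALOG, "V100", 1)
print(best.cloud, best.region, best.zone)  # gcp us-central1 us-central1-a
```

Returning `None` rather than raising keeps "no location has this hardware" distinct from the provisioning failures that can still occur after a location has been chosen.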
SkyPilot, an open source framework for running machine learning workloads on the major cloud providers through a uniform interface, has been released by a group of researchers at UC Berkeley’s RISELab.
The system automatically determines which AWS, Azure, and Google Cloud regions have the resources (CPU/GPU/TPU) the project needs, and which of them are the cheapest. SkyPilot then carries out three primary tasks: it provisions the cluster, automatically failing over to other locations if it runs into capacity or quota errors; it synchronises the user’s code and files to the cluster; and it manages job queueing and execution.
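The failover behaviour amounts to walking the candidate locations from cheapest to most expensive and moving on whenever provisioning fails for lack of capacity or quota. This is a simulated sketch under stated assumptions: `CapacityError`, `try_provision`, and `provision_with_failover` are hypothetical names, not SkyPilot or cloud-provider APIs, and the candidate locations and prices are invented.

```python
class CapacityError(Exception):
    """Hypothetical error: the location has no capacity or the account lacks quota."""


def try_provision(location: str, unavailable: set[str]) -> str:
    """Simulated provisioning call: fails for any location listed in `unavailable`."""
    if location in unavailable:
        raise CapacityError(f"no capacity or quota in {location}")
    return f"cluster-up@{location}"


def provision_with_failover(
    candidates: list[tuple[float, str]], unavailable: set[str]
) -> tuple[str, list[str]]:
    """Try candidates in ascending price order; return the first success and the locations skipped."""
    skipped: list[str] = []
    for _price, location in sorted(candidates):  # tuples sort by price first
        try:
            return try_provision(location, unavailable), skipped
        except CapacityError:
            skipped.append(location)  # record the failure and fail over to the next-cheapest site
    raise RuntimeError("all candidate locations exhausted")


candidates = [
    (3.00, "aws/us-east-1/us-east-1a"),
    (2.50, "gcp/us-central1/us-central1-a"),
    (3.10, "azure/eastus/1"),
]
cluster, skipped = provision_with_failover(
    candidates, unavailable={"gcp/us-central1/us-central1-a"}
)
print(cluster)  # cluster-up@aws/us-east-1/us-east-1a
print(skipped)  # ['gcp/us-central1/us-central1-a']
```

Here the cheapest site is out of capacity, so the sketch falls back to the next-cheapest one instead of failing the whole launch.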
SkyPilot is not the RISELab’s first open source initiative aimed at reducing cloud costs. The research centre previously released Skyplane, a tool that makes transferring large datasets between cloud providers faster and cheaper.
Among the advantages of SkyPilot, the authors cite the ability to build multi-cloud applications, access to best-in-class hardware, and wider availability of scarce resources such as top-tier NVIDIA V100 and A100 GPUs.
The framework also offers two cost-saving features: Managed Spot, which runs jobs on cheaper spot instances and automatically recovers from preemptions, and Autostop, which automatically shuts down idle clusters. To help developers understand how the project works, the team has published a set of Jupyter notebooks.
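Automatic preemption recovery only pays off if the job can resume: when a spot instance is reclaimed, a replacement is launched and work continues from the last checkpoint rather than from scratch. The sketch below simulates that loop in a single process. `Preempted`, `run_on_spot`, `managed_spot`, and the preemption schedule are hypothetical, and the sketch does not describe how Managed Spot is actually implemented.

```python
class Preempted(Exception):
    """Simulates the cloud reclaiming a spot instance mid-job (hypothetical)."""


def run_on_spot(checkpoint: dict, total_steps: int, pending: set[int]) -> None:
    """Advance the job one step at a time, saving progress after each step.

    A step listed in `pending` triggers a simulated preemption the first time it is reached.
    """
    while checkpoint["step"] < total_steps:
        step = checkpoint["step"]
        if step in pending:
            pending.discard(step)  # each scheduled preemption fires only once
            raise Preempted(f"instance reclaimed at step {step}")
        checkpoint["step"] = step + 1  # "save" progress so a relaunch resumes here


def managed_spot(total_steps: int, preempt_at: set[int], max_recoveries: int = 5) -> tuple[int, int]:
    """Relaunch after each preemption, resuming from the checkpoint.

    Returns (number of recoveries, steps completed).
    """
    checkpoint = {"step": 0}  # stands in for durable storage that outlives any one instance
    pending = set(preempt_at)  # copy so the caller's set is not mutated
    recoveries = 0
    while True:
        try:
            run_on_spot(checkpoint, total_steps, pending)
            return recoveries, checkpoint["step"]
        except Preempted:
            recoveries += 1
            if recoveries > max_recoveries:
                raise RuntimeError("too many preemptions; giving up")


print(managed_spot(total_steps=100, preempt_at={30, 75}))  # (2, 100)
```

Because progress is saved after every step, the two simulated preemptions cost no completed work; in a real training job the checkpoint interval trades saving overhead against work lost per preemption.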