Distributed Computing with Python and Ray

Quick Tutorial

What Is Distributed Computing with Python?

Distributed computing with Python is the use of the Python programming language to execute computations in a distributed manner across multiple machines or processors. This allows for the execution of complex tasks that would be too time-consuming or impossible to execute on a single machine.

In the process of distributed computing, a problem is divided into smaller subproblems. These subproblems are then solved concurrently on different machines or processors. The results of these subproblems are aggregated to form the final solution. This approach leverages the computational power and resources of multiple machines, resulting in faster execution and completion of complex tasks.

Distributed computing with Python is particularly beneficial in big data analytics and machine learning applications, where large volumes of data need to be processed quickly and efficiently. Python's simplicity and readability make it an excellent choice for distributed computing. Python's syntax is straightforward, making it easier for developers to write and understand the code. Additionally, Python provides several libraries and tools that simplify the implementation of distributed computing. These include multiprocessing, threading, concurrent.futures, and more recently, Ray, which we will delve into later in this article.
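To make the divide-and-aggregate idea concrete, here is a minimal sketch using the standard library's concurrent.futures module mentioned above. The function name `square` and the choice of inputs are illustrative:

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Each call is an independent subproblem that can run on its own process
    return n * n

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # map splits the inputs across worker processes and collects results in order
        results = list(pool.map(square, range(10)))
    # Aggregate the partial results into the final answer
    print(sum(results))  # 285
```

The same pattern (split, solve concurrently, aggregate) underlies the Ray examples later in this article, with Ray handling the scheduling across machines instead of local processes.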

Why Choose Python for Distributed Computing

Choosing Python for distributed computing offers various benefits. Python's simplicity makes it a great choice for beginners. The language's straightforward syntax makes it easier to implement distributed computing algorithms and parallelize tasks.

Python also offers numerous libraries and tools for distributed computing. Libraries such as numpy and pandas provide powerful data manipulation capabilities, while tools like Dask and Ray offer high-level interfaces for distributed and parallel computing. With these resources at their disposal, developers can easily implement distributed computing solutions in Python.

Finally, Python boasts a large and active community. This means developers can find a wealth of resources, tutorials, and support to help them implement distributed computing solutions. Additionally, due to its popularity, Python and its libraries receive frequent updates and improvements.

What is Ray?

Ray is a high-performance distributed computing library for Python. It provides a simple, universal API for building distributed applications. Ray’s key capabilities include:

  • Task parallelism: Allows for the execution of multiple tasks concurrently, improving the overall efficiency and speed of computation. With Ray's task parallelism, developers can easily parallelize their Python code without the need to deal with low-level details.
  • Distributed computing: Ray allows for the execution of tasks across multiple machines, leveraging the computational power of all available resources. This makes it possible to perform large-scale computations that can’t be executed on a single machine.
  • Remote function execution: Allows developers to execute functions on remote machines, making it easier to build distributed computing systems.
  • Distributed data processing: Makes it possible to process large volumes of data across multiple machines, supporting large datasets in use cases like big data analytics and machine learning.
  • Reinforcement learning support: Reinforcement learning is a type of machine learning that involves training an agent to make decisions based on its interactions with an environment. Ray offers several libraries for reinforcement learning, making it easier for developers to implement reinforcement learning solutions in their applications.

Quick Tutorial: Writing a Distributed Python Application with Ray

Before you begin, install Ray by running:

pip install ray

Or, on systems where pip is not tied to Python 3:

pip3 install ray

Turning Python Functions into Remote Functions (Ray Tasks)

The first step in writing a distributed Python application with Ray is to transform your Python functions into remote functions, also known as Ray tasks. This is done using the @ray.remote decorator. Here's a simple example:

import ray
ray.init()

@ray.remote
def hello_world():
    return "Hello, world!"

# .remote() queues the task for execution and immediately returns a future
result_id = hello_world.remote()

# ray.get() blocks until the task has finished and returns its result
print(ray.get(result_id))

In this code, we first import the Ray library and initialize it. Then, we define a function hello_world, which we turn into a remote function using the @ray.remote decorator. To call this remote function, we use the .remote() method. This method returns a future, which is a reference to the eventual result. To retrieve the result, we call ray.get() on the future.

There are a few important points to note here. First, when we call the remote function with .remote(), the function gets queued for execution and the method immediately returns a future. This means that the function call is non-blocking; the program can continue to execute other tasks while the remote function is running.

Comparing Local vs Remote Performance

Note: The examples in this section allocate a list of 100 million integers. Use a system with at least 16 GB of RAM and, on Linux, swap enabled.

Now that we've seen how to turn Python functions into remote functions with Ray, let's compare the performance of local and remote functions.

Let's consider a simple task: summing a large list of numbers. Here's how you might do this with a local function:

import time

def sum_list(numbers):
    return sum(numbers)

start_time = time.time()
numbers = list(range(100000000))
print(sum_list(numbers))
end_time = time.time()
print("Time taken: ", end_time - start_time)

And here's how you might do the same task with a remote function:

import ray
import time

ray.init()

@ray.remote
def sum_list(numbers):
    return sum(numbers)

start_time = time.time()
numbers = list(range(100000000))
# The list is serialized into Ray's object store before the task runs
result_id = sum_list.remote(numbers)
print(ray.get(result_id))
end_time = time.time()
print("Time taken: ", end_time - start_time)

In both cases, we print the sum and the time taken to compute it. If you run these two examples, don't be surprised if the remote version is not faster: a single remote task can't outrun the local call, and Ray adds overhead for serializing the large list into its object store and scheduling the task on a worker. Ray's speedups come from running many tasks concurrently across cores or machines, not from moving one function call to a remote worker.

Remote Objects as Actors

Another key aspect of distributed computing with Python using Ray is the use of actors. Actors are stateful workers: remote class instances that maintain state across multiple method calls.

An actor is created by defining a class and decorating it with @ray.remote. Here's an example:

import ray
ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# Counter.remote() starts the actor process and returns a handle to it
counter = Counter.remote()
print(ray.get(counter.increment.remote()))

In this code, we first initialize Ray. Then, we define a Counter class and decorate it with @ray.remote. The Counter class has an __init__ method that initializes its count to 0 and an increment method that increments its count.

To create an actor, we call Counter.remote(). This returns a handle to the actor, which we can use to invoke its methods with .remote(). In this case, we call counter.increment.remote(), which increments the actor's count and returns the new count. Finally, we retrieve and print the count with ray.get().

Actors are a powerful tool for managing state in a distributed setting. They can be used to implement a variety of patterns, such as parameter servers for distributed machine learning, databases, and much more.

Distributed Computing and Run:ai

Run:ai has built Atlas, an AI computing platform that functions as a foundational layer within the MLOps and AI infrastructure stack. Its automated resource management capabilities allow organizations to properly align resources across the different MLOps platforms and tools running on top of Run:ai Atlas. Deep integration with the NVIDIA ecosystem through Run:ai's GPU Abstraction technology maximizes the utilization and value of the NVIDIA GPUs in your environment.

Learn more about Run:ai