Scaling Python Workloads with Ray on Kubernetes
Introduction: Why Ray?
As machine learning, reinforcement learning, and large-scale data processing evolve, scaling Python workloads beyond a single machine has become essential. Python's rich ecosystem is a boon, but its Global Interpreter Lock (GIL) and single-node design limit parallelism.
Existing distributed computing frameworks serve some use cases but fall short for modern ML workloads:
- MPI offers fine-grained, low-latency control but is complex and often tied to static HPC environments.
- Apache Spark excels at batch big data processing but is less suited to dynamic, GPU-intensive, or actor-style ML tasks.
- Dask provides Pythonic APIs for parallelism but struggles with dynamic actor-based workloads or serving models.
Ray was created to fill this niche: a unified, flexible, Python-first framework for distributed applications that demand concurrency, stateful actors, and dynamic task graphs. It empowers researchers and engineers to write parallel and distributed Python programs with minimal friction, scaling seamlessly from laptops to thousands of nodes in Kubernetes or cloud environments.
Ray Architecture Diagram
This diagram illustrates the core components of a Ray cluster, including the Head Node, Worker Nodes, Raylets, Object Stores, Worker Processes, and the Global Control Store (GCS).

Step 1: Ray Core Concepts - Try First, Learn Deeply
Task-Based Distributed Execution
Start by installing Ray locally:
pip install ray
Try running a simple remote task:
```python
import ray

ray.init()  # Initializes Ray locally

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(5)]
results = ray.get(futures)
print(results)  # Output: [0, 1, 4, 9, 16]
```
What happened here?
- @ray.remote decorates a Python function, turning it into a remote task that can be invoked asynchronously.
- Calling .remote() schedules the task on the Ray cluster (local in this case).
- ray.get() fetches results from Ray's distributed object store.
The Distributed Object Store
Ray's in-memory object store (originally based on Apache Arrow's Plasma store) allows zero-copy sharing of data between tasks on the same node and efficient transfer of objects across nodes. This avoids expensive data serialization and movement, which is key for performance.
Actors: Long-Lived Stateful Workers
Some computations need to keep internal state, for example maintaining counters or holding trained ML models.
Try this example:
```python
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1
print(ray.get(counter.increment.remote()))  # 2
```
Actors are stateful Python objects living inside the Ray cluster, communicating via remote method calls. This abstraction enables building services, simulations, and iterative algorithms naturally.
Task Dependencies and Scheduling
Ray supports composing complex asynchronous task graphs by passing object references:
```python
@ray.remote
def add(x, y):
    return x + y

a = add.remote(1, 2)
b = add.remote(3, 4)
c = add.remote(a, b)

print(ray.get(c))  # 10
```
Ray implicitly builds a Directed Acyclic Graph (DAG) of tasks from these object references, scheduling each task as soon as its dependencies complete. This maximizes parallelism while respecting data dependencies.
Step 2: Ray Architecture - A Closer Look
Ray's architecture consists of:
- Head Node: The cluster's control plane, hosting the Global Control Store (GCS) managing metadata, cluster state, and the global scheduler.
- Worker Nodes: Run Ray tasks and maintain partitions of the distributed object store.
- Global Scheduler: Assigns tasks to nodes optimizing for load balancing and data locality.
- Local Scheduler: Runs on each node, picks which worker executes a task.
- Raylet: Node-level Ray process managing resources and task scheduling.
- Object Store: Shared in-memory object store enabling zero-copy data sharing across processes and nodes.
Ray's architecture enables low-latency scheduling and fault tolerance by reassigning tasks if nodes fail, relying on lineage-based object reconstruction.
Step 3: Deploying Ray on Kubernetes
Why Kubernetes?
Kubernetes (K8s) is the de facto platform for cloud-native, scalable, and resilient distributed applications. It provides:
- Declarative infrastructure management with YAML manifests
- Automated scaling, scheduling, and self-healing
- Integration with container runtimes and cloud providers
Deploying Ray on Kubernetes combines Ray's flexibility with Kubernetes' orchestration power, unlocking production-grade ML infrastructure.
Setting Up Kubernetes Locally
For development and testing, start with kind:
```shell
kind create cluster --name ray-cluster
kubectl get nodes
```
Installing KubeRay Operator
KubeRay automates Ray cluster lifecycle management on Kubernetes.
kubectl create namespace ray-system
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator -n ray-system
kubectl get pods -n ray-system # Confirm operator is running
Defining a Ray Cluster YAML
Here's a minimal RayCluster custom resource:
```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray-cluster
spec:
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
      dashboard-host: "0.0.0.0"
  workerGroupSpecs:
    - replicas: 3
      rayStartParams:
        num-cpus: "4"
```
Deploy the cluster:
```shell
kubectl apply -f ray-cluster.yaml
kubectl get pods  # Check for Ray head and worker pods
```
Access Ray Dashboard
Forward the Ray dashboard port to your local machine:
kubectl port-forward svc/ray-cluster-head-svc 8265:8265
Open your browser at http://localhost:8265 to monitor cluster health, task timelines, and resource usage.
Step 4: Running Distributed Ray Jobs on Kubernetes
Create a Docker image with your Ray application, for example:
Dockerfile:
```dockerfile
FROM python:3.9-slim
RUN pip install ray
COPY my_ray_script.py /app/my_ray_script.py
WORKDIR /app
ENTRYPOINT ["python", "my_ray_script.py"]
```
Build and push your image:
```shell
docker build -t your-dockerhub-username/ray-job:latest .
docker push your-dockerhub-username/ray-job:latest
```
Define a RayJob resource to run your script:
```yaml
apiVersion: ray.io/v1alpha1
kind: RayJob
metadata:
  name: example-ray-job
spec:
  entrypoint: python /app/my_ray_script.py
  rayCluster:
    name: ray-cluster
  podTemplate:
    spec:
      containers:
        - name: ray-worker
          image: your-dockerhub-username/ray-job:latest
```
Submit the job:
```shell
kubectl apply -f ray-job.yaml
kubectl logs -f job/example-ray-job-worker
```
Step 5: Connecting from Your Local Python Client
You can connect to your Kubernetes Ray cluster remotely:
```python
import ray

ray.init(address="auto")  # Connects to the Ray cluster based on the environment

@ray.remote
def greet():
    return "Hello from K8s Ray Cluster!"

print(ray.get(greet.remote()))
```
Make sure to expose the Ray head node properly, e.g., via kubectl port-forward or a LoadBalancer service. Note that address="auto" only works when the script runs inside the cluster; from your local machine, connect through the Ray Client port instead, e.g. ray.init(address="ray://localhost:10001").
Step 6: Scaling and Resource Awareness
Kubernetes and KubeRay handle:
- Auto-scaling: adjust worker replicas in workerGroupSpecs.
- Resource allocation: specify CPU and GPU resources via rayStartParams:
```yaml
workerGroupSpecs:
  - replicas: 2
    rayStartParams:
      num-cpus: "8"
      num-gpus: "1"
```
Ray's scheduler places tasks on appropriate nodes, optimizing resource utilization.
Step 7: Exploring Ray's Ecosystem Libraries
Ray's ecosystem extends functionality with:
- Ray Tune: distributed hyperparameter tuning.
- Ray Serve: scalable model serving.
- RLlib: reinforcement learning library.
- Ray Data: distributed data loading and ETL.
These libraries integrate seamlessly with Ray clusters on Kubernetes, enabling full ML lifecycle workflows.
Distributed Systems Theory Behind Ray
- The actor model supports asynchronous message-passing and mutable distributed state.
- Scheduling balances load and data locality to minimize latency.
- The shared-nothing object store and lineage tracking provide fault tolerance through deterministic object reconstruction.
- Like many distributed systems, Ray navigates CAP trade-offs by favoring availability, keeping cluster metadata eventually rather than strongly consistent.
Conclusion
In this post, you have:
- Explored Ray's core programming model with tasks, actors, and object store.
- Examined Ray's distributed architecture in detail.
- Set up a Kubernetes cluster and deployed Ray via KubeRay.
- Learned to run distributed jobs and connect Python clients remotely.
- Understood resource-aware scaling and integration with Kubernetes.
Ray empowers scaling Python workloads for modern ML and distributed computing, from your laptop to production Kubernetes clusters.