Scaling Python Workloads with Ray on Kubernetes
Introduction: Why Ray?
As machine learning, reinforcement learning, and large-scale data processing evolve, scaling Python workloads beyond a single machine has become essential. Python's rich ecosystem is a boon, but its Global Interpreter Lock (GIL) and single-node design limit parallelism.
Existing distributed computing frameworks serve some use cases but fall short for modern ML workloads:
- MPI offers fine-grained, low-latency control but is complex and often tied to static HPC environments.
- Apache Spark excels at batch big data processing but is less suited to dynamic, GPU-intensive, or actor-style ML tasks.
- Dask provides Pythonic APIs for parallelism but struggles with dynamic actor-based workloads or serving models.
Ray was created to fill this niche: a unified, flexible, Python-first framework for distributed applications that demand concurrency, stateful actors, and dynamic task graphs. It empowers researchers and engineers to write parallel and distributed Python programs with minimal friction, scaling seamlessly from laptops to thousands of nodes in Kubernetes or cloud environments.
Ray Architecture Diagram
This diagram illustrates the core components of a Ray cluster, including the Head Node, Worker Nodes, Raylets, Object Stores, Worker Processes, and the Global Control Store (GCS).

Step 1: Ray Core Concepts - Try First, Learn Deeply
Task-Based Distributed Execution
Start by installing Ray locally:
pip install ray
Try running a simple remote task:
```python
import ray

ray.init()  # Initializes Ray locally

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(5)]
results = ray.get(futures)
print(results)  # Output: [0, 1, 4, 9, 16]
```
What happened here?
- @ray.remote decorates a Python function, turning it into a remote task that can be invoked asynchronously.
- Calling .remote() schedules the task on the Ray cluster (local in this case).
- ray.get() fetches results from Ray's distributed object store.
The Distributed Object Store
Ray's in-memory object store (originally based on Apache Arrow's Plasma store) allows zero-copy sharing of data between tasks on the same node and efficient transfer of objects across nodes. This avoids expensive data serialization and movement, which is key for performance.
Actors: Long-Lived Stateful Workers
Some computations need to keep internal state, for example maintaining counters or holding trained ML models.
Try this example:
```python
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1
print(ray.get(counter.increment.remote()))  # 2
```
Actors are stateful Python objects living inside the Ray cluster, communicating via remote method calls. This abstraction enables building services, simulations, and iterative algorithms naturally.
Task Dependencies and Scheduling
Ray supports composing complex asynchronous task graphs by passing object references:
```python
@ray.remote
def add(x, y):
    return x + y

a = add.remote(1, 2)
b = add.remote(3, 4)
c = add.remote(a, b)

print(ray.get(c))  # 10
```
Ray implicitly builds a Directed Acyclic Graph (DAG) of tasks from these object references, scheduling each task as soon as its dependencies complete. This maximizes parallelism while respecting data dependencies.
Step 2: Ray Architecture - A Closer Look
Ray's architecture consists of:
- Head Node: The cluster's control plane, hosting the Global Control Store (GCS) managing metadata, cluster state, and the global scheduler.
- Worker Nodes: Run Ray tasks and maintain partitions of the distributed object store.
- Global Scheduler: Assigns tasks to nodes optimizing for load balancing and data locality.
- Local Scheduler: Runs on each node, picks which worker executes a task.
- Raylet: Node-level Ray process managing resources and task scheduling.
- Object Store: Shared in-memory object store enabling zero-copy data sharing across processes and nodes.
Ray's architecture enables low-latency scheduling and fault tolerance by reassigning tasks if nodes fail, relying on lineage-based object reconstruction.
Step 3: Deploying Ray on Kubernetes
Why Kubernetes?
Kubernetes (K8s) is the de facto platform for cloud-native, scalable, and resilient distributed applications. It provides:
- Declarative infrastructure management with YAML manifests
- Automated scaling, scheduling, and self-healing
- Integration with container runtimes and cloud providers
Deploying Ray on Kubernetes combines Ray's flexibility with Kubernetes' orchestration power, unlocking production-grade ML infrastructure.
Setting Up Kubernetes Locally
For development and testing, start with kind:
```shell
kind create cluster --name ray-cluster
kubectl get nodes
```
Installing KubeRay Operator
KubeRay automates Ray cluster lifecycle management on Kubernetes.
kubectl create namespace ray-system
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator -n ray-system
kubectl get pods -n ray-system # Confirm operator is running
Defining a Ray Cluster YAML
Here's a minimal RayCluster custom resource:
```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray-cluster
spec:
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
      dashboard-host: "0.0.0.0"
  workerGroupSpecs:
    - replicas: 3
      rayStartParams:
        num-cpus: "4"
```
Deploy the cluster:
```shell
kubectl apply -f ray-cluster.yaml
kubectl get pods  # Check for Ray head and worker pods
```
Access Ray Dashboard
Forward the Ray dashboard port to your local machine:
kubectl port-forward svc/ray-cluster-head-svc 8265:8265
Open your browser at http://localhost:8265 to monitor cluster health, task timelines, and resource usage.
Step 4: Running Distributed Ray Jobs on Kubernetes
Create a Docker image with your Ray application, for example:
Dockerfile:
```dockerfile
FROM python:3.9-slim
RUN pip install ray
COPY my_ray_script.py /app/my_ray_script.py
WORKDIR /app
ENTRYPOINT ["python", "my_ray_script.py"]
```
Build and push your image:
```shell
docker build -t your-dockerhub-username/ray-job:latest .
docker push your-dockerhub-username/ray-job:latest
```
Define a RayJob resource to run your script:
```yaml
apiVersion: ray.io/v1alpha1
kind: RayJob
metadata:
  name: example-ray-job
spec:
  entrypoint: python /app/my_ray_script.py
  rayCluster:
    name: ray-cluster
  podTemplate:
    spec:
      containers:
        - name: ray-worker
          image: your-dockerhub-username/ray-job:latest
```
Submit the job:
```shell
kubectl apply -f ray-job.yaml
kubectl logs -f job/example-ray-job-worker
```
Step 5: Connecting from Your Local Python Client
You can connect to your Kubernetes Ray cluster remotely:
```python
import ray

ray.init(address="auto")  # Connects to the Ray cluster based on the environment

@ray.remote
def greet():
    return "Hello from K8s Ray Cluster!"

print(ray.get(greet.remote()))
```
Make sure to expose the Ray head node properly, e.g., via kubectl port-forward or a LoadBalancer service. Note that address="auto" only works when the script runs inside the cluster; from your local machine, connect through the Ray Client port instead, e.g. ray.init(address="ray://localhost:10001").
Step 6: Scaling and Resource Awareness
Kubernetes and KubeRay handle:
- Auto-scaling: adjust worker replicas in workerGroupSpecs.
- Resource allocation: specify CPU and GPU resources via rayStartParams:
```yaml
workerGroupSpecs:
  - replicas: 2
    rayStartParams:
      num-cpus: "8"
      num-gpus: "1"
```
Ray's scheduler places tasks on appropriate nodes, optimizing resource utilization.
Step 7: Exploring Ray's Ecosystem Libraries
Ray's ecosystem extends functionality with:
- Ray Tune: distributed hyperparameter tuning.
- Ray Serve: scalable model serving.
- RLlib: reinforcement learning library.
- Ray Data: distributed data loading and ETL.
These libraries integrate seamlessly with Ray clusters on Kubernetes, enabling full ML lifecycle workflows.
Distributed Systems Theory Behind Ray
- The actor model supports asynchronous message-passing and mutable distributed state.
- Scheduling balances load and data locality to minimize latency.
- The shared-nothing object store and lineage tracking provide fault tolerance through deterministic object reconstruction.
- Like many distributed systems, Ray navigates CAP trade-offs by favoring availability, keeping cluster metadata eventually rather than strongly consistent.
Conclusion
In this post, you have:
- Explored Ray's core programming model with tasks, actors, and object store.
- Examined Ray's distributed architecture in detail.
- Set up a Kubernetes cluster and deployed Ray via KubeRay.
- Learned to run distributed jobs and connect Python clients remotely.
- Understood resource-aware scaling and integration with Kubernetes.
Ray empowers scaling Python workloads for modern ML and distributed computing, from your laptop to production Kubernetes clusters.