IPAR99/KubeTube

KubeTube: AI Inference Scheduler Simulation

Disclaimer: Made this on the train, with help from AI, my train-riding colleagues, and lots of Google!

I tried to simulate a local Kubernetes-based LLM inference gateway. It uses kind for the cluster, Volcano and Kueue for scheduling/queuing, and a Python-based mock scheduler to dispatch jobs to simulated GPU workers.

Apple Silicon Special Note

I had trouble running Kubernetes webhooks on my M1 MacBook. The internet tells me this occurs when the cluster blocks new Pods while waiting for a validating webhook that hasn't started yet. This guide includes the specific bypasses needed to unblock the scheduler.

Prerequisites

  • macOS
  • Docker Desktop (Ensure Settings > General > "Use Rosetta for x86_64/amd64" is ON)
  • uv (Fast Python package manager)
  • Homebrew (brew install kind kubectl uvicorn)

Project Structure

  • k8s/: Kubernetes manifests for Volcano, Kueue, and worker deployments.
  • gateway/: FastAPI application acting as the inference entry point.
  • worker/: Python script simulating GPU metrics and job execution.
  • dashboard/: FastAPI/HTML dashboard for real-time visualization.
  • scripts/: Shell scripts for cluster lifecycle management.
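
To give a feel for what the worker/ script produces, here is a hedged sketch of simulated GPU telemetry. The function name `simulated_metrics` and the field names `gpu_util`/`vram_used_mb` are illustrative assumptions, not the actual worker API:

```python
import random

def simulated_metrics(load: float) -> dict:
    """Fake GPU telemetry: utilization jitters around the current load,
    and VRAM usage scales with utilization. Purely illustrative."""
    util = max(0.0, min(100.0, load + random.uniform(-5.0, 5.0)))
    return {"gpu_util": round(util, 1), "vram_used_mb": int(util * 160)}

print(simulated_metrics(40.0))
```

The Gateway only needs each worker to expose some metrics payload like this over HTTP; the exact shape is up to the worker implementation.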

Setup and Installation

  1. Initialize the Python environment (I like uv :)

     uv venv
     source .venv/bin/activate
     uv pip install fastapi uvicorn httpx python-dotenv pydantic kubernetes jinja2
    
  2. Provision the cluster. This creates the kind cluster and installs the Volcano/Kueue CRDs.

     chmod +x scripts/setup.sh scripts/teardown.sh
     ./scripts/setup.sh
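
For orientation, the provisioning step likely boils down to something like this sketch. The cluster name and the `$VOLCANO_MANIFEST`/`$KUEUE_MANIFEST` placeholders are assumptions; the real scripts/setup.sh is authoritative:

```shell
# Rough sketch of a setup script (placeholders, not the actual file).
set -euo pipefail

# 1. Create the local cluster
kind create cluster --name kubetube

# 2. Install the Volcano and Kueue CRDs/controllers
#    (substitute the pinned manifest URLs your setup.sh actually uses)
kubectl apply -f "$VOLCANO_MANIFEST"
kubectl apply --server-side -f "$KUEUE_MANIFEST"

# 3. Deploy the simulated GPU workers
kubectl apply -f k8s/
```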
    
  3. The "Apple Silicon" Unblock. If your workers stay in Pending or ImagePullBackOff, run these commands to clear the admission webhook deadlock.

     Remove the "traffic cops" blocking the pods:

     kubectl delete mutatingwebhookconfiguration kueue-mutating-webhook-configuration
     kubectl delete validatingwebhookconfiguration kueue-validating-webhook-configuration

     Force-restart the workers:

     kubectl rollout restart deployment inference-workers
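
After the restart, the workers should come back up without webhook interference. A quick way to confirm (assuming the deployment name above):

```shell
# Wait for the rollout to finish, then list the pods
kubectl rollout status deployment inference-workers
kubectl get pods
```

All worker pods should reach Running rather than sitting in Pending or ImagePullBackOff.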

Execution

I made a start.sh, but it includes everything I need to run this on an M1 Mac, so you may need to leave some steps out depending on your setup. All the individual steps are listed below :)

For easy running just:

chmod +x start.sh
./start.sh

To see the system in action, you need five active terminal processes. Ensure source .venv/bin/activate is run in each.

Tabs 1-3: Establish Worker Bridges

These map the internal Kubernetes pods to your local machine so the Gateway can communicate with the simulated GPUs.

Tab 1

kubectl port-forward deployment/inference-workers 9001:9000

Tab 2

kubectl port-forward deployment/inference-workers 9002:9000

Tab 3

kubectl port-forward deployment/inference-workers 9003:9000
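
If juggling three tabs is annoying, the same bridges can be opened from a single terminal. This is just a convenience sketch; note that kubectl forwards a deployment to one pod it picks itself, so if all three forwards land on the same pod you may need to forward individual pods instead:

```shell
# Run all three port-forwards in the background from one shell
for port in 9001 9002 9003; do
  kubectl port-forward deployment/inference-workers "${port}:9000" &
done
wait  # keep the shell open; Ctrl-C tears all three down
```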

Tab 4: Start the Gateway

The Gateway handles job logic and selects the best worker based on real-time metrics.

export PYTHONPATH=$PYTHONPATH:$(pwd)/gateway
uvicorn gateway.main:app --port 8000

Tab 5: Start the Dashboard

The visual UI used to monitor GPU utilization and VRAM.

cd dashboard
uvicorn main:app --port 8001

Testing the System

Open your browser to http://localhost:8001. You should see three GPU cards in a "Live" or "Idle" state.

Submit a high-priority job via curl:

curl -X POST http://localhost:8000/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Synthesize a 4K cinematic video",
    "priority": "high"
  }'

Observation:

  • The Gateway logs will show the selection of the worker with the lowest GPU % load.
  • The Dashboard will show a real-time spike in GPU usage on the selected card.
  • A Volcano Job (vcjob) is created in the cluster to manage the batch lifecycle.

Scheduling Logic

This implementation uses a Mock Scheduler inside gateway/main.py. It retrieves the cluster state from the worker metrics endpoints and selects the worker with the lowest gpu_util percentage. Once selected, it triggers a vcjob (Volcano Job) in the cluster to simulate a batch workload.
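
The selection rule can be sketched in a few lines. The names here (`pick_worker`, the metrics shape) are hypothetical; the real logic lives in gateway/main.py:

```python
def pick_worker(metrics: dict[str, dict]) -> str:
    """Return the worker whose reported gpu_util is lowest.
    `metrics` maps worker id -> latest metrics payload (illustrative shape)."""
    return min(metrics, key=lambda w: metrics[w]["gpu_util"])

snapshot = {
    "worker-1": {"gpu_util": 72.5},
    "worker-2": {"gpu_util": 13.0},
    "worker-3": {"gpu_util": 55.1},
}
print(pick_worker(snapshot))  # worker-2
```

A real scheduler might also weigh VRAM headroom, queue depth, or metric staleness, but "lowest utilization wins" is enough for this simulation.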

Troubleshooting Lessons Learned

  • Pathing: Always run uvicorn from the relevant sub-folder (or use PYTHONPATH) so Python can find local modules like k8s_client.py.
  • Templates: Jinja2 expects a ./templates folder relative to where the uvicorn command is executed.
  • Webhook Deadlocks: On local clusters, external validators (Kueue) can prevent their own images from starting. Deleting the webhookconfiguration is the standard "break glass" fix for local development.

Teardown

To remove the cluster and all associated resources:

./scripts/teardown.sh
