Kubernetes Helm Deployment

The MLflow Helm chart provides a production-ready way to deploy MLflow on any Kubernetes cluster.

The chart ships with:

  • MLflow tracking server with configurable CLI options
  • TLS support via an existing Kubernetes Secret
  • Persistent storage with a PersistentVolumeClaim for SQLite or file-based artifact stores
  • Ingress for external access
  • Prometheus metrics and optional ServiceMonitor for the Prometheus Operator
  • NetworkPolicy restricting ingress and egress to required ports
  • RBAC with namespace-scoped and cluster-scoped rules
  • Garbage collection via an optional CronJob that runs mlflow gc

Prerequisites

  • Kubernetes 1.23+
  • Helm 3.8+
  • kubectl configured to point at your cluster
  • The MLflow Helm chart, downloaded to your current working directory (the commands below expect it at ./charts)

Quick Start

The simplest path requires no external database or object store: MLflow stores metadata in SQLite and artifacts on a PersistentVolumeClaim.

1. Install the chart

bash
helm install mlflow ./charts \
--namespace mlflow \
--create-namespace \
--set storage.enabled=true \
--set mlflow.backendStoreUri="sqlite:////mlflow/mlflow.db" \
--set mlflow.artifactsDestination="/mlflow/artifacts"

2. Wait for the pod to become ready

bash
kubectl get pods -n mlflow -w

You should see:

text
NAME                             READY   STATUS    RESTARTS   AGE
mlflow-mlflow-xxxxxxxxxx-xxxxx   1/1     Running   0          30s

3. Access the UI

bash
kubectl port-forward -n mlflow svc/mlflow-mlflow 5000:5000

Open http://localhost:5000 in your browser.
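If you would rather script the readiness check than watch the browser, the following is a minimal sketch that polls the server over the port-forward. It assumes the tracking server answers GET /health with HTTP 200, which is an assumption to verify against your MLflow version; the function names and retry parameters are illustrative.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if it fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 403 or 503).
        return exc.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar errors.
        return None


def wait_until_healthy(base_url, attempts=10, delay=2.0,
                       fetch=fetch_status, sleep=time.sleep):
    """Poll <base_url>/health until it returns 200 or attempts run out."""
    health_url = base_url.rstrip("/") + "/health"  # assumed endpoint
    for attempt in range(attempts):
        if fetch(health_url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage, with the port-forward from step 3 running:
#   wait_until_healthy("http://localhost:5000")
```

The fetch and sleep parameters exist only so the loop can be exercised without a live server.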

SQLite is not production-safe

SQLite and local file storage are not suitable for multi-user or high-concurrency deployments. For production, use a PostgreSQL backend and a cloud object store. See Production Deployment below.

Production Deployment

Backend Store (PostgreSQL)

Store the database URI in a Kubernetes Secret to avoid exposing credentials in values files:

bash
kubectl create secret generic mlflow-db-secret \
--namespace mlflow \
--from-literal=uri="postgresql://user:password@postgres:5432/mlflow"

Reference the Secret in your values file:

my-values.yaml
yaml
mlflow:
  backendStoreUriFrom:
    secretKeyRef:
      name: mlflow-db-secret
      key: uri
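Passwords containing characters such as @, / or : break the database URI unless they are percent-encoded. The sketch below builds the URI stored in the Secret above; the helper name build_postgres_uri is hypothetical, and the values are the placeholders from the kubectl command.

```python
from urllib.parse import quote


def build_postgres_uri(user, password, host, port, database):
    """Build a PostgreSQL URI with percent-encoded credentials."""
    # safe="" makes quote() escape "/" as well, which it leaves alone by default.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"postgresql://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


# Plain credentials pass through unchanged; special characters are escaped.
print(build_postgres_uri("user", "password", "postgres", 5432, "mlflow"))
print(build_postgres_uri("user", "p@ss/w:rd", "postgres", 5432, "mlflow"))
```

Pass the resulting string to --from-literal=uri=... when creating the Secret.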

Artifact Store (S3)

bash
kubectl create secret generic s3-credentials \
--namespace mlflow \
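--from-literal=access-key-id="<key>" \
--from-literal=secret-access-key="<secret>"

Set the artifact destination and expose the credentials to the server as environment variables:

my-values.yaml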
yaml
mlflow:
  artifactsDestination: "s3://my-bucket/mlflow"

env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: s3-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: s3-credentials
        key: secret-access-key

For GCS or Azure Blob Storage, supply the equivalent credentials and URIs (gs://... or wasbs://...).
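A quick pre-flight check on the artifactsDestination value can catch a mistyped scheme before you deploy. The sketch below only recognizes the schemes named in this section plus plain local paths; the function name and the returned dictionary shape are illustrative, not part of the chart.

```python
from urllib.parse import urlparse

# Only the schemes mentioned in this section; extend as needed.
SUPPORTED_SCHEMES = {"s3", "gs", "wasbs"}


def describe_artifact_destination(uri):
    """Split an artifact destination into scheme, location, and key prefix."""
    parsed = urlparse(uri)
    if not parsed.scheme:
        # No scheme: treat the value as a local filesystem path.
        return {"kind": "local", "location": uri, "prefix": ""}
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unrecognized artifact scheme: {parsed.scheme!r}")
    return {
        "kind": parsed.scheme,
        "location": parsed.netloc,          # bucket, or container@account for wasbs
        "prefix": parsed.path.lstrip("/"),  # path inside the bucket or container
    }


print(describe_artifact_destination("s3://my-bucket/mlflow"))
print(describe_artifact_destination("/mlflow/artifacts"))
```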

TLS

Create a TLS Secret from your certificate and key:

bash
kubectl create secret tls mlflow-tls \
--namespace mlflow \
--cert=tls.crt \
--key=tls.key

Enable TLS in your values file:

my-values.yaml
yaml
tls:
  enabled: true
  secretName: mlflow-tls

Ingress

MLflow's host-validation middleware only allows localhost and private-IP hosts by default. When exposing MLflow through an Ingress with a public hostname, set allowed_hosts to match that hostname; otherwise, requests are rejected with HTTP 403.

my-values.yaml
yaml
server:
  value_options:
    allowed_hosts: "mlflow.example.com"

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: mlflow.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: mlflow-tls
      hosts:
        - mlflow.example.com
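To make the default host-validation behavior concrete, here is a simplified illustration of the rule described above: localhost and private IP addresses pass, and any other host must be listed explicitly. This is not MLflow's implementation; the function is hypothetical, and the real middleware may support features (such as wildcards) that this sketch ignores.

```python
import ipaddress


def is_host_allowed(host_header, allowed_hosts=()):
    """Approximate the default rule: localhost, private IPs, or an explicit allowlist."""
    # Drop an optional ":port" suffix (IPv6 literals are out of scope here).
    hostname = host_header.rsplit(":", 1)[0] if ":" in host_header else host_header
    hostname = hostname.lower()
    if hostname == "localhost" or hostname in {h.lower() for h in allowed_hosts}:
        return True
    try:
        address = ipaddress.ip_address(hostname)
    except ValueError:
        # A hostname that is neither localhost nor allowlisted is rejected.
        return False
    return address.is_private or address.is_loopback


print(is_host_allowed("localhost:5000"))                               # port-forward
print(is_host_allowed("mlflow.example.com"))                           # rejected by default
print(is_host_allowed("mlflow.example.com", ["mlflow.example.com"]))   # allowlisted
```

Under this model, a request arriving through the Ingress carries the public hostname in its Host header, which is why it is rejected until allowed_hosts includes that hostname.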

Deploy with your values:

bash
helm install mlflow ./charts \
--namespace mlflow \
--create-namespace \
-f my-values.yaml

Workspaces

Workspaces partition experiments, registered models, prompts, and artifacts across teams on a shared server. Enabling workspaces requires a SQL database backend (PostgreSQL, MySQL, or MSSQL; file-based stores are not supported) and a defaultArtifactRoot.

my-values.yaml
yaml
server:
  flag_options:
    - enable_workspaces

When garbageCollection is enabled, set allWorkspaces: true so the GC job cleans up soft-deleted resources across every workspace instead of only the default one:

my-values.yaml
yaml
garbageCollection:
  enabled: true
  schedule: "0 2 * * 0"
  allWorkspaces: true

See Getting Started with Workspaces for workspace creation and client configuration.

Basic HTTP Auth

MLflow's built-in basic-auth plugin requires a CSRF secret key. Store it in a Kubernetes Secret:

bash
kubectl create secret generic mlflow-auth-secret \
--namespace mlflow \
--from-literal=secret-key="$(openssl rand -hex 32)"

Reference the secret and enable the plugin in your values file:

my-values.yaml
yaml
server:
  value_options:
    app_name: "basic-auth"

env:
  - name: MLFLOW_FLASK_SERVER_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: mlflow-auth-secret
        key: secret-key

See Basic HTTP Auth for user and permission management.
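If openssl is not available, the secret key can be generated with Python's standard library instead. The sketch below produces the same kind of value as openssl rand -hex 32 (64 hexadecimal characters) and shows the base64 form Kubernetes stores under a Secret's data field; the helper names are illustrative.

```python
import base64
import secrets


def generate_secret_key(num_bytes=32):
    """Return num_bytes of randomness as a hex string (64 characters for 32 bytes)."""
    return secrets.token_hex(num_bytes)


def to_secret_data(value):
    """Base64-encode a string the way a Secret manifest's `data` field expects."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


key = generate_secret_key()
print(len(key))             # length of the hex string
print(to_secret_data(key))  # value for a hand-written Secret manifest
```

When you create the Secret with kubectl create secret and --from-literal, pass the raw key; kubectl performs the base64 encoding itself.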

Configuration Reference

Persistent Local Storage

Enable a PersistentVolumeClaim for SQLite or local file artifact storage:

yaml
storage:
  enabled: true
  size: 10Gi
  storageClassName: "gp2"  # match your cluster (e.g. "standard" for kind/minikube)

mlflow:
  backendStoreUri: "sqlite:////mlflow/mlflow.db"
  artifactsDestination: "/mlflow/artifacts"
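The four slashes in the SQLite URI are easy to mistype: the sqlite:/// prefix is followed by an absolute path that itself begins with a slash. A minimal sketch of that construction, using a hypothetical helper name:

```python
from pathlib import PurePosixPath


def sqlite_uri(db_path):
    """Build a SQLite URI for an absolute POSIX path inside the container."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError("db_path must be absolute so it resolves on the mounted volume")
    # "sqlite:///" + "/mlflow/mlflow.db" yields four consecutive slashes.
    return "sqlite:///" + str(path)


print(sqlite_uri("/mlflow/mlflow.db"))
```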

Prometheus Metrics

yaml
metrics:
  enabled: true
  path: /metrics

serviceMonitor:
  enabled: true  # requires Prometheus Operator

Garbage Collection

Periodically remove soft-deleted runs, experiments, and their artifacts:

yaml
garbageCollection:
  enabled: true
  schedule: "0 2 * * 0"  # weekly at 2 AM on Sunday
  olderThan: "30d"       # only remove resources soft-deleted for 30+ days
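To sanity-check a weekly schedule such as "0 2 * * 0", the sketch below computes the next run time for a fixed weekday, hour, and minute. It is not a general cron parser and handles only this weekly shape. It also assumes the timestamp you pass in is already in the time zone the CronJob uses, which is typically UTC unless configured otherwise.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, cron_weekday=0, hour=2, minute=0):
    """Return the next occurrence of a weekly cron schedule strictly after `now`."""
    # Cron counts 0=Sunday..6=Saturday; datetime.weekday() counts 0=Monday..6=Sunday.
    target_weekday = (cron_weekday - 1) % 7
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(target_weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed; move to next week.
        candidate += timedelta(days=7)
    return candidate


# Wednesday 2024-01-03 at noon: the next run is Sunday 2024-01-07 at 02:00.
print(next_weekly_run(datetime(2024, 1, 3, 12, 0)))
```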

Resource Limits

yaml
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
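Kubernetes quantities use suffixes: m means thousandths of a CPU core, and Mi/Gi are binary multiples of bytes. The sketch below converts the values above into plain numbers so that requests and limits are easy to compare. It handles only the suffixes listed in the code, not the full quantity grammar.

```python
# Binary memory suffixes only; decimal suffixes (k, M, G) are not handled here.
MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '500m' or '2' into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '512Mi' into bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare number is already in bytes


print(parse_cpu("500m"), parse_cpu("2000m"))          # cores: request, limit
print(parse_memory("2Gi") // parse_memory("512Mi"))   # limit-to-request ratio
```

With the values above, the pod is guaranteed half a core and 512 MiB, and may burst to two cores and 2 GiB.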

Server Options

Pass any mlflow server CLI flag via server.value_options (key/value) or server.flag_options (bare flags):

yaml
server:
  value_options:
    host: "0.0.0.0"
    port: 5000
    workers: 4

  flag_options: []
  # - no_serve_artifacts
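The sketch below shows one plausible way these values translate into an mlflow server command line. The mapping, with underscores becoming hyphens and each key prefixed with --, is an assumption about how the chart renders its options, not something confirmed by the chart templates; the function name is hypothetical.

```python
def build_server_args(value_options, flag_options):
    """Render value and flag options as an `mlflow server` argument list."""
    args = ["mlflow", "server"]
    for key, value in value_options.items():
        # Assumed convention: allowed_hosts -> --allowed-hosts <value>
        args += [f"--{key.replace('_', '-')}", str(value)]
    for flag in flag_options:
        # Bare flags take no value: no_serve_artifacts -> --no-serve-artifacts
        args.append(f"--{flag.replace('_', '-')}")
    return args


print(" ".join(build_server_args(
    {"host": "0.0.0.0", "port": 5000, "workers": 4},
    ["no_serve_artifacts"],
)))
```

Comparing such a rendering against the output of helm template is a quick way to confirm how the chart actually builds the command.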

Full Example: PostgreSQL + S3

The repository ships a production-oriented example at charts/example-mlflow-charts.yaml.

bash
helm install mlflow ./charts \
--namespace mlflow \
--create-namespace \
-f charts/example-mlflow-charts.yaml

Upgrade

bash
helm upgrade mlflow ./charts \
--namespace mlflow \
-f my-values.yaml

Uninstall

bash
helm uninstall mlflow --namespace mlflow

PersistentVolumeClaims are not deleted

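helm uninstall does not remove PersistentVolumeClaims. If you installed with storage.enabled=true, delete the PVC manually after uninstalling. The command below deletes every PVC in the mlflow namespace, so use it only when the namespace is dedicated to this release: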

bash
kubectl delete pvc -n mlflow --all

Connecting the MLflow Client

Once the server is running (via port-forward or Ingress), point the MLflow client at it:

python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # or your Ingress hostname

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
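When the server sits behind TLS or basic auth, the client can be configured through environment variables instead of code. The sketch below collects the relevant variables into a dictionary; the helper name and example values are illustrative, and the variable names are MLflow's documented client settings as far as I know, so check them against your MLflow version.

```python
import os


def client_environment(tracking_uri, username=None, password=None, ca_bundle=None):
    """Collect the environment variables an MLflow client reads at startup."""
    env = {"MLFLOW_TRACKING_URI": tracking_uri}
    if username is not None and password is not None:
        # Credentials for the basic-auth plugin.
        env["MLFLOW_TRACKING_USERNAME"] = username
        env["MLFLOW_TRACKING_PASSWORD"] = password
    if ca_bundle is not None:
        # Path to a CA bundle, for certificates not signed by a public CA.
        env["MLFLOW_TRACKING_SERVER_CERT_PATH"] = ca_bundle
    return env


# Apply the settings before the MLflow client is first used.
os.environ.update(client_environment("https://mlflow.example.com"))
print(os.environ["MLFLOW_TRACKING_URI"])
```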