Kubernetes Helm Deployment
The MLflow Helm chart provides a production-ready way to deploy MLflow on any Kubernetes cluster.
The chart ships with:
- MLflow tracking server with configurable CLI options
- TLS support via an existing Kubernetes Secret
- Persistent storage with a PersistentVolumeClaim for SQLite or file-based artifact stores
- Ingress for external access
- Prometheus metrics and optional ServiceMonitor for the Prometheus Operator
- NetworkPolicy restricting ingress and egress to required ports
- RBAC with namespace-scoped and cluster-scoped rules
- Garbage collection via an optional CronJob that runs mlflow gc
Prerequisites
- Kubernetes 1.23+
- Helm 3.8+
- kubectl configured to point at your cluster
- The MLflow Helm chart downloaded to your current working directory (the commands below reference it as ./charts)
Quick Start
The simplest path — no external database or object store required. MLflow stores metadata in SQLite and artifacts on a PersistentVolumeClaim.
1. Install the chart
helm install mlflow ./charts \
--namespace mlflow \
--create-namespace \
--set storage.enabled=true \
--set mlflow.backendStoreUri="sqlite:////mlflow/mlflow.db" \
--set mlflow.artifactsDestination="/mlflow/artifacts"
2. Wait for the pod to become ready
kubectl get pods -n mlflow -w
You should see:
NAME                             READY   STATUS    RESTARTS   AGE
mlflow-mlflow-xxxxxxxxxx-xxxxx   1/1     Running   0          30s
3. Access the UI
kubectl port-forward -n mlflow svc/mlflow-mlflow 5000:5000
Open http://localhost:5000 in your browser.
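If you script the Quick Start (for example in CI), you can poll the server's health endpoint instead of guessing when the port-forward is usable. The sketch below uses only the Python standard library. It assumes the tracking server answers GET /health with HTTP 200 and that the port-forward from step 3 is listening on localhost:5000. The function name wait_for_mlflow is hypothetical.

```python
import time
import urllib.request


def wait_for_mlflow(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll <base_url>/health until it returns HTTP 200 or timeout_s elapses."""
    health_url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(health_url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts are all
            # OSError subclasses: the pod or the port-forward is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

For example, wait_for_mlflow("http://localhost:5000") returns True once the port-forward is serving requests. It returns False if the server never becomes healthy within the timeout.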
SQLite and local file storage are not suitable for multi-user or high-concurrency deployments. For production, use a PostgreSQL backend and a cloud object store. See Production Deployment below.
Production Deployment
Backend Store (PostgreSQL)
Store the database URI in a Kubernetes Secret to avoid exposing credentials in values files:
kubectl create secret generic mlflow-db-secret \
--namespace mlflow \
--from-literal=uri="postgresql://user:password@postgres:5432/mlflow"
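A password containing characters such as @, : or / breaks the URI unless those characters are percent-encoded. If you generate the Secret from a script, the sketch below shows one way to build the URI safely. The helper name postgres_uri is hypothetical, and the host, port and database values are the placeholders from the command above.

```python
from urllib.parse import quote


def postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL URI, percent-encoding the user name and password."""
    # safe="" makes quote() escape "/" as well, so @ : / # ? in a password
    # cannot be mistaken for URI delimiters.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

For example, postgres_uri("mlflow", "p@ss:w/rd", "postgres", 5432, "mlflow") yields postgresql://mlflow:p%40ss%3Aw%2Frd@postgres:5432/mlflow. Pass that value to --from-literal=uri=.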
Reference the Secret in your values file:
mlflow:
  backendStoreUriFrom:
    secretKeyRef:
      name: mlflow-db-secret
      key: uri
Artifact Store (S3)
Store the S3 credentials in a Kubernetes Secret:
kubectl create secret generic s3-credentials \
--namespace mlflow \
--from-literal=access-key-id=<key> \
--from-literal=secret-access-key=<secret>
Then set the artifact destination and pass the credentials to the server as environment variables in your values file:
mlflow:
  artifactsDestination: "s3://my-bucket/mlflow"
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: s3-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: s3-credentials
        key: secret-access-key
For GCS or Azure Blob Storage, supply the equivalent credentials and URIs (gs://... or wasbs://...).
TLS
Create a TLS Secret from your certificate and key:
kubectl create secret tls mlflow-tls \
--namespace mlflow \
--cert=tls.crt \
--key=tls.key
Enable TLS in your values file:
tls:
  enabled: true
  secretName: mlflow-tls
Ingress
By default, MLflow's host-validation middleware accepts only requests addressed to localhost or a private IP address.
When you expose MLflow through an Ingress with a public hostname, set allowed_hosts to that hostname. Otherwise requests are rejected with HTTP 403.
server:
  value_options:
    allowed_hosts: "mlflow.example.com"
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: mlflow.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: mlflow-tls
      hosts:
        - mlflow.example.com
Deploy with your values:
helm install mlflow ./charts \
--namespace mlflow \
--create-namespace \
-f my-values.yaml
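The function below is a rough, standalone approximation of the default host rule described above. It shows why a public hostname is rejected while localhost works. It is an illustrative sketch, not MLflow's actual middleware. The function name is_host_allowed is hypothetical, and the sketch assumes allowed_hosts is a comma-separated list of exact hostnames.

```python
import ipaddress


def is_host_allowed(host_header: str, allowed_hosts: str = "") -> bool:
    """Approximate the default check: localhost and private IPs pass, anything
    else must be listed in allowed_hosts (assumed comma-separated, exact match)."""
    host = host_header.strip().lower()
    # Drop an optional :port suffix, e.g. "[::1]:5000" or "example.com:443".
    if host.startswith("["):
        host = host[1:host.index("]")]
    elif host.count(":") == 1:
        host = host.split(":", 1)[0]

    allowed = {h.strip().lower() for h in allowed_hosts.split(",") if h.strip()}
    if host in allowed or host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name that is not explicitly allowed
    return addr.is_loopback or addr.is_private
```

Under this model, a request arriving through the Ingress with Host: mlflow.example.com fails the check unless allowed_hosts names that host. This matches the HTTP 403 symptom.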
Workspaces
Workspaces partition experiments, registered models, prompts, and artifacts across teams on a shared server. Enabling workspaces requires a SQL database backend (PostgreSQL, MySQL, or MSSQL — not a file-based store) and a defaultArtifactRoot.
server:
  flag_options:
    - enable_workspaces
When garbageCollection is enabled, set allWorkspaces: true so the GC job cleans up soft-deleted resources across every workspace instead of only the default one:
garbageCollection:
  enabled: true
  schedule: "0 2 * * 0"
  allWorkspaces: true
See Getting Started with Workspaces for workspace creation and client configuration.
Basic HTTP Auth
MLflow's built-in basic-auth plugin requires a CSRF secret key. Store it in a Kubernetes Secret:
kubectl create secret generic mlflow-auth-secret \
--namespace mlflow \
--from-literal=secret-key="$(openssl rand -hex 32)"
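If openssl is not available where you run kubectl, Python's standard-library secrets module can generate an equivalent random key. Use its output as the value of the --from-literal=secret-key= argument instead of the openssl command substitution.

```python
import secrets

# 32 random bytes rendered as 64 lowercase hexadecimal characters, the same
# shape of value that `openssl rand -hex 32` prints.
secret_key = secrets.token_hex(32)
print(secret_key)
```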
Reference the secret and enable the plugin in your values file:
server:
  value_options:
    app_name: "basic-auth"
env:
  - name: MLFLOW_FLASK_SERVER_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: mlflow-auth-secret
        key: secret-key
See Basic HTTP Auth for user and permission management.
Configuration Reference
Persistent Local Storage
Enable a PersistentVolumeClaim for SQLite or local file artifact storage:
storage:
  enabled: true
  size: 10Gi
  storageClassName: "gp2" # match your cluster (e.g. "standard" for kind/minikube)
mlflow:
  backendStoreUri: "sqlite:////mlflow/mlflow.db"
  artifactsDestination: "/mlflow/artifacts"
Prometheus Metrics
metrics:
  enabled: true
  path: /metrics
  serviceMonitor:
    enabled: true # requires Prometheus Operator
Garbage Collection
Periodically remove soft-deleted runs, experiments, and their artifacts:
garbageCollection:
  enabled: true
  schedule: "0 2 * * 0" # weekly at 2 AM on Sunday
  olderThan: "30d" # only remove resources soft-deleted for 30+ days
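The olderThan value is presumably passed to the --older-than option of mlflow gc. That option accepts a #d#h#m#s string such as 30d or 1d2h3.5m4s. The sketch below makes the retention window concrete by converting such a string into the cutoff time before which soft-deleted resources become eligible for removal. The helpers parse_older_than and deletion_cutoff are hypothetical and are not part of MLflow or the chart.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Optional <number><unit> parts in d, h, m, s order, e.g. "30d" or "1d2h3.5m4s".
_NUM = r"(\d+(?:\.\d+)?)"
_DURATION = re.compile(rf"^(?:{_NUM}d)?(?:{_NUM}h)?(?:{_NUM}m)?(?:{_NUM}s)?$")


def parse_older_than(value: str) -> timedelta:
    """Convert a #d#h#m#s string into a timedelta."""
    text = value.strip()
    match = _DURATION.match(text)
    if not text or match is None:
        raise ValueError(f"not a #d#h#m#s duration: {value!r}")
    days, hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def deletion_cutoff(value: str, now: Optional[datetime] = None) -> datetime:
    """Return the instant before which soft-deleted resources are old enough to purge."""
    now = now or datetime.now(timezone.utc)
    return now - parse_older_than(value)
```

Combined with the weekly schedule, a resource can stay soft-deleted for up to about 37 days before a job purges it. It becomes eligible 30 days after deletion, and the next Sunday run can be up to a week later.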
Resource Limits
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
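Kubernetes resource quantities mix decimal and binary suffixes. 500m means half a CPU core, while Mi and Gi are powers of two, so 512Mi is 512 * 2^20 bytes. The sketch below converts the values above into plain numbers. The helper parse_quantity is hypothetical and handles only the common suffixes; it ignores exponent forms such as 1e3.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal suffixes used in Kubernetes quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9,
            "T": 10**12, "P": 10**15, "E": 10**18}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '500m' (cores) or '512Mi' (bytes) to a number."""
    # Check two-letter binary suffixes first so "Mi" is not read as "M".
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    for suffix, factor in _DECIMAL.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)  # no suffix: whole cores or bytes
```

With these values, the container is guaranteed half a core and may burst up to two full cores. Its memory limit is four times its 512Mi request.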
Server Options
Pass any mlflow server CLI flag via server.value_options (key/value) or server.flag_options (bare flags):
server:
  value_options:
    host: "0.0.0.0"
    port: 5000
    workers: 4
  flag_options: []
    # - no_serve_artifacts
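The sketch below makes the mapping concrete. It assumes the chart turns each snake_case key into the matching kebab-case flag, so allowed_hosts becomes --allowed-hosts and no_serve_artifacts becomes --no-serve-artifacts. The helper names are hypothetical, and the real template may order or quote arguments differently.

```python
from typing import Iterable, List, Mapping


def to_flag(key: str) -> str:
    """Map a values-file key such as 'allowed_hosts' to '--allowed-hosts'."""
    return "--" + key.replace("_", "-")


def render_server_args(value_options: Mapping[str, object],
                       flag_options: Iterable[str]) -> List[str]:
    """Build an `mlflow server` argument list from key/value options and bare flags."""
    args = ["mlflow", "server"]
    for key, value in value_options.items():
        args += [to_flag(key), str(value)]  # key/value option: flag followed by its value
    for flag in flag_options:
        args.append(to_flag(flag))  # bare flag: no value
    return args
```

With the defaults shown, this yields mlflow server --host 0.0.0.0 --port 5000 --workers 4.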
Full Example: PostgreSQL + S3
The repository ships a production-oriented example at charts/example-mlflow-charts.yaml.
helm install mlflow ./charts \
--namespace mlflow \
--create-namespace \
-f charts/example-mlflow-charts.yaml
Upgrade
helm upgrade mlflow ./charts \
--namespace mlflow \
-f my-values.yaml
Uninstall
helm uninstall mlflow --namespace mlflow
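helm uninstall does not remove PersistentVolumeClaims. If storage.enabled=true, delete the PVC manually after uninstalling. The following command deletes every PVC in the mlflow namespace, so run it only if that namespace is dedicated to this release: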
kubectl delete pvc -n mlflow --all
Connecting the MLflow Client
Once the server is running (via port-forward or Ingress), point the MLflow client at it:
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # or your Ingress URL, e.g. https://mlflow.example.com

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)