Tracing in Production

Machine learning projects don't conclude with their initial launch. Ongoing monitoring and incremental enhancements are critical for long-term success. MLflow Tracing offers observability for your production application, supporting the iterative process of continuous improvement.

When it comes to production monitoring, there are many additional considerations for tracing, such as scalability, security, and cost. Moreover, monitoring the machine learning model alone is rarely sufficient in real-world software operation: engineering teams need to monitor every service across the entire product and ensure that it delivers the expected value to end users. Therefore, MLflow integrates with various observability solutions such as Grafana, Datadog, New Relic, and more, through OpenTelemetry standardization.

Self-host Tracking Server​

Of course, you can keep using the MLflow tracking server to store production traces. However, the tracking server is optimized for offline experimentation and is generally not suitable for handling hyper-scale traffic. Therefore, we recommend one of the other two options described below for production monitoring use cases.

If you choose to keep using the tracking server in production, we strongly recommend running a SQL-based tracking server on top of a scalable database and artifact storage, as this is a key factor for write and query performance. Refer to the tracking server setup guide for more details. In addition, the tracking server retains trace data indefinitely by default, so it is recommended to set up a periodic deletion job using the SDK or REST API.
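As an illustration, a minimal sketch of such a cleanup job with the Python SDK might look like the following. It assumes an MLflow version where MlflowClient.delete_traces is available; the experiment ID and the 30-day retention window are placeholders to replace with your own values.

import time

from mlflow import MlflowClient

client = MlflowClient()  # resolves the tracking server from MLFLOW_TRACKING_URI

# Delete traces older than a 30-day retention window (placeholder value)
retention_ms = 30 * 24 * 60 * 60 * 1000
cutoff_ms = int(time.time() * 1000) - retention_ms

deleted = client.delete_traces(
    experiment_id="0",  # replace with your experiment ID
    max_timestamp_millis=cutoff_ms,
)
print(f"Deleted {deleted} traces older than the retention window")

Running a job like this on a schedule (for example, via cron or your orchestrator of choice) keeps the tracing backend from growing without bound.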

OpenTelemetry Collector​

Traces generated by MLflow are compatible with the OpenTelemetry trace specs. Therefore, MLflow traces can be exported to various observability solutions that support OpenTelemetry.

By default, MLflow exports traces to the MLflow Tracking Server. To enable exporting traces to an OpenTelemetry Collector, set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable (or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) to the target URL of the OpenTelemetry Collector before starting any trace.

import mlflow
import os

# Set the endpoint of the OpenTelemetry Collector
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
# Optionally, set the service name to group traces
os.environ["OTEL_SERVICE_NAME"] = "<your-service-name>"

# Trace will be exported to the OTel collector at http://localhost:4317/v1/traces
with mlflow.start_span(name="foo") as span:
    span.set_inputs({"a": 1})
    span.set_outputs({"b": 2})

Warning

MLflow only exports traces to a single destination. When the OTEL_EXPORTER_OTLP_ENDPOINT environment variable is configured, MLflow will not export traces to the MLflow Tracking Server and you will not see traces in the MLflow UI.

Similarly, if you deploy a model to Databricks Model Serving with tracing enabled, using the OpenTelemetry Collector will result in traces not being recorded in the inference table.

See the guides below to learn how to set up an OpenTelemetry Collector for your specific observability platform: Datadog, New Relic, SigNoz, Splunk, Grafana, and ServiceNow.

Configurations​

MLflow uses the standard OTLP Exporter for exporting traces to OpenTelemetry Collector instances. Therefore, you can use all of the configuration options supported by OpenTelemetry. The following example configures the OTLP Exporter to use the HTTP protocol instead of the default gRPC and sets custom headers:

export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317/v1/traces"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_TRACES_HEADERS="api_key=12345"

Databricks Managed MLflow​

If you are using Databricks Managed MLflow for building your machine learning models, setting up production tracing is straightforward. When deploying a model to a Databricks Model Serving endpoint:

  1. Check the "Enable Tracing" checkbox, or manually set the ENABLE_MLFLOW_TRACING environment variable to true.
  2. Enable the inference table for the serving endpoint.

With both enabled, MLflow exports traces to the endpoint's inference table along with the request and response logs.
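As an illustration, the same two steps can also be applied programmatically when creating the endpoint. The sketch below uses the Databricks Python SDK; the endpoint name, model name, and catalog/schema are placeholders, and the exact field names may differ across SDK versions, so treat it as a starting point rather than a definitive recipe.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    AutoCaptureConfigInput,
    EndpointCoreConfigInput,
    ServedEntityInput,
)

w = WorkspaceClient()

w.serving_endpoints.create(
    name="my-traced-endpoint",  # placeholder endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="main.default.my_model",  # placeholder Unity Catalog model
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
                # Step 1: enable MLflow tracing on the endpoint
                environment_vars={"ENABLE_MLFLOW_TRACING": "true"},
            )
        ],
        # Step 2: enable the inference table that captures requests, responses, and traces
        auto_capture_config=AutoCaptureConfigInput(
            catalog_name="main",
            schema_name="default",
            table_name_prefix="my_traced_endpoint",
        ),
    ),
)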

Since the inference table is a Delta Table governed by Unity Catalog, it offers extremely high scalability and fine-grained access control, making it an ideal place to store your trace data securely. Refer to the Databricks MLflow Tracing documentation for more details about this setup.