mlflow

The mlflow module provides a high-level “fluent” API for starting and managing MLflow runs. For example:

import mlflow

mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()

You can also use the context manager syntax like this:

with mlflow.start_run() as run:
    mlflow.log_param("my", "param")
    mlflow.log_metric("score", 100)

which automatically terminates the run at the end of the with block.

The fluent tracking API is not currently threadsafe. Any concurrent callers to the tracking API must implement mutual exclusion manually.
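One way to provide the mutual exclusion mentioned above is to route every fluent call through a single shared lock. The following is a minimal sketch rather than part of MLflow: the lock _tracking_lock and the helper call_exclusively are hypothetical names, and the helper accepts any callable so that it can wrap functions such as mlflow.log_metric.

```python
import threading

# Hypothetical process-wide lock guarding all fluent tracking calls.
_tracking_lock = threading.Lock()


def call_exclusively(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) while holding the shared lock."""
    with _tracking_lock:
        return fn(*args, **kwargs)
```

Worker threads would then call call_exclusively(mlflow.log_metric, "score", 100) instead of calling mlflow.log_metric directly.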

For a lower level API, see the mlflow.client module.

class mlflow.ActiveRun(run)

Wrapper around mlflow.entities.Run to enable using Python with syntax.

mlflow.active_run() → Optional[ActiveRun]

Get the currently active Run, or None if no such run exists.

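Note: You cannot access currently-active run attributes (parameters, metrics, etc.) through the run returned by mlflow.active_run. To access such attributes, fetch the run through mlflow.client.MlflowClient, for example with MlflowClient().get_run(mlflow.active_run().info.run_id). The example below shows the information that is available directly from the active run, such as its run ID: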

Example
import mlflow

mlflow.start_run()
run = mlflow.active_run()
print("Active run_id: {}".format(run.info.run_id))
mlflow.end_run()
Output
Active run_id: 6f252757005748708cd3aad75d1ff462
mlflow.autolog(log_input_examples: bool = False, log_model_signatures: bool = True, log_models: bool = True, disable: bool = False, exclusive: bool = False, disable_for_unsupported_versions: bool = False, silent: bool = False) → None

Enables (or disables) and configures autologging for all supported integrations.

The parameters are passed to any autologging integrations that support them.

See the tracking docs for a list of supported autologging integrations.

Note that framework-specific configurations set at any point will take precedence over any configurations set by this function. For example:

mlflow.autolog(log_models=False, exclusive=True)
import sklearn

would enable autologging for sklearn with log_models=False and exclusive=True, but

mlflow.autolog(log_models=False, exclusive=True)
import sklearn
mlflow.sklearn.autolog(log_models=True)

would enable autologging for sklearn with log_models=True and exclusive=False; the latter results from the default value of exclusive in mlflow.sklearn.autolog. Other framework autolog functions (e.g. mlflow.tensorflow.autolog) would use the configurations set by mlflow.autolog (in this instance, log_models=False, exclusive=True) until they are explicitly called by the user.
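The precedence rule above can be modeled with plain dictionaries. This is only an illustrative sketch, not MLflow's implementation: effective_config and FRAMEWORK_DEFAULTS are hypothetical names, and the defaults are assumed to match those shown in the mlflow.autolog signature.

```python
# Assumed defaults of the two options discussed above.
FRAMEWORK_DEFAULTS = {"log_models": True, "exclusive": False}


def effective_config(global_kwargs, framework_kwargs=None):
    """Return the options a single integration ends up with.

    global_kwargs: keyword arguments passed to mlflow.autolog.
    framework_kwargs: keyword arguments passed to the framework-specific
        autolog function, or None if it was never called explicitly.
    """
    if framework_kwargs is None:
        # Only the global call happened, so its settings apply.
        return {**FRAMEWORK_DEFAULTS, **global_kwargs}
    # An explicit framework call wins outright; options it omits fall
    # back to that function's own defaults, not to the global settings.
    return {**FRAMEWORK_DEFAULTS, **framework_kwargs}


# mlflow.autolog(log_models=False, exclusive=True) only
print(effective_config({"log_models": False, "exclusive": True}))
# → {'log_models': False, 'exclusive': True}

# ... followed by mlflow.sklearn.autolog(log_models=True)
print(effective_config({"log_models": False, "exclusive": True}, {"log_models": True}))
# → {'log_models': True, 'exclusive': False}
```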

Parameters
  • log_input_examples – If True, input examples from training datasets are collected and logged along with model artifacts during training. If False, input examples are not logged. Note: Input examples are MLflow model attributes and are only collected if log_models is also True.

  • log_model_signatures – If True, ModelSignatures describing model inputs and outputs are collected and logged along with model artifacts during training. If False, signatures are not logged. Note: Model signatures are MLflow model attributes and are only collected if log_models is also True.

  • log_models – If True, trained models are logged as MLflow model artifacts. If False, trained models are not logged. Input examples and model signatures, which are attributes of MLflow models, are also omitted when log_models is False.

  • disable – If True, disables all supported autologging integrations. If False, enables all supported autologging integrations.

  • exclusive – If True, autologged content is not logged to user-created fluent runs. If False, autologged content is logged to the active fluent run, which may be user-created.

  • disable_for_unsupported_versions – If True, disable autologging for versions of all integration libraries that have not been tested against this version of the MLflow client or are incompatible.

  • silent – If True, suppress all event logs and warnings from MLflow during autologging setup and training execution. If False, show all events and warnings during autologging setup and training execution.

Example
import numpy as np
import mlflow.sklearn
from mlflow import MlflowClient
from sklearn.linear_model import LinearRegression

def print_auto_logged_info(r):
    tags = {k: v for k, v in r.data.tags.items() if not k.startswith("mlflow.")}
    artifacts = [f.path for f in MlflowClient().list_artifacts(r.info.run_id, "model")]
    print("run_id: {}".format(r.info.run_id))
    print("artifacts: {}".format(artifacts))
    print("params: {}".format(r.data.params))
    print("metrics: {}".format(r.data.metrics))
    print("tags: {}".format(tags))

# prepare training data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# Auto log all the parameters, metrics, and artifacts
mlflow.autolog()
model = LinearRegression()
with mlflow.start_run() as run:
    model.fit(X, y)

# fetch the auto logged parameters and metrics for ended run
print_auto_logged_info(mlflow.get_run(run_id=run.info.run_id))
Output
run_id: fd10a17d028c47399a55ab8741721ef7
artifacts: ['model/MLmodel', 'model/conda.yaml', 'model/model.pkl']
params: {'copy_X': 'True',
         'normalize': 'False',
         'fit_intercept': 'True',
         'n_jobs': 'None'}
metrics: {'training_score': 1.0,
          'training_rmse': 4.440892098500626e-16,
          'training_r2_score': 1.0,
          'training_mae': 2.220446049250313e-16,
          'training_mse': 1.9721522630525295e-31}
tags: {'estimator_class': 'sklearn.linear_model._base.LinearRegression',
       'estimator_name': 'LinearRegression'}
mlflow.create_experiment(name: str, artifact_location: Optional[str] = None, tags: Optional[Dict[str, Any]] = None) → str

Create an experiment.

Parameters
  • name – The experiment name, which must be unique and is case sensitive.

  • artifact_location – The location to store run artifacts. If not provided, the server picks an appropriate default.

  • tags – An optional dictionary of string keys and values to set as tags on the experiment.

Returns

String ID of the created experiment.

Example
import mlflow
from pathlib import Path

# Create an experiment name, which must be unique and case sensitive
experiment_id = mlflow.create_experiment(
    "Social NLP Experiments",
    artifact_location=Path.cwd().joinpath("mlruns").as_uri(),
    tags={"version": "v1", "priority": "P1"},
)
experiment = mlflow.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
Output
Name: Social NLP Experiments
Experiment_id: 1
Artifact Location: file:///.../mlruns
Tags: {'version': 'v1', 'priority': 'P1'}
Lifecycle_stage: active
mlflow.delete_experiment(experiment_id: str) → None

Delete an experiment from the backend store.

Parameters

experiment_id – The string-ified experiment ID returned from create_experiment.

Example
import mlflow

experiment_id = mlflow.create_experiment("New Experiment")
mlflow.delete_experiment(experiment_id)

# Examine the deleted experiment details.
experiment = mlflow.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
Output
Name: New Experiment
Artifact Location: file:///.../mlruns/2
Lifecycle_stage: deleted
mlflow.delete_run(run_id: str) → None

Deletes a run with the given ID.

Parameters

run_id – Unique identifier for the run to delete.

Example
import mlflow

with mlflow.start_run() as run:
    mlflow.log_param("p", 0)

run_id = run.info.run_id
mlflow.delete_run(run_id)

print("run_id: {}; lifecycle_stage: {}".format(run_id,
    mlflow.get_run(run_id).info.lifecycle_stage))
Output
run_id: 45f4af3e6fd349e58579b27fcb0b8277; lifecycle_stage: deleted
mlflow.delete_tag(key: str) → None

Delete a tag from a run. This is irreversible. If no run is active, this method will create a new active run.

Parameters

key – Name of the tag

Example
import mlflow

tags = {"engineering": "ML Platform",
        "engineering_remote": "ML Platform"}

with mlflow.start_run() as run:
    mlflow.set_tags(tags)

with mlflow.start_run(run_id=run.info.run_id):
    mlflow.delete_tag("engineering_remote")
mlflow.end_run(status: str = 'FINISHED') → None

End an active MLflow run (if there is one).

Example
import mlflow

# Start run and get status
mlflow.start_run()
run = mlflow.active_run()
print("run_id: {}; status: {}".format(run.info.run_id, run.info.status))

# End run and get status
mlflow.end_run()
run = mlflow.get_run(run.info.run_id)
print("run_id: {}; status: {}".format(run.info.run_id, run.info.status))
print("--")

# Check for any active runs
print("Active run: {}".format(mlflow.active_run()))
Output
run_id: b47ee4563368419880b44ad8535f6371; status: RUNNING
run_id: b47ee4563368419880b44ad8535f6371; status: FINISHED
--
Active run: None
mlflow.evaluate(model: Union[str, mlflow.pyfunc.PyFuncModel], data, *, targets, model_type: str, dataset_name=None, dataset_path=None, feature_names: Optional[list] = None, evaluators=None, evaluator_config=None, custom_metrics=None)

Note

Experimental: This method may change or be removed in a future release without warning.

Evaluate a PyFunc model on the specified dataset using one or more specified evaluators, and log resulting metrics & artifacts to MLflow Tracking. For additional overview information, see the Model Evaluation documentation.

Default Evaluator behavior:
  • The default evaluator, which can be invoked with evaluators="default" or evaluators=None, supports the "regressor" and "classifier" model types. It generates a variety of model performance metrics, model performance plots, and model explanations.

  • For both the "regressor" and "classifier" model types, the default evaluator generates model summary plots and feature importance plots using SHAP.

  • For regressor models, the default evaluator additionally logs:
    • metrics: example_count, mean_absolute_error, mean_squared_error, root_mean_squared_error, sum_on_label, mean_on_label, r2_score, max_error, mean_absolute_percentage_error.

  • For binary classifiers, the default evaluator additionally logs:
    • metrics: true_negatives, false_positives, false_negatives, true_positives, recall, precision, f1_score, accuracy, example_count, log_loss, roc_auc, precision_recall_auc.

    • artifacts: lift curve plot, precision-recall plot, ROC plot.

  • For multiclass classifiers, the default evaluator additionally logs:
    • metrics: accuracy, example_count, f1_score_micro, f1_score_macro, log_loss

    • artifacts: A CSV file for “per_class_metrics” (per-class metrics includes true_negatives/false_positives/false_negatives/true_positives/recall/precision/roc_auc, precision_recall_auc), precision-recall merged curves plot, ROC merged curves plot.

  • For sklearn models, the default evaluator additionally logs the model’s evaluation criterion (e.g. mean accuracy for a classifier) computed by model.score method.

  • The logged MLflow metric keys are constructed using the format: {metric_name}_on_{dataset_name}. Any preexisting metrics with the same name are overwritten.

  • The metrics/artifacts listed above are logged to the active MLflow run. If no active run exists, a new MLflow run is created for logging these metrics and artifacts.

  • Additionally, information about the specified dataset - hash, name (if specified), path (if specified), and the UUID of the model that evaluated it - is logged to the mlflow.datasets tag.

  • The available evaluator_config options for the default evaluator include:
    • log_model_explainability: A boolean value specifying whether or not to log model explainability insights, default value is True.

    • explainability_algorithm: A string specifying the SHAP Explainer algorithm for model explainability. Supported algorithms include ‘exact’, ‘permutation’, ‘partition’, and ‘kernel’. If not set, shap.Explainer is used with the “auto” algorithm, which chooses the best Explainer based on the model.

    • explainability_nsamples: The number of sample rows to use for computing model explainability insights. Default value is 2000.

    • explainability_kernel_link: The kernel link function used by the SHAP kernel explainer. Available values are “identity” and “logit”. Default value is “identity”.

    • max_classes_for_multiclass_roc_pr: For multiclass classification tasks, the maximum number of classes for which to log the per-class ROC curve and Precision-Recall curve. If the number of classes is larger than the configured maximum, these curves are not logged.

  • Limitations of evaluation dataset:
    • For classification tasks, dataset labels are used to infer the total number of classes.

    • For binary classification tasks, the negative label value must be 0 or -1 or False, and the positive label value must be 1 or True.

  • Limitations of metrics/artifacts computation:
    • For classification tasks, some metric and artifact computations require the model to output class probabilities. Currently, for scikit-learn models, the default evaluator calls the predict_proba method on the underlying model to obtain probabilities. For other model types, the default evaluator does not compute metrics/artifacts that require probability outputs.

  • Limitations of default evaluator logging model explainability insights:
    • The shap.Explainer auto algorithm uses the Linear explainer for linear models and the Tree explainer for tree models. Because SHAP’s Linear and Tree explainers do not support multi-class classification, the default evaluator falls back to using the Exact or Permutation explainers for multi-class classification tasks.

    • Logging model explainability insights is not currently supported for PySpark models.

    • The evaluation dataset label values must be numeric or boolean, all feature values must be numeric, and each feature column must only contain scalar values.
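Several of the rules in the list above are easy to mirror in plain Python. The sketch below is illustrative only: logged_metric_key and binary_labels_supported are hypothetical helpers, not MLflow functions, and the option values in evaluator_config (for example 500 samples) are arbitrary choices.

```python
# Options documented above for the default evaluator.
evaluator_config = {
    "log_model_explainability": True,
    "explainability_algorithm": "permutation",
    "explainability_nsamples": 500,
}


def logged_metric_key(metric_name, dataset_name):
    """Build a key in the documented {metric_name}_on_{dataset_name} format."""
    return f"{metric_name}_on_{dataset_name}"


def binary_labels_supported(labels):
    """Return True if every label is a documented binary label value.

    Negative labels must be 0, -1 or False; positive labels must be 1 or True.
    """
    # In Python, False == 0 and True == 1, so this set covers all five values.
    return set(labels) <= {0, -1, 1}
```

Such a dictionary would be passed as the evaluator_config argument of mlflow.evaluate.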

Parameters
  • model – A pyfunc model instance, or a URI referring to such a model.

  • data

    One of the following:

    • A numpy array or list of evaluation features, excluding labels.

    • A Pandas DataFrame or Spark DataFrame, containing evaluation features and labels. If the feature_names argument is not specified, all columns are regarded as feature columns. Otherwise, only column names present in feature_names are regarded as feature columns. If data is a Spark DataFrame, only its first 10000 rows are used as evaluation data.

  • targets – If data is a numpy array or list, a numpy array or list of evaluation labels. If data is a DataFrame, the string name of a column from data that contains evaluation labels.

  • model_type – A string describing the model type. The default evaluator supports "regressor" and "classifier" as model types.

  • dataset_name – (Optional) The name of the dataset, which must not contain double quotes ("). The name is logged to the mlflow.datasets tag for lineage tracking purposes. If not specified, the dataset hash is used as the dataset name.

  • dataset_path – (Optional) The path where the data is stored. Must not contain double quotes ("). If specified, the path is logged to the mlflow.datasets tag for lineage tracking purposes.

  • feature_names – (Optional) If the data argument is a feature data numpy array or list, feature_names is a list of the feature names for each feature. If None, then the feature_names are generated using the format feature_{feature_index}. If the data argument is a Pandas DataFrame or a Spark DataFrame, feature_names is a list of the names of the feature columns in the DataFrame. If None, then all columns except the label column are regarded as feature columns.

  • evaluators – The name of the evaluator to use for model evaluation, or a list of evaluator names. If unspecified, all evaluators capable of evaluating the specified model on the specified dataset are used. The default evaluator can be referred to by the name "default". To see all available evaluators, call mlflow.models.list_evaluators().

  • evaluator_config – A dictionary of additional configurations to supply to the evaluator. If multiple evaluators are specified, each configuration should be supplied as a nested dictionary whose key is the evaluator name.

  • custom_metrics

    (Optional) A list of custom metric functions. A custom metric function is required to take in two parameters:

    • Union[pandas.DataFrame, pyspark.sql.DataFrame]: The first is a Pandas or Spark DataFrame containing a prediction column and a target column. The prediction column contains the predictions made by the model. The target column contains the labels corresponding to the predictions made on each row.

    • Dict: The second is a dictionary containing the metrics calculated by the default evaluator. The keys are the names of the metrics and the values are the scalar values of the metrics. Refer to the Default Evaluator behavior section for which metrics are returned based on the type of model (i.e. classifier or regressor).

    • (Optional) str: the path to a temporary directory that can be used by the custom metric function to temporarily store produced artifacts. The directory will be deleted after the artifacts are logged.

    A custom metric function can return a value in one of the following formats:

    • Dict[AnyStr, Union[int, float, np.number]]: a single dictionary of custom metrics, where the keys are the names of the metrics, and the values are the scalar values of the metrics.

    • Tuple[Dict[AnyStr, Union[int,float,np.number]], Dict[AnyStr,Any]]: a tuple of a dict containing the custom metrics, and a dict of artifacts, where the keys are the names of the artifacts, and the values are objects representing the artifacts.

    Object types that artifacts can be represented as:

    • A string uri representing the file path to the artifact. MLflow will infer the type of the artifact based on the file extension.

    • A string representation of a JSON object. This will be saved as a .json artifact.

    • Pandas DataFrame. This will be resolved as a CSV artifact.

    • Numpy array. This will be saved as a .npy artifact.

    • Matplotlib Figure. This will be saved as an image artifact. Note that matplotlib.pyplot.savefig is called behind the scenes with default configurations. To customize, either save the figure with the desired configurations and return its file path, or define customizations through matplotlib.rcParams.

    • Any other object. MLflow attempts to pickle it with the default protocol.

    Custom Metric Function Boilerplate
    def custom_metrics_boilerplate(eval_df, builtin_metrics):
        # ...
        metrics: Dict[AnyStr, Union[int, float, np.number]] = some_dict
        artifacts: Dict[AnyStr, Any] = some_artifact_dict
        # ...
        if artifacts is not None:
            return metrics, artifacts
        return metrics
    
    Example usage of custom metrics
    import os

    import matplotlib.pyplot as plt
    import mlflow
    import numpy as np

    def squared_diff_plus_one(eval_df, builtin_metrics):
        return {
            "squared_diff_plus_one": (
                np.sum(
                    np.abs(
                        eval_df["prediction"] - eval_df["target"] + 1
                    ) ** 2
                )
            )
        }

    def scatter_plot(eval_df, builtin_metrics, artifacts_dir):
        # Predictions go on the x-axis and targets on the y-axis
        plt.scatter(eval_df['prediction'], eval_df['target'])
        plt.xlabel('Predictions')
        plt.ylabel('Targets')
        plt.title("Predictions vs. Targets")
        plot_path = os.path.join(artifacts_dir, "example.png")
        plt.savefig(plot_path)
        return {}, {"pred_target_scatter": plot_path}

    # model, data, targets, model_type, dataset_name and evaluators are
    # assumed to be defined elsewhere. Every argument after data is
    # keyword-only.
    with mlflow.start_run():
        mlflow.evaluate(
            model,
            data,
            targets=targets,
            model_type=model_type,
            dataset_name=dataset_name,
            evaluators=evaluators,
            custom_metrics=[squared_diff_plus_one, scatter_plot],
        )
    

Returns

An mlflow.models.EvaluationResult instance containing evaluation results.
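Because a custom metric function may return either a bare metrics dictionary or a (metrics, artifacts) tuple, code that consumes such functions has to handle both shapes. The sketch below shows one way to do so; split_custom_metric_result is a hypothetical helper and is not claimed to match how MLflow processes the return value internally.

```python
def split_custom_metric_result(result):
    """Split a custom metric function's return value into (metrics, artifacts).

    Accepts either a dict of metrics or a (metrics, artifacts) tuple, the two
    return formats described for custom_metrics.
    """
    if isinstance(result, tuple):
        metrics, artifacts = result
        return metrics, artifacts
    # A bare dict carries metrics only.
    return result, {}
```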

mlflow.get_artifact_uri(artifact_path: Optional[str] = None) → str

Get the absolute URI of the specified artifact in the currently active run. If artifact_path is not specified, the artifact root URI of the currently active run will be returned; calls to log_artifact and log_artifacts write artifact(s) to subdirectories of the artifact root URI.

If no run is active, this method will create a new active run.

Parameters

artifact_path – The run-relative artifact path for which to obtain an absolute URI. For example, “path/to/artifact”. If unspecified, the artifact root URI for the currently active run will be returned.

Returns

An absolute URI referring to the specified artifact or the currently active run’s artifact root. For example, if an artifact path is provided and the currently active run uses an S3-backed store, this may be a URI of the form s3://<bucket_name>/path/to/artifact/root/path/to/artifact. If an artifact path is not provided and the currently active run uses an S3-backed store, this may be a URI of the form s3://<bucket_name>/path/to/artifact/root.

Example
import mlflow

features = "rooms, zipcode, median_price, school_rating, transport"
with open("features.txt", 'w') as f:
    f.write(features)

# Log the artifact in a directory "features" under the root artifact_uri/features
with mlflow.start_run():
    mlflow.log_artifact("features.txt", artifact_path="features")

    # Fetch the artifact uri root directory
    artifact_uri = mlflow.get_artifact_uri()
    print("Artifact uri: {}".format(artifact_uri))

    # Fetch a specific artifact uri
    artifact_uri = mlflow.get_artifact_uri(artifact_path="features/features.txt")
    print("Artifact uri: {}".format(artifact_uri))
Output
Artifact uri: file:///.../0/a46a80f1c9644bd8f4e5dd5553fffce/artifacts
Artifact uri: file:///.../0/a46a80f1c9644bd8f4e5dd5553fffce/artifacts/features/features.txt
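The relationship between the artifact root and a run-relative artifact path, as described for the return value of mlflow.get_artifact_uri, amounts to a POSIX-style path join. The sketch below illustrates that relationship only; compose_artifact_uri is a hypothetical helper, the bucket name is a placeholder, and MLflow's own URI handling may differ in detail.

```python
import posixpath


def compose_artifact_uri(artifact_root, artifact_path=None):
    """Append a run-relative artifact path to an artifact root URI."""
    if artifact_path is None:
        # No path given: the artifact root itself is returned.
        return artifact_root
    return posixpath.join(artifact_root, artifact_path)


root = "s3://my-bucket/path/to/artifact/root"  # placeholder root URI
print(compose_artifact_uri(root))
print(compose_artifact_uri(root, "path/to/artifact"))
```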
mlflow.get_experiment(experiment_id: str) → Experiment

Retrieve an experiment by experiment_id from the backend store.

Parameters

experiment_id – The string-ified experiment ID returned from create_experiment.

Returns

mlflow.entities.Experiment

Example
import mlflow

experiment = mlflow.get_experiment("0")
print("Name: {}".format(experiment.name))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
Output
Name: Default
Artifact Location: file:///.../mlruns/0
Tags: {}
Lifecycle_stage: active
mlflow.get_experiment_by_name(name: str) → Optional[Experiment]

Retrieve an experiment by experiment name from the backend store.

Parameters

name – The case sensitive experiment name.

Returns

An instance of mlflow.entities.Experiment if an experiment with the specified name exists, otherwise None.

Example
import mlflow

# Case sensitive name
experiment = mlflow.get_experiment_by_name("Default")
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
Output
Experiment_id: 0
Artifact Location: file:///.../mlruns/0
Tags: {}
Lifecycle_stage: active
mlflow.get_registry_uri() → str

Get the current registry URI. If none has been specified, defaults to the tracking URI.

Returns

The registry URI.

Example
import mlflow

# Get the current model registry uri
mr_uri = mlflow.get_registry_uri()
print("Current model registry uri: {}".format(mr_uri))

# Get the current tracking uri
tracking_uri = mlflow.get_tracking_uri()
print("Current tracking uri: {}".format(tracking_uri))

# They should be the same
assert mr_uri == tracking_uri
Output
Current model registry uri: file:///.../mlruns
Current tracking uri: file:///.../mlruns
mlflow.get_run(run_id: str) → Run

Fetch the run from backend store. The resulting Run contains a collection of run metadata – RunInfo, as well as a collection of run parameters, tags, and metrics – RunData. In the case where multiple metrics with the same key are logged for the run, the RunData contains the most recently logged value at the largest step for each metric.

Parameters

run_id – Unique identifier for the run.

Returns

A single mlflow.entities.Run object, if the run exists. Otherwise, raises an exception.

Example
import mlflow

with mlflow.start_run() as run:
    mlflow.log_param("p", 0)

run_id = run.info.run_id
print("run_id: {}; lifecycle_stage: {}".format(run_id,
    mlflow.get_run(run_id).info.lifecycle_stage))
Output
run_id: 7472befefc754e388e8e922824a0cca5; lifecycle_stage: active
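The rule that RunData holds, for each metric key, the most recently logged value at the largest step can be expressed as a comparison on (step, timestamp). The following is a sketch of that selection rule only, with a hypothetical MetricPoint record and latest_metrics helper; it does not reproduce any backend store's actual code.

```python
from collections import namedtuple

# Hypothetical record of one logged metric value.
MetricPoint = namedtuple("MetricPoint", ["key", "value", "timestamp", "step"])


def latest_metrics(points):
    """Return {key: value}, keeping for each key the value at the largest
    step and, within that step, the most recent timestamp."""
    best = {}
    for p in points:
        current = best.get(p.key)
        if current is None or (p.step, p.timestamp) > (current.step, current.timestamp):
            best[p.key] = p
    return {key: p.value for key, p in best.items()}


points = [
    MetricPoint("loss", 0.9, timestamp=100, step=0),
    MetricPoint("loss", 0.5, timestamp=200, step=1),
    MetricPoint("loss", 0.7, timestamp=150, step=1),
    MetricPoint("acc", 0.8, timestamp=120, step=0),
]
print(latest_metrics(points))  # → {'loss': 0.5, 'acc': 0.8}
```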
mlflow.get_tracking_uri() → str

Get the current tracking URI. This may not correspond to the tracking URI of the currently active run, since the tracking URI can be updated via set_tracking_uri.

Returns

The tracking URI.

Example
import mlflow

# Get the current tracking uri
tracking_uri = mlflow.get_tracking_uri()
print("Current tracking uri: {}".format(tracking_uri))
Output
Current tracking uri: file:///.../mlruns
mlflow.is_tracking_uri_set()

Returns True if the tracking URI has been set, False otherwise.

mlflow.last_active_run() → Optional[Run]

Gets the most recent active run.

Examples:

To retrieve the most recent autologged run:
import mlflow

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = RandomForestRegressor(n_estimators = 100, max_depth = 6, max_features = 3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)
autolog_run = mlflow.last_active_run()
To get the most recently active run that ended:
import mlflow

mlflow.start_run()
mlflow.end_run()
run = mlflow.last_active_run()
To retrieve the currently active run:
import mlflow

mlflow.start_run()
run = mlflow.last_active_run()
mlflow.end_run()
Returns

The active run (this is equivalent to mlflow.active_run()) if one exists. Otherwise, the last run started from the current Python process that reached a terminal status (i.e. FINISHED, FAILED, or KILLED).

mlflow.list_experiments(view_type: int = 1, max_results: Optional[int] = None) → List[Experiment]
Parameters
  • view_type – Qualify requested type of experiments.

  • max_results – If passed, specifies the maximum number of experiments desired. If not passed, all experiments will be returned.

Returns

A list of Experiment objects.

mlflow.list_run_infos(experiment_id: str, run_view_type: int = 1, max_results: int = 1000, order_by: Optional[List[str]] = None) → List[RunInfo]

Return run information for runs which belong to the experiment_id.

Parameters
  • experiment_id – The ID of the experiment to search.

  • run_view_type – ACTIVE_ONLY, DELETED_ONLY, or ALL runs

  • max_results – Maximum number of results desired.

  • order_by – List of order_by clauses. Currently supported values are metric.key, parameter.key, tag.key, attribute.key. For example, order_by=["tag.release ASC", "metric.click_rate DESC"].

Returns

A list of RunInfo objects that satisfy the search expressions.

Example
import mlflow
from mlflow.entities import ViewType

# Create two runs
with mlflow.start_run() as run1:
    mlflow.log_param("p", 0)

with mlflow.start_run() as run2:
    mlflow.log_param("p", 1)

# Delete the last run
mlflow.delete_run(run2.info.run_id)

def print_run_infos(run_infos):
    for r in run_infos:
        print("- run_id: {}, lifecycle_stage: {}".format(r.run_id, r.lifecycle_stage))

print("Active runs:")
print_run_infos(mlflow.list_run_infos("0", run_view_type=ViewType.ACTIVE_ONLY))

print("Deleted runs:")
print_run_infos(mlflow.list_run_infos("0", run_view_type=ViewType.DELETED_ONLY))

print("All runs:")
print_run_infos(mlflow.list_run_infos("0", run_view_type=ViewType.ALL))
Output
Active runs:
- run_id: 4937823b730640d5bed9e3e5057a2b34, lifecycle_stage: active
Deleted runs:
- run_id: b13f1badbed842cf9975c023d23da300, lifecycle_stage: deleted
All runs:
- run_id: b13f1badbed842cf9975c023d23da300, lifecycle_stage: deleted
- run_id: 4937823b730640d5bed9e3e5057a2b34, lifecycle_stage: active
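Each order_by clause accepted by mlflow.list_run_infos pairs an identifier of the form type.key with a sort direction. The sketch below shows how such a clause decomposes; parse_order_by is a hypothetical helper, not an MLflow function, and it assumes that a clause without an explicit direction sorts ascending and that keys contain no spaces.

```python
def parse_order_by(clause):
    """Split an order_by clause into (entity_type, key, ascending)."""
    parts = clause.split()
    identifier = parts[0]
    # Assumption: a clause without an explicit direction sorts ascending.
    direction = parts[1].upper() if len(parts) > 1 else "ASC"
    if direction not in ("ASC", "DESC"):
        raise ValueError(f"Unknown sort direction: {direction}")
    entity_type, _, key = identifier.partition(".")
    if entity_type not in ("metric", "parameter", "tag", "attribute"):
        raise ValueError(f"Unknown entity type: {entity_type}")
    return entity_type, key, direction == "ASC"


for clause in ["tag.release ASC", "metric.click_rate DESC"]:
    print(parse_order_by(clause))
```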
mlflow.log_artifact(local_path: str, artifact_path: Optional[str] = None) → None

Log a local file or directory as an artifact of the currently active run. If no run is active, this method will create a new active run.

Parameters
  • local_path – Path to the file to write.

  • artifact_path – If provided, the directory in artifact_uri to write to.

Example
import mlflow

# Create a features.txt artifact file
features = "rooms, zipcode, median_price, school_rating, transport"
with open("features.txt", 'w') as f:
    f.write(features)

# With artifact_path=None write features.txt under
# root artifact_uri/artifacts directory
with mlflow.start_run():
    mlflow.log_artifact("features.txt")
mlflow.log_artifacts(local_dir: str, artifact_path: Optional[str] = None) → None

Log all the contents of a local directory as artifacts of the run. If no run is active, this method will create a new active run.

Parameters
  • local_dir – Path to the directory of files to write.

  • artifact_path – If provided, the directory in artifact_uri to write to.

Example
import json
import os
import mlflow

# Create some files to preserve as artifacts
features = "rooms, zipcode, median_price, school_rating, transport"
data = {"state": "TX", "Available": 25, "Type": "Detached"}

# Create a couple of artifact files under the directory "data"
os.makedirs("data", exist_ok=True)
with open("data/data.json", 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2)
with open("data/features.txt", 'w') as f:
    f.write(features)

# Write all files in "data" to root artifact_uri/states
with mlflow.start_run():
    mlflow.log_artifacts("data", artifact_path="states")
mlflow.log_dict(dictionary: Any, artifact_file: str) → None

Log a JSON/YAML-serializable object (e.g. dict) as an artifact. The serialization format (JSON or YAML) is automatically inferred from the extension of artifact_file. If the file extension doesn’t exist or match any of [“.json”, “.yml”, “.yaml”], JSON format is used.

Parameters
  • dictionary – Dictionary to log.

  • artifact_file – The run-relative artifact file path in posixpath format to which the dictionary is saved (e.g. “dir/data.json”).

Example
import mlflow

dictionary = {"k": "v"}

with mlflow.start_run():
    # Log a dictionary as a JSON file under the run's root artifact directory
    mlflow.log_dict(dictionary, "data.json")

    # Log a dictionary as a YAML file in a subdirectory of the run's root artifact directory
    mlflow.log_dict(dictionary, "dir/data.yml")

    # If the file extension doesn't exist or match any of [".json", ".yaml", ".yml"],
    # JSON format is used.
    mlflow.log_dict(dictionary, "data")
    mlflow.log_dict(dictionary, "data.txt")
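The format selection described for mlflow.log_dict depends only on the file extension. The sketch below restates that rule; infer_format is a hypothetical helper written for illustration and is not MLflow's implementation.

```python
import posixpath


def infer_format(artifact_file):
    """Return "yaml" for .yml/.yaml files and "json" for everything else."""
    extension = posixpath.splitext(artifact_file)[1]
    return "yaml" if extension in (".yml", ".yaml") else "json"


for name in ["data.json", "dir/data.yml", "data", "data.txt"]:
    print(name, "->", infer_format(name))
```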
mlflow.log_figure(figure: Union[matplotlib.figure.Figure, plotly.graph_objects.Figure], artifact_file: str) → None

Log a figure as an artifact. The following figure objects are supported:
  • matplotlib.figure.Figure

  • plotly.graph_objects.Figure

Parameters
  • figure – Figure to log.

  • artifact_file – The run-relative artifact file path in posixpath format to which the figure is saved (e.g. “dir/file.png”).

Matplotlib Example
import mlflow
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [2, 3])

with mlflow.start_run():
    mlflow.log_figure(fig, "figure.png")
Plotly Example
import mlflow
from plotly import graph_objects as go

fig = go.Figure(go.Scatter(x=[0, 1], y=[2, 3]))

with mlflow.start_run():
    mlflow.log_figure(fig, "figure.html")
mlflow.log_image(image: Union[numpy.ndarray, PIL.Image.Image], artifact_file: str) → None

Log an image as an artifact. The following image objects are supported:

  • numpy.ndarray

  • PIL.Image.Image

Numpy array support
  • data type (( ) represents a valid value range):

    • bool

    • integer (0 ~ 255)

    • unsigned integer (0 ~ 255)

    • float (0.0 ~ 1.0)

    Warning

    • Out-of-range integer values will be clipped to [0, 255].

    • Out-of-range float values will be clipped to [0, 1].

  • shape (H: height, W: width):

    • H x W (Grayscale)

    • H x W x 1 (Grayscale)

    • H x W x 3 (an RGB channel order is assumed)

    • H x W x 4 (an RGBA channel order is assumed)

Parameters
  • image – Image to log.

  • artifact_file – The run-relative artifact file path in posixpath format to which the image is saved (e.g. “dir/image.png”).

Numpy Example
import mlflow
import numpy as np

image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

with mlflow.start_run():
    mlflow.log_image(image, "image.png")
Pillow Example
import mlflow
from PIL import Image

image = Image.new("RGB", (100, 100))

with mlflow.start_run():
    mlflow.log_image(image, "image.png")
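The value ranges listed under "Numpy array support" can be illustrated with NumPy alone. The sketch below is not MLflow's implementation: the helper name to_uint8 is hypothetical, and the mapping of True to 255 and the scaling of floats by 255 are assumptions. It only shows the documented clipping to [0, 255] for integers and to [0, 1] for floats.

```python
import numpy as np


def to_uint8(image: np.ndarray) -> np.ndarray:
    """Clip an array to its documented valid range and convert it to uint8.

    Hypothetical helper for illustration only, not an MLflow function.
    """
    if image.dtype == np.bool_:
        # Assumption: False maps to 0 and True maps to 255
        return image.astype(np.uint8) * 255
    if np.issubdtype(image.dtype, np.integer):
        # Signed and unsigned integers share the valid range [0, 255]
        return np.clip(image, 0, 255).astype(np.uint8)
    # Floats are valid in [0.0, 1.0]; scaling by 255 is an assumption
    return (np.clip(image, 0.0, 1.0) * 255).astype(np.uint8)


ints = to_uint8(np.array([[-5, 0, 300]], dtype=np.int32))
floats = to_uint8(np.array([[-0.5, 0.5, 1.5]], dtype=np.float64))
print(ints.tolist())
print(floats.tolist())
```

Clipping silently changes pixel values, so it is usually safer to bring an array into the valid range before logging it.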
mlflow.log_metric(key: str, value: float, step: Optional[int] = None) → None

Log a metric under the current run. If no run is active, this method will create a new active run.

Parameters
  • key – Metric name (string). This string may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ), and slashes (/). All backend stores will support keys up to length 250, but some may support larger keys.

  • value – Metric value (float). Note that some special values such as +/- Infinity may be replaced by other values depending on the store. For example, the SQLAlchemy store replaces +/- Infinity with max / min float values. All backend stores will support values up to length 5000, but some may support larger values.

  • step – Metric step (int). Defaults to zero if unspecified.

Example
import mlflow

with mlflow.start_run():
    mlflow.log_metric("mse", 2500.00)
mlflow.log_metrics(metrics: Dict[str, float], step: Optional[int] = None) → None

Log multiple metrics for the current run. If no run is active, this method will create a new active run.

Parameters
  • metrics – Dictionary of metric_name: String -> value: Float. Note that some special values such as +/- Infinity may be replaced by other values depending on the store. For example, a SQL-based store may replace +/- Infinity with max / min float values.

  • step – A single integer step at which to log the specified Metrics. If unspecified, each metric is logged at step zero.

Returns

None

Example
import mlflow

metrics = {"mse": 2500.00, "rmse": 50.00}

# Log a batch of metrics
with mlflow.start_run():
    mlflow.log_metrics(metrics)
mlflow.log_param(key: str, value: Any) → None

Log a parameter (e.g. model hyperparameter) under the current run. If no run is active, this method will create a new active run.

Parameters
  • key – Parameter name (string). This string may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ), and slashes (/). All backend stores support keys up to length 250, but some may support larger keys.

  • value – Parameter value (string, but will be string-ified if not). All backend stores support values up to length 500, but some may support larger values.

Example
import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
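The key constraints stated above (allowed characters and the length of 250 that all backend stores support) can be checked before logging. This is a minimal sketch, not MLflow's own validation: the helper name is_portable_key is hypothetical, and restricting "alphanumerics" to ASCII letters and digits is an assumption.

```python
import re

# Alphanumerics, underscores, dashes, periods, spaces, and slashes.
# Assumption: "alphanumerics" is limited to ASCII letters and digits here.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_\-. /]+")

# Key length that all backend stores are documented to support
MAX_KEY_LENGTH = 250


def is_portable_key(key: str) -> bool:
    """Return True if the key should be accepted by every backend store.

    Hypothetical helper for illustration only, not an MLflow function.
    """
    return len(key) <= MAX_KEY_LENGTH and KEY_PATTERN.fullmatch(key) is not None


for key in ["learning_rate", "model/layer-1.dropout rate", "lr@start"]:
    print(repr(key), is_portable_key(key))
```

The same character and length rules are documented for metric and tag keys, so one check of this kind can cover all three.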
mlflow.log_params(params: Dict[str, Any]) → None

Log a batch of params for the current run. If no run is active, this method will create a new active run.

Parameters

params – Dictionary of param_name: String -> value: (String, but will be string-ified if not)

Returns

None

Example
import mlflow

params = {"learning_rate": 0.01, "n_estimators": 10}

# Log a batch of parameters
with mlflow.start_run():
    mlflow.log_params(params)
mlflow.log_text(text: str, artifact_file: str) → None

Log text as an artifact.

Parameters
  • text – String containing text to log.

  • artifact_file – The run-relative artifact file path in posixpath format to which the text is saved (e.g. “dir/file.txt”).

Example
import mlflow

with mlflow.start_run():
    # Log text to a file under the run's root artifact directory
    mlflow.log_text("text1", "file1.txt")

    # Log text in a subdirectory of the run's root artifact directory
    mlflow.log_text("text2", "dir/file2.txt")

    # Log HTML text
    mlflow.log_text("<h1>header</h1>", "index.html")
mlflow.register_model(model_uri, name, await_registration_for=300, *, tags: Optional[Dict[str, Any]] = None) → ModelVersion

Create a new model version in model registry for the model files specified by model_uri. Note that this method assumes the model registry backend URI is the same as that of the tracking backend.

Parameters
  • model_uri – URI referring to the MLmodel directory. Use a runs:/ URI if you want to record the run ID with the model in model registry. models:/ URIs are currently not supported.

  • name – Name of the registered model under which to create a new model version. If a registered model with the given name does not exist, it will be created automatically.

  • await_registration_for – Number of seconds to wait for the model version to finish being created and reach READY status. By default, the function waits for five minutes. Specify 0 or None to skip waiting.

  • tags – A dictionary of key-value pairs that are converted into mlflow.entities.model_registry.ModelVersionTag objects.

Returns

Single mlflow.entities.model_registry.ModelVersion object created by backend.

Example
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor

mlflow.set_tracking_uri("sqlite:////tmp/mlruns.db")
params = {"n_estimators": 3, "random_state": 42}

# Log MLflow entities
with mlflow.start_run() as run:
    rfr = RandomForestRegressor(**params).fit([[0, 1]], [1])
    mlflow.log_params(params)
    mlflow.sklearn.log_model(rfr, artifact_path="sklearn-model")

model_uri = "runs:/{}/sklearn-model".format(run.info.run_id)
mv = mlflow.register_model(model_uri, "RandomForestRegressionModel")
print("Name: {}".format(mv.name))
print("Version: {}".format(mv.version))
Output
Name: RandomForestRegressionModel
Version: 1
mlflow.run(uri, entry_point='main', version=None, parameters=None, docker_args=None, experiment_name=None, experiment_id=None, backend='local', backend_config=None, use_conda=None, storage_dir=None, synchronous=True, run_id=None, run_name=None, env_manager=None)

Run an MLflow project. The project can be local or stored at a Git URI.

MLflow provides built-in support for running projects locally or remotely on a Databricks or Kubernetes cluster. You can also run projects against other targets by installing an appropriate third-party plugin. See Community Plugins for more information.

For information on using this method in chained workflows, see Building Multistep Workflows.

Raises

mlflow.exceptions.ExecutionException – If a run launched in blocking mode is unsuccessful.

Parameters
  • uri – URI of project to run. A local filesystem path or a Git repository URI (e.g. https://github.com/mlflow/mlflow-example) pointing to a project directory containing an MLproject file.

  • entry_point – Entry point to run within the project. If no entry point with the specified name is found, runs the project file entry_point as a script, using “python” to run .py files and the default shell (specified by environment variable $SHELL) to run .sh files.

  • version – For Git-based projects, either a commit hash or a branch name.

  • parameters – Parameters (dictionary) for the entry point command.

  • docker_args – Arguments (dictionary) for the docker command.

  • experiment_name – Name of experiment under which to launch the run.

  • experiment_id – ID of experiment under which to launch the run.

  • backend – Execution backend for the run: MLflow provides built-in support for “local”, “databricks”, and “kubernetes” (experimental) backends. If running against Databricks, will run against a Databricks workspace determined as follows: if a Databricks tracking URI of the form databricks://profile has been set (e.g. by setting the MLFLOW_TRACKING_URI environment variable), will run against the workspace specified by <profile>. Otherwise, runs against the workspace specified by the default Databricks CLI profile.

  • backend_config – A dictionary, or a path to a JSON file (must end in ‘.json’), which will be passed as config to the backend. The exact content which should be provided is different for each execution backend and is documented at https://www.mlflow.org/docs/latest/projects.html.

  • use_conda – This argument is deprecated. Use env_manager=’local’ instead. If True (the default), create a new Conda environment for the run and install project dependencies within that environment. Otherwise, run the project in the current environment without installing any project dependencies.

  • storage_dir – Used only if backend is “local”. MLflow downloads artifacts from distributed URIs passed to parameters of type path to subdirectories of storage_dir.

  • synchronous – Whether to block while waiting for a run to complete. Defaults to True. Note that if synchronous is False and backend is “local”, this method will return, but the current process will block when exiting until the local run completes. If the current process is interrupted, any asynchronous runs launched via this method will be terminated. If synchronous is True and the run fails, the current process will error out as well.

  • run_id – Note: this argument is used internally by the MLflow project APIs and should not be specified. If specified, the run ID will be used instead of creating a new run.

  • run_name – The name to give the MLflow Run associated with the project execution. If None, the MLflow Run name is left unset.

  • env_manager

    Specify an environment manager to create a new environment for the run and install project dependencies within that environment. The following values are supported:

    • local: use the local environment

    • conda: use conda

    • virtualenv: use virtualenv (and pyenv for Python version management)

    If unspecified, defaults to conda.

Returns

mlflow.projects.SubmittedRun exposing information (e.g. run ID) about the launched run.

Example
import mlflow

project_uri = "https://github.com/mlflow/mlflow-example"
params = {"alpha": 0.5, "l1_ratio": 0.01}

# Run MLflow project and create a reproducible conda environment
# on a local host
mlflow.run(project_uri, parameters=params)
Output
...
...
Elasticnet model (alpha=0.500000, l1_ratio=0.010000):
RMSE: 0.788347345611717
MAE: 0.6155576449938276
R2: 0.19729662005412607
... mlflow.projects: === Run (ID '6a5109febe5e4a549461e149590d0a7c') succeeded ===
mlflow.search_experiments(view_type: int = 1, max_results: Optional[int] = None, filter_string: Optional[str] = None, order_by: Optional[List[str]] = None) → List[Experiment]

Note

Experimental: This method may change or be removed in a future release without warning.

Search for experiments that match the specified search query.

Parameters
  • view_type – One of enum values ACTIVE_ONLY, DELETED_ONLY, or ALL defined in mlflow.entities.ViewType.

  • max_results – If passed, specifies the maximum number of experiments desired. If not passed, all experiments will be returned.

  • filter_string

    Filter query string (e.g., "name = 'my_experiment'"), defaults to searching for all experiments. The following identifiers, comparators, and logical operators are supported.

    Identifiers
    • name: Experiment name.

    • tags.<tag_key>: Experiment tag. If tag_key contains spaces, it must be wrapped with backticks (e.g., "tags.`extra key`").

    Comparators
    • =: Equal to.

    • !=: Not equal to.

    • LIKE: Case-sensitive pattern match.

    • ILIKE: Case-insensitive pattern match.

    Logical operators
    • AND: Combines two sub-queries and returns True if both of them are True.

  • order_by

    List of columns to order by. The order_by column can contain an optional DESC or ASC value (e.g., "name DESC"). The default is ASC so "name" is equivalent to "name ASC". The following fields are supported.

    • name: Experiment name.

    • experiment_id: Experiment ID.

Returns

A list of Experiment objects.

Example
import mlflow


def assert_experiment_names_equal(experiments, expected_names):
    actual_names = [e.name for e in experiments if e.name != "Default"]
    assert actual_names == expected_names, (actual_names, expected_names)


mlflow.set_tracking_uri("sqlite:///:memory:")

# Create experiments
for name, tags in [
    ("a", None),
    ("b", None),
    ("ab", {"k": "v"}),
    ("bb", {"k": "V"}),
]:
    mlflow.create_experiment(name, tags=tags)

# Search for experiments with name "a"
experiments = mlflow.search_experiments(filter_string="name = 'a'")
assert_experiment_names_equal(experiments, ["a"])

# Search for experiments with name starting with "a"
experiments = mlflow.search_experiments(filter_string="name LIKE 'a%'")
assert_experiment_names_equal(experiments, ["ab", "a"])

# Search for experiments with tag key "k" and value ending with "v" or "V"
experiments = mlflow.search_experiments(filter_string="tags.k ILIKE '%v'")
assert_experiment_names_equal(experiments, ["bb", "ab"])

# Search for experiments with name ending with "b" and tag {"k": "v"}
experiments = mlflow.search_experiments(filter_string="name LIKE '%b' AND tags.k = 'v'")
assert_experiment_names_equal(experiments, ["ab"])

# Sort experiments by name in ascending order
experiments = mlflow.search_experiments(order_by=["name"])
assert_experiment_names_equal(experiments, ["a", "ab", "b", "bb"])

# Sort experiments by ID in descending order
experiments = mlflow.search_experiments(order_by=["experiment_id DESC"])
assert_experiment_names_equal(experiments, ["bb", "ab", "b", "a"])
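Filter strings are plain text, so they can be assembled programmatically. The helpers below are hypothetical (tag_identifier and and_clauses are not MLflow functions) and only apply two rules from the parameter description: tag keys containing spaces are wrapped in backticks, and sub-queries are combined with AND.

```python
def tag_identifier(tag_key: str) -> str:
    """Return the filter identifier for an experiment tag key.

    Hypothetical helper for illustration only, not an MLflow function.
    """
    # Keys containing spaces must be wrapped with backticks
    if " " in tag_key:
        return "tags.`{}`".format(tag_key)
    return "tags.{}".format(tag_key)


def and_clauses(*clauses: str) -> str:
    """Join sub-queries with AND, the only documented logical operator."""
    return " AND ".join(clauses)


filter_string = and_clauses(
    "name LIKE '%b'",
    "{} = 'v'".format(tag_identifier("k")),
    "{} ILIKE '%x%'".format(tag_identifier("extra key")),
)
print(filter_string)
```

The resulting string can be passed as the filter_string argument of mlflow.search_experiments. The sketch does not escape quotes inside values, so it is only suitable for trusted input.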
mlflow.search_runs(experiment_ids: Optional[List[str]] = None, filter_string: str = '', run_view_type: int = 1, max_results: int = 100000, order_by: Optional[List[str]] = None, output_format: str = 'pandas', search_all_experiments: bool = False, experiment_names: Optional[List[str]] = None) → Union[List[Run], pandas.DataFrame]

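Search for runs that fit the search criteria. Results are returned as a pandas DataFrame by default, or as a list of mlflow.entities.Run objects if output_format is list.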

Parameters
  • experiment_ids – List of experiment IDs. Search can work with experiment IDs or experiment names, but not both in the same call. Values other than None or [] will result in error if experiment_names is also not None or []. None will default to the active experiment if experiment_names is None or [].

  • filter_string – Filter query string, defaults to searching all runs.

  • run_view_type – One of the enum values ACTIVE_ONLY, DELETED_ONLY, or ALL defined in mlflow.entities.ViewType, selecting which runs are returned.

  • max_results – The maximum number of runs to put in the dataframe. Default is 100,000 to avoid causing out-of-memory issues on the user’s machine.

  • order_by – List of columns to order by (e.g., “metrics.rmse”). The order_by column can contain an optional DESC or ASC value. The default is ASC. The default ordering is to sort by start_time DESC, then run_id.

  • output_format – The output format to be returned. If pandas, a pandas.DataFrame is returned and, if list, a list of mlflow.entities.Run is returned.

  • search_all_experiments – Boolean specifying whether all experiments should be searched. Only honored if experiment_ids is [] or None.

  • experiment_names – List of experiment names. Search can work with experiment IDs or experiment names, but not both in the same call. Values other than None or [] will result in error if experiment_ids is also not None or []. None will default to the active experiment if experiment_ids is None or [].

Returns

If output_format is list: a list of mlflow.entities.Run. If output_format is pandas: pandas.DataFrame of runs, where each metric, parameter, and tag is expanded into its own column named metrics.*, params.*, or tags.* respectively. For runs that don’t have a particular metric, parameter, or tag, the value for the corresponding column is (NumPy) NaN, None, or None respectively.

Example
import mlflow

# Create an experiment and log two runs under it
experiment_name = "Social NLP Experiments"
experiment_id = mlflow.create_experiment(experiment_name)
with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_metric("m", 1.55)
    mlflow.set_tag("s.release", "1.1.0-RC")
with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_metric("m", 2.50)
    mlflow.set_tag("s.release", "1.2.0-GA")

# Search for all the runs in the experiment with the given experiment ID
df = mlflow.search_runs([experiment_id], order_by=["metrics.m DESC"])
print(df[["metrics.m", "tags.s.release", "run_id"]])
print("--")

# Search the experiment_id using a filter_string with tag
# that has a case insensitive pattern
filter_string = "tags.s.release ILIKE '%rc%'"
df = mlflow.search_runs([experiment_id], filter_string=filter_string)
print(df[["metrics.m", "tags.s.release", "run_id"]])
print("--")

# Search for all the runs in the experiment with the given experiment name
df = mlflow.search_runs(experiment_names=[experiment_name], order_by=["metrics.m DESC"])
print(df[["metrics.m", "tags.s.release", "run_id"]])
Output
   metrics.m tags.s.release                            run_id
0       2.50       1.2.0-GA  147eed886ab44633902cc8e19b2267e2
1       1.55       1.1.0-RC  5cc7feaf532f496f885ad7750809c4d4
--
   metrics.m tags.s.release                            run_id
0       1.55       1.1.0-RC  5cc7feaf532f496f885ad7750809c4d4
--
   metrics.m tags.s.release                            run_id
0       2.50       1.2.0-GA  147eed886ab44633902cc8e19b2267e2
1       1.55       1.1.0-RC  5cc7feaf532f496f885ad7750809c4d4
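The interaction between experiment_ids, experiment_names, and search_all_experiments described in the parameter list can be summarized in a small decision function. This is a sketch of the documented rules only: resolve_scope is a hypothetical helper, and the exception type and return values are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def resolve_scope(
    experiment_ids: Optional[List[str]] = None,
    experiment_names: Optional[List[str]] = None,
    search_all_experiments: bool = False,
) -> Tuple[str, List[str]]:
    """Describe which experiments a search would cover.

    Hypothetical helper for illustration only, not an MLflow function.
    """
    if experiment_ids and experiment_names:
        # IDs and names cannot be combined in the same call
        raise ValueError("Specify experiment_ids or experiment_names, not both")
    if experiment_ids:
        return ("ids", list(experiment_ids))
    if experiment_names:
        return ("names", list(experiment_names))
    if search_all_experiments:
        # Only honored when neither IDs nor names are given
        return ("all", [])
    # Nothing specified: fall back to the active experiment
    return ("active", [])


print(resolve_scope())
print(resolve_scope(experiment_names=["Social NLP Experiments"]))
print(resolve_scope(search_all_experiments=True))
```

An empty list is treated the same as None in this sketch, which mirrors the "None or []" wording of the parameter descriptions.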
mlflow.set_experiment(experiment_name: Optional[str] = None, experiment_id: Optional[str] = None) → Experiment

Set the given experiment as the active experiment. The experiment must either be specified by name via experiment_name or by ID via experiment_id. The experiment name and ID cannot both be specified.

Parameters
  • experiment_name – Case-sensitive name of the experiment to be activated. If an experiment with this name does not exist, a new experiment with this name is created.

  • experiment_id – ID of the experiment to be activated. If an experiment with this ID does not exist, an exception is thrown.

Returns

An instance of mlflow.entities.Experiment representing the new active experiment.

Example
import mlflow

# Set an experiment name, which must be unique and case-sensitive.
experiment = mlflow.set_experiment("Social NLP Experiments")

# Get Experiment Details
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
Output
Experiment_id: 1
Artifact Location: file:///.../mlruns/1
Tags: {}
Lifecycle_stage: active
mlflow.set_experiment_tag(key: str, value: Any) → None

Set a tag on the current experiment. Value is converted to a string.

Parameters
  • key – Tag name (string). This string may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ), and slashes (/). All backend stores will support keys up to length 250, but some may support larger keys.

  • value – Tag value (string, but will be string-ified if not). All backend stores will support values up to length 5000, but some may support larger values.

Example
import mlflow

with mlflow.start_run():
    mlflow.set_experiment_tag("release.version", "2.2.0")
mlflow.set_experiment_tags(tags: Dict[str, Any]) → None

Set tags for the current active experiment.

Parameters

tags – Dictionary containing tag names and corresponding values.

Example
import mlflow

tags = {"engineering": "ML Platform",
        "release.candidate": "RC1",
        "release.version": "2.2.0"}

# Set a batch of tags
with mlflow.start_run():
    mlflow.set_experiment_tags(tags)
mlflow.set_registry_uri(uri: str) → None

Set the registry server URI. This method is especially useful if you have a registry server that’s different from the tracking server.

Parameters

uri

  • An empty string, or a local file path, prefixed with file:/. Data is stored locally at the provided file (or ./mlruns if empty).

  • An HTTP URI like https://my-tracking-server:5000.

  • A Databricks workspace, provided as the string “databricks” or, to use a Databricks CLI profile, “databricks://<profileName>”.

Example
import mlflow

# Set model registry uri, fetch the set uri, and compare
# it with the tracking uri. They should be different
mlflow.set_registry_uri("sqlite:////tmp/registry.db")
mr_uri = mlflow.get_registry_uri()
print("Current registry uri: {}".format(mr_uri))
tracking_uri = mlflow.get_tracking_uri()
print("Current tracking uri: {}".format(tracking_uri))

# They should be different
assert tracking_uri != mr_uri
Output
Current registry uri: sqlite:////tmp/registry.db
Current tracking uri: file:///.../mlruns
mlflow.set_tag(key: str, value: Any) → None

Set a tag under the current run. If no run is active, this method will create a new active run.

Parameters
  • key – Tag name (string). This string may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ), and slashes (/). All backend stores will support keys up to length 250, but some may support larger keys.

  • value – Tag value (string, but will be string-ified if not). All backend stores will support values up to length 5000, but some may support larger values.

Example
import mlflow

with mlflow.start_run():
    mlflow.set_tag("release.version", "2.2.0")
mlflow.set_tags(tags: Dict[str, Any]) → None

Log a batch of tags for the current run. If no run is active, this method will create a new active run.

Parameters

tags – Dictionary of tag_name: String -> value: (String, but will be string-ified if not)

Returns

None

Example
import mlflow

tags = {"engineering": "ML Platform",
        "release.candidate": "RC1",
        "release.version": "2.2.0"}

# Set a batch of tags
with mlflow.start_run():
    mlflow.set_tags(tags)
mlflow.set_tracking_uri(uri: Union[str, pathlib.Path]) → None

Set the tracking server URI. This does not affect the currently active run (if one exists), but takes effect for successive runs.

Parameters

uri

  • An empty string, or a local file path, prefixed with file:/. Data is stored locally at the provided file (or ./mlruns if empty).

  • An HTTP URI like https://my-tracking-server:5000.

  • A Databricks workspace, provided as the string “databricks” or, to use a Databricks CLI profile, “databricks://<profileName>”.

  • A pathlib.Path instance

Example
import mlflow

mlflow.set_tracking_uri("file:///tmp/my_tracking")
tracking_uri = mlflow.get_tracking_uri()
print("Current tracking uri: {}".format(tracking_uri))
Output
Current tracking uri: file:///tmp/my_tracking
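Because uri may also be a pathlib.Path, it can help to see how an absolute local path corresponds to a file: URI. The sketch below uses only the standard library; the directory name my_tracking is made up, and whether MLflow performs exactly this conversion internally is an assumption.

```python
import tempfile
from pathlib import Path

# Build an absolute path under the system temporary directory
tracking_dir = Path(tempfile.gettempdir()).resolve() / "my_tracking"

# Path.as_uri() produces the equivalent file: URI for an absolute path
tracking_uri = tracking_dir.as_uri()
print(tracking_uri)
```

According to the signature above, either tracking_dir or tracking_uri is an acceptable argument to mlflow.set_tracking_uri.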
mlflow.start_run(run_id: Optional[str] = None, experiment_id: Optional[str] = None, run_name: Optional[str] = None, nested: bool = False, tags: Optional[Dict[str, Any]] = None, description: Optional[str] = None) → ActiveRun

Start a new MLflow run, setting it as the active run under which metrics and parameters will be logged. The return value can be used as a context manager within a with block; otherwise, you must call end_run() to terminate the current run.

If you pass a run_id or the MLFLOW_RUN_ID environment variable is set, start_run attempts to resume a run with the specified run ID and other parameters are ignored. run_id takes precedence over MLFLOW_RUN_ID.

If resuming an existing run, the run status is set to RunStatus.RUNNING.

MLflow sets a variety of default tags on the run, as defined in MLflow system tags.

Parameters
  • run_id – If specified, get the run with the specified UUID and log parameters and metrics under that run. The run’s end time is unset and its status is set to running, but the run’s other attributes (source_version, source_type, etc.) are not changed.

  • experiment_id – ID of the experiment under which to create the current run (applicable only when run_id is not specified). If experiment_id is unspecified, MLflow looks for a valid experiment in the following order: the experiment activated using set_experiment, the MLFLOW_EXPERIMENT_NAME environment variable, the MLFLOW_EXPERIMENT_ID environment variable, or the default experiment as defined by the tracking server.

  • run_name – Name of new run (stored as a mlflow.runName tag). Used only when run_id is unspecified.

  • nested – Controls whether run is nested in parent run. True creates a nested run.

  • tags – An optional dictionary of string keys and values to set as tags on the run. If a run is being resumed, these tags are set on the resumed run. If a new run is being created, these tags are set on the new run.

  • description – An optional string that populates the description box of the run. If a run is being resumed, the description is set on the resumed run. If a new run is being created, the description is set on the new run.

Returns

mlflow.ActiveRun object that acts as a context manager wrapping the run’s state.

Example
import mlflow

# Create nested runs
experiment_id = mlflow.create_experiment("experiment1")
with mlflow.start_run(
    run_name="PARENT_RUN",
    experiment_id=experiment_id,
    tags={"version": "v1", "priority": "P1"},
    description="parent",
) as parent_run:
    mlflow.log_param("parent", "yes")
    with mlflow.start_run(
        run_name="CHILD_RUN",
        experiment_id=experiment_id,
        description="child",
        nested=True,
    ) as child_run:
        mlflow.log_param("child", "yes")

print("parent run:")

print("run_id: {}".format(parent_run.info.run_id))
print("description: {}".format(parent_run.data.tags.get("mlflow.note.content")))
print("version tag value: {}".format(parent_run.data.tags.get("version")))
print("priority tag value: {}".format(parent_run.data.tags.get("priority")))
print("--")

# Search all child runs with a parent id
query = "tags.mlflow.parentRunId = '{}'".format(parent_run.info.run_id)
results = mlflow.search_runs(experiment_ids=[experiment_id], filter_string=query)
print("child runs:")
print(results[["run_id", "params.child", "tags.mlflow.runName"]])
Output
parent run:
run_id: 8979459433a24a52ab3be87a229a9cdf
description: parent
version tag value: v1
priority tag value: P1
--
child runs:
                             run_id params.child tags.mlflow.runName
0  7d175204675e40328e46d9a6a5a7ee6a          yes           CHILD_RUN