mlflow.mleap

The mlflow.mleap module provides an API for saving Spark MLlib models using the MLeap persistence mechanism.

Note

You cannot load the MLeap model flavor in Python; you must download it using the Java API method downloadArtifacts(String runId) and load the model using the method MLeapLoader.loadPipeline(String modelRootPath).

exception mlflow.mleap.MLeapSerializationException(message, error_code=1, **kwargs)[source]

Bases: mlflow.exceptions.MlflowException

Exception thrown when a model or DataFrame cannot be serialized in MLeap format.

mlflow.mleap.add_to_model(mlflow_model, path, spark_model, sample_input)[source]

Note

This method requires all arguments to be specified by keyword.

Add the MLeap flavor to an existing MLflow model.

Parameters
  • mlflow_model – mlflow.models.Model to which this flavor is being added.

  • path – Path of the model to which this flavor is being added.

  • spark_model – Spark PipelineModel to be saved. This model must be MLeap-compatible and cannot contain any custom transformers.

  • sample_input – Sample PySpark DataFrame input that the model can evaluate. This is required by MLeap for data schema inference.
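The keyword-only requirement noted above can be illustrated with a small sketch. This section does not show how MLflow enforces the rule, so the `keyword_only` decorator and the `describe_save` function below are hypothetical stand-ins that only demonstrate the calling convention.

```python
import functools


def keyword_only(func):
    """Reject positional arguments so callers must name every argument."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if args:
            raise TypeError(
                f"{func.__name__}() requires all arguments to be specified by keyword"
            )
        return func(**kwargs)

    return wrapper


@keyword_only
def describe_save(spark_model, sample_input, path):
    # Hypothetical stand-in for a keyword-only API; it only echoes its arguments.
    return {"spark_model": spark_model, "sample_input": sample_input, "path": path}


# Naming every argument works.
result = describe_save(spark_model="model", sample_input="df", path="/tmp/mleap-model")

# Passing the same values positionally is rejected.
try:
    describe_save("model", "df", "/tmp/mleap-model")
    positional_call_rejected = False
except TypeError:
    positional_call_rejected = True
```

In practice this means a call such as `add_to_model(mlflow_model=..., path=..., spark_model=..., sample_input=...)` should name each argument explicitly instead of relying on argument order.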

mlflow.mleap.log_model(spark_model, sample_input, artifact_path, registered_model_name=None, signature: mlflow.models.signature.ModelSignature = None, input_example: Union[pandas.core.frame.DataFrame, numpy.ndarray, dict, list] = None)[source]

Note

This method requires all arguments to be specified by keyword.

Log a Spark MLlib model in MLeap format as an MLflow artifact for the current run. The logged model will have the MLeap flavor.

Note

You cannot load the MLeap model flavor in Python; you must download it using the Java API method downloadArtifacts(String runId) and load the model using the method MLeapLoader.loadPipeline(String modelRootPath).

Parameters
  • spark_model – Spark PipelineModel to be saved. This model must be MLeap-compatible and cannot contain any custom transformers.

  • sample_input – Sample PySpark DataFrame input that the model can evaluate. This is required by MLeap for data schema inference.

  • artifact_path – Run-relative artifact path.

  • registered_model_name – (Experimental) If given, create a model version under registered_model_name, also creating a registered model if one with the given name does not exist.

  • signature

    (Experimental) ModelSignature describes the model's input and output schema. The model signature can be inferred from datasets with valid model input (e.g. the training dataset with the target column omitted) and valid model output (e.g. model predictions generated on the training dataset), for example:

    from mlflow.models.signature import infer_signature
    train = df.drop(columns=["target_label"])  # assumes df is a pandas DataFrame
    predictions = ...  # compute model predictions
    signature = infer_signature(train, predictions)
    

  • input_example – (Experimental) Input example provides one or several instances of valid model input. The example can be used as a hint of what data to feed the model. The given example will be converted to a pandas DataFrame and then serialized to JSON using the pandas split-oriented format. Bytes are base64-encoded.
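The split-oriented layout and base64 handling described for input_example can be sketched with the standard library alone. This is an illustration of the format, not MLflow's serialization code: `to_split_json` is a hypothetical helper, and it omits the `index` key that pandas also writes by default in split orientation.

```python
import base64
import json


def to_split_json(columns, rows):
    """Serialize rows into a split-oriented layout: column names kept apart from row values."""

    def encode(value):
        # Raw bytes are not valid JSON, so they are base64-encoded into ASCII text.
        if isinstance(value, bytes):
            return base64.b64encode(value).decode("ascii")
        return value

    payload = {
        "columns": list(columns),
        "data": [[encode(value) for value in row] for row in rows],
    }
    return json.dumps(payload)


example_json = to_split_json(
    columns=["id", "text", "raw"],
    rows=[(4, "spark i j k", b"\x00\x01"), (5, "l m n", b"abc")],
)
print(example_json)
```

Storing the column names once and the values as nested lists keeps the example compact, and a reader can recover the original bytes by base64-decoding the affected cells.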

Example
import mlflow
import mlflow.mleap
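from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.sql import SparkSession
# Reuse the active SparkSession (predefined as `spark` in notebooks and the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()
# training DataFrame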
training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0)], ["id", "text", "label"])
# testing DataFrame
test_df = spark.createDataFrame([
    (4, "spark i j k"),
    (5, "l m n"),
    (6, "spark hadoop spark"),
    (7, "apache hadoop")], ["id", "text"])
# Create an MLlib pipeline
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
model = pipeline.fit(training)
# log parameters
mlflow.log_param("max_iter", 10)
mlflow.log_param("reg_param", 0.001)
# log the Spark MLlib model in MLeap format
mlflow.mleap.log_model(spark_model=model, sample_input=test_df, artifact_path="mleap-model")

mlflow.mleap.save_model(spark_model, sample_input, path, mlflow_model=None, signature: mlflow.models.signature.ModelSignature = None, input_example: Union[pandas.core.frame.DataFrame, numpy.ndarray, dict, list] = None)[source]

Note

This method requires all arguments to be specified by keyword.

Save a Spark MLlib PipelineModel in MLeap format at a local path. The saved model will have the MLeap flavor.

Note

You cannot load the MLeap model flavor in Python; you must download it using the Java API method downloadArtifacts(String runId) and load the model using the method MLeapLoader.loadPipeline(String modelRootPath).

Parameters
  • spark_model – Spark PipelineModel to be saved. This model must be MLeap-compatible and cannot contain any custom transformers.

  • sample_input – Sample PySpark DataFrame input that the model can evaluate. This is required by MLeap for data schema inference.

  • path – Local path where the model is to be saved.

  • mlflow_model – mlflow.models.Model to which this flavor is being added.

  • signature

    (Experimental) ModelSignature describes the model's input and output schema. The model signature can be inferred from datasets with valid model input (e.g. the training dataset with the target column omitted) and valid model output (e.g. model predictions generated on the training dataset), for example:

    from mlflow.models.signature import infer_signature
    train = df.drop(columns=["target_label"])  # assumes df is a pandas DataFrame
    signature = infer_signature(train, model.predict(train))
    

  • input_example – (Experimental) Input example provides one or several instances of valid model input. The example can be used as a hint of what data to feed the model. The given example will be converted to a pandas DataFrame and then serialized to JSON using the pandas split-oriented format. Bytes are base64-encoded.
