from abc import abstractmethod
from mlflow.entities import RunStatus
_logger = logging.getLogger(__name__)
Wrapper around an MLflow project run (e.g. a subprocess running an entry point
command or a Databricks job run) and exposing methods for waiting on and cancelling the run.
This class defines the interface that the MLflow project runner uses to manage the lifecycle
of runs launched in different environments (e.g. runs launched locally or on Databricks).
``SubmittedRun`` is not thread-safe. That is, concurrent calls to wait() / cancel()
from multiple threads may inadvertently kill resources (e.g. local processes) unrelated to the
Subclasses of ``SubmittedRun`` must expose a ``run_id`` member containing the
run's MLflow run ID.
Instance of ``SubmittedRun`` corresponding to a subprocess launched to run an entry point
def __init__(self, run_id, command_proc):
self._run_id = run_id
self.command_proc = command_proc
return self.command_proc.wait() == 0
# Interrupt child process if it hasn't already exited
if self.command_proc.poll() is None:
# Kill the the process tree rooted at the child if it's the leader of its own process
# group, otherwise just kill the child
if self.command_proc.pid == os.getpgid(self.command_proc.pid):
# The child process may have exited before we attempted to terminate it, so we
# ignore OSErrors raised during child process termination
"Failed to terminate child process (PID %s) corresponding to MLflow "
"run with ID %s. The process may have already exited.",
exit_code = self.command_proc.poll()
if exit_code is None:
if exit_code == 0:
Wait for the run to finish, returning True if the run succeeded and false otherwise. Note
that in some cases (e.g. remote execution on Databricks), we may wait until the remote job
completes rather than until the MLflow run completes.
Get status of the run.
Cancel the run (interrupts the command subprocess, cancels the Databricks run, etc) and
waits for it to terminate. The MLflow run status may not be set correctly
upon run cancellation.