The MLflow LLM Tracking component consists of two elements for logging and viewing the behavior of LLMs: a set of APIs for logging the inputs, outputs, and prompts submitted to and returned from LLMs, and a UI component that provides a simplified means of viewing experimental submissions (prompts and inputs) and their results (LLM outputs).
MLflow LLM Tracking is organized around the concept of runs, which are executions of some piece of data science code. Each run records the following information:
Key-value input parameters of your choice. Both keys and values are strings. These could be LLM parameters like top_k, temperature, etc.
Key-value metrics, where the value is numeric. Each metric can be updated throughout the course of the run (for example, to track how your model’s loss function is converging), and MLflow records and lets you visualize the metric’s full history.
For offline evaluation, you can log predictions returned from your model by passing in the prompts and inputs, as well as the outputs returned from the model. These predictions are logged in csv format as an MLflow artifact.
Output files in any format. For example, you can record images (e.g., PNGs), models (e.g., a pickled openai model), and data files (e.g., a Parquet file) as artifacts.
You can optionally organize runs into experiments, which group together and compare runs for a specific task. You can create an experiment using the mlflow experiments CLI, with mlflow.create_experiment(), or using the corresponding REST parameters. The MLflow API and UI also let you create and search for experiments.
Once your runs have been recorded, you can query them and compare predictions using the Tracking UI.
mlflow.log_metric() logs a single key-value metric. The value must always be a number. MLflow remembers the history of values for each metric. Use mlflow.log_metrics() to log multiple metrics at once.
mlflow.llm.log_predictions() logs inputs, outputs, and prompts. Inputs and prompts can be either a list of strings or a list of dicts, whereas outputs must be a list of strings.
mlflow.log_artifact() logs a local file or directory as an artifact, optionally taking an artifact_path to place it within the run's artifact URI. Run artifacts can be organized into directories, enabling nested storage across multiple paradigms for logging inputs and predictions.
All of this tracking information is recorded as part of an MLflow Experiment run.