
Registering and Versioning Scorers

Scorers can be registered to MLflow experiments for version control and team collaboration.

Supported Scorers

Scorer Type | Supported
Agent-as-a-Judge | ✅
Template-based LLM Scorers | ✅
Code-based Scorers | ✅
Guidelines-based LLM Scorers | ❌ (Use MLflow Prompt Registry instead)
Predefined Scorers | ❌ (Prompts are hard-coded in MLflow)

Usage

Prerequisite

Judges are registered at the MLflow Experiment level, not the Run level, so an experiment must exist before you register one.

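import mlflow

mlflow.set_tracking_uri("your-tracking-uri")
# create_experiment returns the new experiment's ID; the examples below reuse it
experiment_id = mlflow.create_experiment("evaluation-judges")
# Make it the active experiment so calls without experiment_id target it
mlflow.set_experiment(experiment_id=experiment_id)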
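Note that `mlflow.create_experiment` raises an error when an experiment with the same name already exists, so the setup above fails if it is run twice. A minimal get-or-create sketch is shown below. The helper name `get_or_create_experiment` is hypothetical, and it takes the `mlflow` module (or any object exposing `get_experiment_by_name` and `create_experiment`) as a parameter so that it can be exercised without a tracking server.

```python
def get_or_create_experiment(client, name):
    """Return the ID of the experiment called `name`, creating it if needed.

    `client` is assumed to be the `mlflow` module or an `MlflowClient`;
    both expose `get_experiment_by_name` and `create_experiment`.
    """
    existing = client.get_experiment_by_name(name)
    if existing is not None:
        # The experiment is already there: reuse its ID.
        return existing.experiment_id
    # Not found: create it and return the new ID.
    return client.create_experiment(name)


# Usage (requires a reachable tracking server):
#   import mlflow
#   experiment_id = get_or_create_experiment(mlflow, "evaluation-judges")
```

Passing the client in explicitly keeps the helper easy to test with a fake object.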

Define a sample template-based LLM scorer:

from mlflow.genai.judges import make_judge

quality_judge = make_judge(
    name="response_quality",
    instructions=("Evaluate if {{ outputs }} is high quality for {{ inputs }}."),
    model="anthropic:/claude-opus-4-1-20250805",
)
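The `{{ inputs }}` and `{{ outputs }}` markers in the instructions are template variables, and a typo in one of them is easy to miss. A small pre-check before creating the judge can catch that. The sketch below is not part of MLflow: `find_template_variables` and `unknown_template_variables` are hypothetical helpers, and the allowed set contains only the two variables that appear on this page.

```python
import re

# Only the variables shown on this page; extend as needed (assumption).
KNOWN_VARIABLES = {"inputs", "outputs"}

# Matches {{ name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_template_variables(instructions):
    """Return the set of names used as {{ name }} placeholders."""
    return set(_PLACEHOLDER.findall(instructions))


def unknown_template_variables(instructions, known=KNOWN_VARIABLES):
    """Return placeholder names that are not in `known`."""
    return find_template_variables(instructions) - set(known)


instructions = "Evaluate if {{ outputs }} is high quality for {{ inputs }}."
print(sorted(find_template_variables(instructions)))  # ['inputs', 'outputs']
# A misspelled variable is reported as unknown.
print(sorted(unknown_template_variables("Rate {{ output }} for {{ inputs }}.")))  # ['output']
```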

Registering a Scorer

To register a judge to the experiment, call the register method on the judge instance.

# Register the judge
registered = quality_judge.register()
# You can pass experiment_id to register the judge to a specific experiment
# registered = quality_judge.register(experiment_id=experiment_id)

Updating a Scorer

Registering a scorer under a name that already exists in the experiment creates a new version of that scorer.

# Update and register a new version of the judge
quality_judge_v2 = make_judge(
    name="response_quality",  # Same name
    instructions=(
        "Evaluate if {{ outputs }} is high quality, accurate, and complete "
        "for the question in {{ inputs }}."
    ),
    model="anthropic:/claude-3-5-sonnet-20241022",  # Updated model
)

# Register the updated judge
registered_v2 = quality_judge_v2.register(experiment_id=experiment_id)
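The rule that the same name produces a new version can be pictured with a small in-memory model. This is an illustration only, not MLflow's implementation: the `ToyScorerRegistry` class, its 1-based version numbers, and its method names are all assumptions made for this sketch.

```python
from collections import defaultdict


class ToyScorerRegistry:
    """In-memory stand-in for a per-experiment scorer registry."""

    def __init__(self):
        # name -> list of registered definitions, oldest first
        self._versions = defaultdict(list)

    def register(self, name, definition):
        """Store `definition` under `name` and return its version number."""
        self._versions[name].append(definition)
        return len(self._versions[name])

    def get(self, name, version=None):
        """Return the requested version, or the latest when `version` is None."""
        history = self._versions.get(name)
        if not history:
            raise KeyError(f"no scorer registered as {name!r}")
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name!r} has no version {version}")
        return history[version - 1]


registry = ToyScorerRegistry()
v1 = registry.register("response_quality", "Evaluate if {{ outputs }} is high quality.")
v2 = registry.register("response_quality", "Evaluate if {{ outputs }} is accurate and complete.")
print(v1, v2)  # 1 2
print(registry.get("response_quality"))  # the second definition (latest)
```

Earlier definitions stay retrievable by version number in this model, while a lookup without a version returns the most recent one.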

Loading a Scorer

To load a registered scorer, use the get_scorer function.

from mlflow.genai.scorers import get_scorer

# Get the latest version
latest_judge = get_scorer(name="response_quality")
# or specify experiment_id to get a scorer from a specific experiment
# latest_judge = get_scorer(name="response_quality", experiment_id=experiment_id)

Listing Scorers

The list_scorers function returns a list of the scorers registered in the experiment.

from mlflow.genai.scorers import list_scorers

all_scorers = list_scorers(experiment_id=experiment_id)
for scorer in all_scorers:
    print(f"Scorer: {scorer.name}, Model: {scorer.model}")
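Code-based scorers can be registered too, and they may not carry a `model` attribute (an assumption, since this page only shows LLM judges). A defensive listing helper might look like the sketch below. `describe_scorers` is a hypothetical helper, and the stand-in objects only mimic the `name` and `model` attributes used on this page.

```python
from types import SimpleNamespace


def describe_scorers(scorers):
    """Return one summary line per scorer, tolerating a missing `model`."""
    lines = []
    for scorer in scorers:
        # Fall back to "n/a" when the scorer has no model (or it is empty).
        model = getattr(scorer, "model", None) or "n/a"
        lines.append(f"Scorer: {scorer.name}, Model: {model}")
    return lines


# Stand-ins for the objects returned by list_scorers (illustrative only).
sample = [
    SimpleNamespace(name="response_quality", model="anthropic:/claude-opus-4-1-20250805"),
    SimpleNamespace(name="exact_match"),  # code-based scorer without a model
]
for line in describe_scorers(sample):
    print(line)
```

With real data, the same function could be called as `describe_scorers(list_scorers(experiment_id=experiment_id))`.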

UI Support

Coming soon!