Registering and Versioning Scorers
Scorers can be registered to MLflow experiments for version control and team collaboration.
Supported Scorers
Scorer Type | Supported |
---|---|
Agent-as-a-Judge | ✅ |
Template-based LLM Scorers | ✅ |
Code-based Scorers | ✅ |
Guidelines-based LLM Scorers | ❌ (Use MLflow Prompt Registry instead) |
Predefined Scorers | ❌ (Prompts are hard-coded in MLflow) |
Usage
Prerequisite
Judges are registered to an MLflow Experiment rather than to an individual Run, so set up an experiment first:
import mlflow
mlflow.set_tracking_uri("your-tracking-uri")
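# Create the experiment if it does not exist yet and make it active;
# register() without an explicit experiment_id uses the active experiment
experiment = mlflow.set_experiment("evaluation-judges")
experiment_id = experiment.experiment_id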
Define a sample template-based LLM scorer:
from mlflow.genai.judges import make_judge
quality_judge = make_judge(
    name="response_quality",
    instructions="Evaluate if {{ outputs }} is high quality for {{ inputs }}.",
    model="anthropic:/claude-opus-4-1-20250805",
)
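The {{ outputs }} and {{ inputs }} placeholders in instructions are template variables that the judge fills in from each record it evaluates. The sketch below makes that substitution concrete with a toy renderer built on the standard library; render_instructions and template_variables are hypothetical helpers, and MLflow's own prompt formatting may differ.

```python
import re

# Matches placeholders such as "{{ outputs }}" or "{{inputs}}".
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def template_variables(template: str) -> set[str]:
    """Return the set of variable names referenced by the template."""
    return set(_PLACEHOLDER.findall(template))


def render_instructions(template: str, values: dict[str, object]) -> str:
    """Toy renderer (hypothetical): replace each {{ name }} with str(values[name]).

    Illustration only, not MLflow's implementation. Raises KeyError if the
    template references a variable that is missing from `values`.
    """
    return _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


instructions = "Evaluate if {{ outputs }} is high quality for {{ inputs }}."
print(sorted(template_variables(instructions)))  # ['inputs', 'outputs']
print(render_instructions(instructions, {"inputs": "the question", "outputs": "the answer"}))
# Evaluate if the answer is high quality for the question.
```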
Registering a Scorer
To register a judge to the experiment, call the register() method on the judge instance.
# Register the judge
registered = quality_judge.register()
# You can pass experiment_id to register the judge to a specific experiment
# registered = quality_judge.register(experiment_id=experiment_id)
Updating a Scorer
Registering a new scorer with the same name will create a new version.
# Update and register a new version of the judge
quality_judge_v2 = make_judge(
    name="response_quality",  # Same name
    instructions=(
        "Evaluate if {{ outputs }} is high quality, accurate, and complete "
        "for the question in {{ inputs }}."
    ),
    model="anthropic:/claude-3-5-sonnet-20241022",  # Different model
)
# Register the updated judge
registered_v2 = quality_judge_v2.register(experiment_id=experiment_id)
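Conceptually, the registry keys versions by scorer name. The following toy in-memory model sketches that rule only: ToyScorerRegistry is hypothetical, it assumes earlier versions are retained rather than overwritten, and it says nothing about how MLflow actually stores scorers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScorerRegistry:
    """Hypothetical in-memory model of name-based scorer versioning.

    Not MLflow's implementation: it only mirrors the rule that registering
    a scorer under an existing name creates a new version.
    """

    # Maps scorer name -> registered definitions; index 0 holds version 1.
    _store: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, name: str, definition: dict) -> int:
        """Append a new version under `name` and return its version number."""
        history = self._store.setdefault(name, [])
        history.append(definition)
        return len(history)

    def versions(self, name: str) -> list[dict]:
        """Return every stored version for `name`, oldest first."""
        return list(self._store.get(name, []))


registry = ToyScorerRegistry()
v1 = registry.register("response_quality", {"model": "anthropic:/claude-opus-4-1-20250805"})
v2 = registry.register("response_quality", {"model": "anthropic:/claude-3-5-sonnet-20241022"})
print(v1, v2)  # 1 2
print(len(registry.versions("response_quality")))  # 2
```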
Loading a Scorer
To load a registered scorer, use the get_scorer() function.
from mlflow.genai.scorers import get_scorer
# Get the latest version
latest_judge = get_scorer(name="response_quality")
# or specify experiment_id to get a scorer from a specific experiment
# latest_judge = get_scorer(name="response_quality", experiment_id=experiment_id)
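Loading by name alone resolves to the latest version. The hypothetical resolve_version helper below sketches that rule; its support for pinning an explicit version is an assumption of this toy, so check the get_scorer API reference before relying on version pinning in MLflow itself.

```python
from typing import Optional


def resolve_version(available: list[int], version: Optional[int] = None) -> int:
    """Pick which registered version to load (hypothetical helper).

    With no explicit version, return the highest registered version,
    i.e. the "latest version" behaviour of a name-only lookup.
    """
    if not available:
        raise LookupError("no versions registered under this name")
    if version is None:
        return max(available)
    if version not in available:
        raise LookupError(f"version {version} is not registered")
    return version


print(resolve_version([1, 2, 3]))     # 3
print(resolve_version([1, 2, 3], 1))  # 1
```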
Listing Scorers
The list_scorers() function returns a list of the scorers registered in the experiment.
from mlflow.genai.scorers import list_scorers
all_scorers = list_scorers(experiment_id=experiment_id)
for scorer in all_scorers:
    # Code-based scorers may not have a judge model, so fall back to None
    print(f"Scorer: {scorer.name}, Model: {getattr(scorer, 'model', None)}")
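Because list_scorers returns ordinary Python objects, its output can be post-processed in plain Python, for example to see which scorers share a judge model. This sketch uses a hypothetical StubScorer (and made-up scorer names) in place of real MLflow scorers, and it assumes code-based scorers carry no model attribute, grouping them under "<none>".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StubScorer:
    """Hypothetical stand-in for a listed scorer (only name and model)."""
    name: str
    model: Optional[str] = None


def group_by_model(scorers: Iterable[object]) -> dict[str, list[str]]:
    """Map each judge model to the names of the scorers that use it.

    Scorers without a `model` attribute (assumed for code-based scorers)
    are grouped under the key "<none>".
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for scorer in scorers:
        model = getattr(scorer, "model", None) or "<none>"
        groups[model].append(scorer.name)
    return dict(groups)


scorers = [
    StubScorer("response_quality", "anthropic:/claude-3-5-sonnet-20241022"),
    StubScorer("tone", "anthropic:/claude-3-5-sonnet-20241022"),  # hypothetical
    StubScorer("exact_match"),  # hypothetical code-based scorer
]
print(group_by_model(scorers))
```

Since only the name and model attributes are read, passing the result of list_scorers(experiment_id=experiment_id) instead of the stub list should behave the same way, under the same assumption about code-based scorers.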
UI Support
Coming soon!