
MLflow 3.11.1

· 17 min read
MLflow maintainers

MLflow 3.11.1 includes several major features and improvements.

Major New Features:

  • Automatic Issue Identification: Automatically identify quality issues in your agent with AI! Use the new "Detect Issues" button in the traces table to analyze selected traces and surface potential problems across categories like correctness, safety, and performance. Issues are linked directly to traces for easy investigation and debugging. Docs (#21431, #21204, #21165, #21163, #21161, @smoorjani, @serena-ruan)
  • Gateway Budget Alerts & Limits: Control your AI Gateway spending with configurable budget policies! Set spending limits by time window (daily, weekly, or monthly), receive alerts before hitting limits, and prevent runaway costs with automatic request blocking. The new budget management UI lets you track spending, configure webhooks for notifications, and monitor violations across all your gateway endpoints. Docs (#21116, #21534, #21569, #21473, #21108, @TomeHirata, @copilot-swe-agent)
  • Trace Graph View: Visualize complex trace hierarchies with an interactive graph view! Navigate multi-level trace structures, understand parent-child relationships at a glance, and debug complex systems more effectively with a visual representation of your trace topology. Docs (#20607, @joelrobin18)
  • Native OpenTelemetry GenAI Convention Support: MLflow now natively supports the OpenTelemetry GenAI Semantic Conventions for trace export! When exporting traces via OTLP with MLFLOW_ENABLE_OTEL_GENAI_SEMCONV enabled, MLflow automatically translates them to follow the OTel GenAI semantic conventions, enabling seamless integration with OTel-compatible observability platforms while preserving GenAI-specific metadata. Docs (#21494, #21495, @B-Step62)
  • OpenCode Tracing Integration: Debug smarter with OpenCode CLI integration! Track and analyze code execution flows directly from your development workflow, making it easier to identify performance bottlenecks and trace issues back to specific code paths. Docs (#20133, @joelrobin18)
  • Native UV Support for Model Dependencies: Automatic dependency inference now supports UV! MLflow automatically detects UV projects and captures exact, locked dependencies from your lockfile when logging models, ensuring reproducible environments. Docs (#20344, #20935, @debu-sinha)
  • Pickle-Free Model Serialization: Enhance security with pickle-free model formats! MLflow now supports safer model serialization using torch.export and skops formats, with improved controls when MLFLOW_ALLOW_PICKLE_DESERIALIZATION=False. Comprehensive documentation guides you through migrating existing models to pickle-free formats for production deployments. Docs (#21404, #21188, #20774, @WeichenXu123)
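The OTel GenAI export feature above works by renaming MLflow span attributes to their OpenTelemetry GenAI semantic-convention equivalents (e.g. `gen_ai.request.model`, `gen_ai.usage.input_tokens`) at export time. The sketch below illustrates that kind of key translation; the mapping table and function names are invented for illustration and are not MLflow's internal implementation.

```python
# Illustrative attribute translation in the spirit of the OTel GenAI
# semantic-convention exporter. The source keys and mapping are hypothetical;
# the target keys follow the OTel GenAI conventions.
ATTRIBUTE_MAP = {
    "mlflow.model": "gen_ai.request.model",
    "mlflow.usage.input_tokens": "gen_ai.usage.input_tokens",
    "mlflow.usage.output_tokens": "gen_ai.usage.output_tokens",
}

def to_genai_semconv(attributes: dict) -> dict:
    """Rename known keys to the OTel GenAI convention; pass others through."""
    return {ATTRIBUTE_MAP.get(k, k): v for k, v in attributes.items()}

span_attrs = {"mlflow.model": "gpt-4o", "mlflow.usage.input_tokens": 42, "custom": 1}
print(to_genai_semconv(span_attrs))
```

Translating at the exporter boundary keeps MLflow's native trace schema unchanged while letting OTel-compatible backends interpret the spans.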

Breaking Changes:

  • TypeScript SDK Package Renaming: The MLflow TypeScript SDK packages have been renamed to use npm organization scoping. If you're using the TypeScript SDK, update your package.json dependencies and import statements: mlflow-tracing is now @mlflow/core, mlflow-openai is now @mlflow/openai, mlflow-anthropic is now @mlflow/anthropic, and mlflow-gemini is now @mlflow/gemini. All packages are now at version 0.2.0. (#20792, @B-Step62)
  • Remove MLFLOW_ENABLE_INCREMENTAL_SPAN_EXPORT environment variable (#22182, @PattaraS)
  • Remove litellm and gepa from genai extras (#22059, @TomeHirata)
  • Block / and : in Registered Model names (#21458, @Bhuvan-08)
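The last breaking change above rejects `/` and `:` in registered model names. A minimal sketch of that kind of name validation is shown below; the function and error message are illustrative, not MLflow's actual validator.

```python
# Illustrative name validation mirroring the breaking change that blocks
# "/" and ":" in registered model names. Not MLflow's actual code.
FORBIDDEN_CHARS = {"/", ":"}

def validate_model_name(name: str) -> None:
    bad = FORBIDDEN_CHARS & set(name)
    if bad:
        raise ValueError(
            f"Registered model name {name!r} contains forbidden characters: {sorted(bad)}"
        )

validate_model_name("my_model")       # passes silently
# validate_model_name("team/model")   # would raise ValueError
```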

Features:

  • [Evaluation] Allow MetaPromptOptimizer to work without litellm (#22233, @TomeHirata)
  • [Tracking] Update Databricks API calls to use new gRPC APIs instead of py4j APIs (#22205, @WeichenXu123)
  • [Build] Add aiohttp as a core dependency of mlflow (#22189, @TomeHirata)
  • [Evaluation] Extend _get_provider_instance with groq, deepseek, xai, openrouter, ollama, databricks, vertex_ai (#22148, @kriscon-db)
  • [UI] Move native providers to non-LiteLLM in gateway UI (#22203, @TomeHirata)
  • [Tracing / Tracking] Add trace_location parameter to create_experiment (#22075, @dbrx-euirim)
  • [Gateway] Complete Bedrock provider with Converse API support (#21999, @TomeHirata)
  • [Gateway] Add native Vertex AI gateway provider (#21998, @TomeHirata)
  • [Gateway] Add native Databricks gateway provider (#21997, @TomeHirata)
  • [Gateway] Add native Ollama gateway provider (#21995, @TomeHirata)
  • [Gateway] Add native xAI (Grok) gateway provider (#21993, @TomeHirata)
  • [Tracing] Use bulk upsert in log_spans() to eliminate per-span ORM overhead (#21954, @harupy)
  • [Tracing] Add builtin cost_per_token to remove litellm dependency for cost tracking (#22046, @TomeHirata)
  • [Evaluation] Remove LiteLLM hard dependency from the discovery pipeline and judge adapters (#21739, @harupy)
  • [Evaluation] Add pipelined predict-score execution for mlflow.genai.evaluate (#20940, @alkispoly-db)
  • [Tracing / Tracking] Default trace location table_prefix to experiment ID in set_experiment (#21815, @danielseong1)
  • [Tracking] Add default uvicorn log config with timestamps (#21838, @harupy)
  • [Tracing / UI] Add Session ID filter to GenAI traces table filter dropdown (#21794, @daniellok-db)
  • [Evaluation / UI] Add Default Credential Chain auth mode for Bedrock/SageMaker in AI Gateway (#21061, @timsolovev)
  • [UI] Add multi metric bar chart support (#21258, @RenzoMXD)
  • [Tracking] Add TCP keepalive to HTTP sessions to detect stale connections and reduce timeout hangs (#21514, @mobaniha)
  • [Evaluation] Add proxy URL support for make_judge (#21185, @yukimori)
  • [UI] Improve run group filter to use grouping criteria instead of run IDs (#21072, @daniellok-db)
  • [UI] Add tool selector to Tool Calls charts and fix dark mode/sizing (#20865, @B-Step62)
  • [UI] Graph View Traces + OpenAI (#20607, @joelrobin18)
  • [UI] Show run description in chart tooltip (#21580, @KaushalVachhani)
  • [Evaluation / Tracing / UI] Add bulk judge execution from traces table toolbar with status feedback (#21270, @PattaraS)
  • [Gateway] Add Redis-backed BudgetTracker for distributed gateway deployments (#21504, @TomeHirata)
  • [Tracing / Tracking] Add trace location param to set_experiment (#21385, @danielseong1)
  • [Build / Tracking] Add azure extra for Azure Blob Storage support in full Docker image (#21582, @harupy)
  • [UI] Add budget violation indicator to gateway budget list page (#21569, @copilot-swe-agent)
  • [Evaluation] [5/5] Add discover_issues() pipeline and public API (#21431, @smoorjani)
  • [UI] Add Structured Output (JSON Schema) Support to the MLflow Prompts UI (#21394, @kennyvoo)
  • [Tracing] Auto-inject tracing context headers in autologging (#21490, @TomeHirata)
  • [UI] Add budget alert webhooks UI and fix budgets table borders (#21534, @TomeHirata)
  • [Model Registry / Prompts / UI] Add webhooks management UI to settings page (#21483, @TomeHirata)
  • [Tracing] Add OpenCode CLI tracing integration (#20133, @joelrobin18)
  • [Models] Add uv_groups and uv_extras params for uv dependency group control (#20935, @debu-sinha)
  • [Tracing] Add GenAI Semantic Convention translator for OTLP trace export (#21494, @B-Step62)
  • [Tracking] Add polars dataset support to autologging (#21507, @harupy)
  • [Tracing] Add mlflow.tracing.context() API for injecting metadata/tags without wrapper spans (#21318, @B-Step62)
  • [UI] Add budget dates and current spending for gateway budgets (#21473, @TomeHirata)
  • [Tracing / UI] Improve DSPy trace chat view readability (#21296, @B-Step62)
  • [UI] Add Kubernetes request auth provider plugin (#21176, @HumairAK)
  • [Tracking] Add IS NULL/IS NOT NULL support for tags and params in search_runs (#21283, @TomeHirata)
  • [Tracing / UI] Display clickable gateway trace link in trace explorer (#21316, @TomeHirata)
  • [UI] Add session selection support with checkbox, actions, and row alignment (#21324, @B-Step62)
  • [Models] Add UV package manager support for automatic dependency inference (#20344, @debu-sinha)
  • [Evaluation / UI] Add feature flag to control evaluation runs issues panel visibility (#21406, @serena-ruan)
  • [Tracing / UI] Add cached tokens display to Token Usage chart (#21295, @TomeHirata)
  • [UI] Add budget policies management UI for AI Gateway (#21116, @TomeHirata)
  • [UI] Allow multiple judge selection in Run judge on trace modal (#21322, @B-Step62)
  • [Docs / Tracking] Add admin-only authorization to webhook CRUD operations (#21271, @TomeHirata)
  • [Evaluation / Tracking] Add SqlIssue database table for storing experiment issues (#21165, @serena-ruan)
  • [Model Registry / Prompts] Support search_prompt_versions in OSS SQLAlchemy store (#21315, @TomeHirata)
  • [Evaluation / Tracing / UI] Add issue detection button to traces table toolbar with feature flag (#21204, @serena-ruan)
  • [Docs / Tracing / UI] Add inline audio player for input_audio content parts in trace UI (#21302, @TomeHirata)
  • [Evaluation / Tracing] Add IssueReference assessment type to store issue links with traces (#21163, @serena-ruan)
  • [Evaluation / Tracing] Add issue management protos with create, update, get, and search APIs (#21161, @serena-ruan)
  • [UI] Add IS NULL/IS NOT NULL operators for trace tags in search UI (#21280, @TomeHirata)
  • [Docs / Tracing] Add IS NULL/IS NOT NULL support for trace tags in search_traces (#21277, @TomeHirata)
  • [Tracing] Add steer message tracing support for Claude Code (#21265, @harupy)
  • [Models / Tracking] Add support for transformers 5.x (#20728, @KUrushi)
  • [Gateway] Add WEEKS to BudgetDurationUnit enum (#21196, @copilot-swe-agent)
  • [UI] Add try-it page on Gateway usage example modal (#21077, @PattaraS)
  • [Docs / Tracing / Tracking] Add mlflow.otel.autolog() for OTEL-based tracing integrations (Langfuse, Arize/Phoenix) (#20954, @alkispoly-db)
  • [Gateway] Add SQL schema and SQLAlchemy CRUD for gateway budget policies (#21108, @TomeHirata)
  • [UI] Add global gateway logs tab to usage page (#21126, @TomeHirata)
  • [Tracking] [MLflow Demo] Add server availability handling checks (#20349, @BenWilson2)
  • [Tracking] [MLflow Demo] Add scorers demo (#20287, @BenWilson2)
  • [Docs / Tracking] Add Backblaze B2 artifact repository (b2://) (#20731, @jeronimodeleon)
  • [Docs / Tracking] Add support for multipart download with presigned URLs for S3 compatible object storages (#20352, @etirelli)
  • [Tracing] MCP server expansion (#19830, @joelrobin18)
  • [Tracing / UI] Include response body in HTTP error messages with 1000 character limit (#20794, @copilot-swe-agent)
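One feature above (#22046) adds a builtin cost_per_token table so cost tracking no longer requires litellm. The arithmetic behind per-span cost is straightforward; the sketch below uses an invented price table and function names purely for illustration — it is not MLflow's internal API, and the prices are made up.

```python
# Hypothetical per-token pricing table (USD per 1M tokens; the numbers are
# invented for illustration -- real prices come from MLflow's builtin table).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def span_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute a span's cost from its token usage and the model's prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

total = span_cost("gpt-4o", input_tokens=1_000, output_tokens=500)
print(f"${total:.6f}")  # → $0.007500
```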

Bug fixes:

  • [Gateway] Fix DatabricksProvider to use OpenAI-compatible endpoint URLs (#22393, @TomeHirata)
  • [Evaluation] Fix: use EvalResult.scorer_stats for multi-turn scorer stat aggregation (#22364, @copilot-swe-agent)
  • [Scoring / Tracing] Revert "Register InferenceTableSpanProcessor alongside DatabricksUCTableSpanProcessor in model serving (#22332)" (#22362, @smurching)
  • [Scoring / Tracing] Warn when UCSchemaLocation destination is set in Databricks model serving (trace: null) (#22332, @smurching)
  • [Tracing / UI] Support tool_reference content blocks in Anthropic Chat UI parser (#22331, @B-Step62)
  • [Tracing] Fix online scoring failure when trace spans are stored in artifact repo (#20784, @Mr-Neutr0n)
  • [UI] Fix adding a tag with empty value silently failing without user feedback in the Experiments table (#22320, @WeichenXu123)
  • [Docs / Models] Bump minimum uv version requirement from 0.5.0 to 0.6.10 (#22313, @copilot-swe-agent)
  • [Scoring] Fix: exclude Serverless from use_dbconnect_artifact path in spark_udf (#22300, @franciffu723)
  • [UI] Fix assistant crash on unknown CLI message types (#21928, @SuperSonnix71)
  • [Tracing / Tracking] Fix mlflow-skinny: guard numpy-dependent imports in mlflow.types (#22211, @Suraj-kumar00)
  • [Tracing / UI] Fix dropdown showing wrong selection state before endpoints load in issue detection modal (#22236, @serena-ruan)
  • [Tracing] Normalize get_provider_name() to align with model_prices_and_context_window.json (#22223, @TomeHirata)
  • [Tracking / UI] Fix log_image with slash-containing keys: replace # with ~ as path separator (#22172, @copilot-swe-agent)
  • [Evaluation] Fix discovery pipeline _call_llm_via_gateway to handle gateway:/ URIs (#22153, @TomeHirata)
  • [UI] Auto-dismiss and fade-out judge run notifications in trace UI (#22137, @copilot-swe-agent)
  • [Evaluation / Tracking] Add polars version guard in polars_dataset.py to fix import failure with polars<1 (#22085, @TomeHirata)
  • [Tracking] Fix huey_consumer.py path resolution when venv bin dir is not on PATH (#22126, @copilot-swe-agent)
  • [UI] Fix sidebar navigation highlighting for run detail pages (#20860, @daniellok-db)
  • [Tracing] Lowercase model_provider in calculate_cost_by_model_and_token_usage (#22134, @TomeHirata)
  • [Gateway] Fix misleading "Discarded unknown message" log in Anthropic gateway provider (#21942, @copilot-swe-agent)
  • [UI] Fix selected run URL param not updating in eval runs table (#22135, @daniellok-db)
  • [Tracing / Tracking] Fix trace export DB contention by disabling incremental span export for gateway (#21721, @PattaraS)
  • [Tracing / Tracking] Expand session-level assessment filters to return all session traces (#21792, @daniellok-db)
  • [Evaluation] Support inference_params for built-in scorers (#21943, @debu-sinha)
  • [UI] Fix assistant stream killed by unhandled rate_limit_event from Claude Code CLI (#22067, @forrestmurray-db)
  • [UI] Fix gateway UI not showing custom model name during endpoint edit (#22068, @TomeHirata)
  • [Evaluation] Fix Anthropic structured outputs compatibility in gateway adapter (#21922, @harupy)
  • [UI] Remove assessment type dropdown and align terminology (#21379, @B-Step62)
  • [Tracking] Fix NextMethod() S3 dispatch error in R mlflow_get_run_context (#21957, @daniellok-db)
  • [Models / Tracking] Enforce auth on logged model artifact download AJAX endpoint (#21708, @B-Step62)
  • [Scoring] Fix tar path traversal vulnerability in extract_archive_to_dir (#21824, @TomeHirata)
  • [Scoring] Fix Starlette 1.0 compatibility in mlflow/pyfunc/scoring_server/__init__.py (#21908, @copilot-swe-agent)
  • [Tracing] [TS SDK] Port smart preview truncation from Python SDK (#21826, @B-Step62)
  • [UI] Fix trace drawer width using context instead of prop drilling (#21830, @B-Step62)
  • [UI] fix: use both registrations and tags for consistent registered model display (#20671) (#21555, @s-zx)
  • [Tracking] Fix autologging overwriting user's warnings.showwarning handler (#21707, @mango766)
  • [Tracing] Remove trace limit in issue discovery to annotate all affected traces (#21736, @serena-ruan)
  • [Scoring] fix: accept Sequence instead of list in to_chat_completions_input (#21724, @mr-brobot)
  • [Tracking] Set UV_PROJECT_ENVIRONMENT in run_uv_sync to install into the correct Python environment (#21750, @copilot-swe-agent)
  • [Tracing / UI] Fix chat UI rendering for OTel GenAI traces with non-standard attributes (#21215, @B-Step62)
  • [Build] Fix build-system in examples/uv-dependency-management/pyproject.toml (#21752, @copilot-swe-agent)
  • [Tracing] fix: avoid deepcopy in dataclass JSON serialization in TraceJSONEncoder (#21668, @raulblazquezbullon)
  • [Tracing] Support artifact-repo traces in batch_get_traces (#21650, @harupy)
  • [Evaluation / Tracing] Fall back to agentic judge mode when trace inputs/outputs are missing (#21306, @TomeHirata)
  • [UI] Fix chat/session view for LangGraph: deduplicate accumulated messages (#21279, @B-Step62)
  • [UI] Show server error detail in Try It panel for budget limit errors (#21568, @TomeHirata)
  • [Evaluation] Fix conversation simulator default model encoding on Databricks (#21644, @smoorjani)
  • [UI] Delete model definitions when endpoint is deleted from UI (#21649, @TomeHirata)
  • [UI] Hide _issue_discovery_judge feedback from traces UI (#21648, @harupy)
  • [Prompts] Clarify OSS register_prompt tag behavior (#21600, @yangbaechu)
  • [Prompts / Tracing / UI] Make Prompt column clickable in trace view (#21304, @copilot-swe-agent)
  • [UI] Fix dataset link not clickable for external source type (#21342, @smoorjani)
  • [Tracing / Tracking] Add audio content normalization for LangChain messages (#21533, @elliotllliu)
  • [UI] Add tooltips to display full budget and spend amounts in gateway budgets table (#21573, @copilot-swe-agent)
  • [Tracking / UI] downsample rows in SQL, update db index (#20928, @sscheele)
  • [Models] Skip _maybe_save_model for Databricks ACL-protected artifact URIs (#21602, @mohammadsubhani)
  • [UI] Make Try-It UI footer always visible in gateway endpoint modal (#21583, @copilot-swe-agent)
  • [Tracing / Tracking] Fix trace assessment filtering and MSSQL pagination syntax errors (#21273, @copilot-swe-agent)
  • [Tracing] Fix trace sampling to ensure parent-child consistency (#21524, @harupy)
  • [Tracking] Add Azure Government Cloud (usgovcloudapi.net) support to WASBS URI parsing (#21519, @ahringer)
  • [Gateway] Change default MLFLOW_GATEWAY_BUDGET_REFRESH_INTERVAL from 60 to 600 seconds (#21565, @copilot-swe-agent)
  • [Evaluation / Tracking] Fix scorer re-registration raising RESOURCE_ALREADY_EXISTS in auth layer (#21560, @harupy)
  • [Tracking] Harden check when MLFLOW_ALLOW_PICKLE_DESERIALIZATION is disabled (#21404, @WeichenXu123)
  • [Tracing] Fix trace ID collisions when random seed is set to fixed value (#21418, @WeichenXu123)
  • [UI] Remove "Rate Limiting [Coming Soon]" placeholder from gateway UI (#21559, @copilot-swe-agent)
  • [Gateway] Remove policy ID from budget limit exceeded error, show budget reset time instead (#21557, @copilot-swe-agent)
  • [Evaluation / Tracking] Fix Strands autolog tool input format for SpanType.TOOL (#21552, @LeviLong01)
  • [Tracing] Fix AttributeError in OpenAI autolog by excluding run_config from span attributes (#21454, @MarkVasile)
  • [Gateway] Fix singular/plural unit in budget limit exceeded error message (#21538, @copilot-swe-agent)
  • [UI] Invalidate budget windows cache on budget policy create/edit/delete (#21535, @copilot-swe-agent)
  • [Evaluation] Fix field-based make_judge prompt missing feedback_value_type (#21058, @yangbaechu)
  • [Tracing] Set MODEL_PROVIDER across autologging integrations for cost breakdown (#21288, @B-Step62)
  • [Evaluation] Fix gateway provider support in third-party judge integrations (ragas, deepeval, phoenix, trulens) (#21414, @copilot-swe-agent)
  • [Gateway] Update Anthropic gateway to use GA structured outputs API (#21436, @TomeHirata)
  • [Tracking] Adds builtin skops trusted types for LightGBM models (#21412, @WeichenXu123)
  • [Tracing / UI] Fix UI flickering in trace review modal during background refetches (#21290, @daniellok-db)
  • [Tracking] Add wildcard subdomain support to CORS origins validation (#21468, @arnewouters)
  • [UI] Fix refresh button on evaluation runs page to also refresh traces and assessments (#21332, @B-Step62)
  • [Models] Fix skops serialization format detection in _load_pyfunc (#21480, @copilot-swe-agent)
  • [UI] Fix Shift+Enter not creating newlines in assistant chat input (#21341, @smoorjani)
  • [UI] Make retrieved document source URLs clickable in span details view (#21340, @smoorjani)
  • [Evaluation / Tracing] Fix AttributeError when trace is None in genai evaluation (#19616, @omarfarhoud)
  • [Tracking] Fix CrewAI autologging compatibility with crewai >= 1.10 (#21376, @WeichenXu123)
  • [Tracing] Remove span name deduplication suffix from TypeScript SDK (#21382, @B-Step62)
  • [Evaluation] Fix LLM judge authentication failure when basic-auth is enabled (#21323, @PattaraS)
  • [UI] Fix stored XSS via unsafe YAML parsing of MLmodel artifacts (#21435, @harupy)
  • [Tracing / UI] Fix Pydantic AI Chat UI rendering for InstrumentedModel LLM spans (#21410, @B-Step62)
  • [Models] Fix transformers 5.3.0 compatibility for removed pipeline classes (#21426, @harupy)
  • [Tracing / UI] Fix Chat UI not rendering for Google ADK traces (#21274, @B-Step62)
  • [Tracking] Fix image artifact filename mangling caused by URL encoding of % separator (#21269, @harupy)
  • [Tracking] Fix: MLFLOW_ALLOW_PICKLE_DESERIALIZATION=False safety control is ineffective for pyfunc flavor (#21188, @WeichenXu123)
  • [Tracing / UI] Fix Pydantic AI autologging: auto-enable instrumentation and fix Chat UI (#21278, @B-Step62)
  • [Tracing] Fix span type not translated for OTel spans when MLflow SDK is active (#21307, @B-Step62)
  • [UI] Remove redundant "Hide assessments" toggle button (#21378, @B-Step62)
  • [Tracking] Fix Mistral autologging compatibility with mistralai >= 2.0 (#21374, @WeichenXu123)
  • [Tracking] Fix pydantic-ai autologging compatibility with pydantic-ai >= 1.63.0 (#21373, @WeichenXu123)
  • [Tracing / Tracking] Fix Claude Code autologging import collision with local mlflow folders (#21343, @smoorjani)
  • [Prompts] Fix stale prompt cache after prompt deletion (#21381, @yangbaechu)
  • [Tracing / Tracking] Fix flush_trace_async_logging AttributeError with non-default tracer provider (#21105, @cgrierson-smartsheet)
  • [UI] Fix session assessments panel terminology (#21336, @smoorjani)
  • [UI] Improve quality chart readability and styling in overview tab (#21325, @B-Step62)
  • [Tracing] Support uv run in Claude Code tracing hooks (#21327, @copilot-swe-agent)
  • [Tracing / UI] Fix Chat tab not rendering for non-OpenAI model names in OpenAI autolog spans (#21356, @TomeHirata)
  • [UI] Fix false 'endpoint deleted' warning after endpoint rename (#21333, @TomeHirata)
  • [UI] Fix broken image rendering in trace chat collapsed preview (#21291, @harupy)
  • [UI] Fix tag key validation UI contradiction (#21140, @KaushalVachhani)
  • [Tracing] Use correct env key for Claude Code settings environment variables (#21344, @smoorjani)
  • [UI] Fix truncated model names in Cost Breakdown donut chart (#21310, @TomeHirata)
  • [Evaluation / Tracing] Fix ConversationSimulator validation for predict_fn signatures and context fields (#21171, @yangbaechu)
  • [UI] [ML-63097] Fix broken LLM judge documentation links (#21347, @smoorjani)
  • [Tracing / Tracking] Add authentication support to OTLP exporter headers (#21230, @giulio-leone)
  • [Evaluation / Tracking] Fix deletion of assessments associated with a run (#20624, @retrowhiz)
  • [Models] Fix _deduplicate_requirements merging marker-differentiated requirements (#21098, @harupy)
  • [UI] Fix Tags functionality in Recent Experiments table on Home page (#20907, @joelrobin18)
  • [Tracing] Fix MCP fn_wrapper handling of Click UNSET defaults (#20953) (#20962, @yangbaechu)
  • [Evaluation] Enable Databricks LLM fallback for available tools extraction (#21017, @xsh310)
  • [UI] Fix sorting for timestamp columns in ExperimentListTable (#20908, @joelrobin18)
  • [UI] Fix tag value input being cleared when entered before key (#20910, @joelrobin18)
  • [Docs] Fix LiteLLM model URI format in eval quickstart docs (#20941, @copilot-swe-agent)
  • [Tracing] Fix SpanEvent timestamp resolution to use nanoseconds (#20828, @copilot-swe-agent)
  • [Tracking] Escape regex special chars in search_experiments LIKE filter (#16667, @joelrobin18)
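Several fixes above (#21404, #21188) harden behavior when MLFLOW_ALLOW_PICKLE_DESERIALIZATION=False. The underlying risk is that pickle.load can import and invoke arbitrary callables during deserialization. The stdlib sketch below shows the general defensive technique — refusing to resolve any global so only plain data can round-trip. It illustrates the idea only and is not MLflow's actual safety control.

```python
import io
import pickle

# Restricted unpickler: reject every global lookup, so only plain data
# (lists, dicts, strings, numbers) can be deserialized. A sketch of the
# general technique, not MLflow's implementation.
class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()

print(safe_loads(pickle.dumps({"weights": [1.0, 2.0]})))  # plain data is fine
```

Payloads that reference any importable object (a class, a function) trip `find_class` and fail to load, which is why pickle-free formats like torch.export and skops are preferred for untrusted model artifacts.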

Documentation updates:

  • [Docs] docs: clarify uv dependency management vs MLFLOW_LOCK_MODEL_DEPENDENCIES, add uv workspace limitation (#22312, @copilot-swe-agent)
  • [Docs] Document supported provider environment variables for judge models (#22195, @kriscon-db)
  • [Docs] Add relative duration examples for uv --exclude-newer (#22133, @copilot-swe-agent)
  • [Docs] Add secure installs documentation page (#22036, @harupy)
  • [Evaluation] Add documentation for issue detection (#22057, @serena-ruan)
  • [Tracing] Add OpenHands integration doc (#21933, @B-Step62)
  • [Docs / Tracing] Fix MLFLOW_ENABLE_ASYNC_TRACE_LOGGING docs to reflect OSS default behavior (#21731, @copilot-swe-agent)
  • [Docs] Add note for pickle-free model doc (#21732, @WeichenXu123)
  • [Docs] Add experiment note to the pickle-free model format doc page (#21709, @WeichenXu123)
  • [Docs] Add Guide: Deploy MLflow to Google Cloud (#21599, @WeichenXu123)
  • [Docs] Add Guide: Deploy MLflow to Azure cloud (#21128, @WeichenXu123)
  • [Docs / Tracing] Add Goose tracing integration documentation (#21190, @B-Step62)
  • [Docs] Expand Koog integration doc (#21218, @B-Step62)
  • [Docs / Tracing] Add 'Combine with MLflow SDK' section to OTel integration guides (#21298, @TomeHirata)
  • [Docs] docs: add Budget Tracker Strategies guideline to AI Gateway budget page (#21633, @copilot-swe-agent)
  • [Docs] Add tracking URI note to mlflow-skinny README (#21638, @harupy)
  • [Docs] Add Guide: Deploy MLflow to AWS cloud (#20729, @WeichenXu123)
  • [Docs / Models] Deprecate generate_signature_output in favor of input_example (#21556, @shivamshinde123)
  • [Docs] Claude MCP setup instructions to use .mcp.json or CLI (#21609, @copilot-swe-agent)
  • [Docs] [1/3] Document OTel attribute mapping (#21478, @B-Step62)
  • [Docs] docs: Add OpenAI Responses API examples to gateway passthrough documentation (#21545, @copilot-swe-agent)
  • [Docs] Add standalone multimodal content in traces documentation (#21357, @kriscon-db)
  • [Docs] Add documentation page for Budget Alerts & Limits (#21121, @TomeHirata)
  • [Docs / Models] Add documentation for pickle-free model formats (#20774, @WeichenXu123)
  • [Docs / Prompts] Update prompt registry docs to use MLflow 3.x API examples (#21267, @copilot-swe-agent)
  • [Docs] docs: Add single quotes to install commands with extras to prevent zsh errors (#21227, @mshavliuk)
  • [Docs] Add Amazon Nova bedrock model examples for mlflow.metrics.genai (#21063, @ManasVardhan)
  • [Docs] Update SSO oidc plugin doc: add google identity platform / AWS cognito / Azure Entra ID configuration guide (#20591, @WeichenXu123)
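One docs fix above (#21227) adds single quotes to install commands with extras. The reason: zsh expands unquoted square brackets as glob patterns, so an extras install fails with "no matches found" unless the package spec is quoted.

```shell
# zsh treats unquoted [ ] as glob characters; quote the spec to be safe:
pip install 'mlflow[genai]'
```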

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.10.1

· 4 min read
MLflow maintainers

MLflow 3.10.1 is a patch release that contains some minor feature enhancements, bug fixes, and documentation updates.

Bug fixes:

  • [UI] Fix "View full dashboard" link in gateway usage tab when workspace is enabled (#21191, @copilot-swe-agent)
  • [UI] Persist AI Gateway default passphrase security banner dismissal to localStorage (#21292, @copilot-swe-agent)
  • [Evaluation] Demote unused parameters log message from WARNING to DEBUG in instructions judge (#21294, @copilot-swe-agent)
  • [UI] Clear "All" time selector when switching to overview tab (#21371, @daniellok-db)
  • [Prompts / UI] Fix Traces view in Prompts tab not being scrollable (#21282, @TomeHirata)
  • [UI] Fix judge builder instruction textarea (#21299, @daniellok-db)
  • [UI] Fix group mode to aggregate "Additional runs" as "Unassigned" group in charts (#21155, @copilot-swe-agent)
  • [UI] Fix artifact download when workspaces are enabled (#21074, @timsolovev)
  • [Tracing] Fix NOT NULL constraint on assessments.trace_id during trace export (#21348, @dbczumar)
  • [Tracking] Fix 403 Forbidden for artifact list via query param when default_permission=NO_PERMISSIONS (#21220, @copilot-swe-agent)
  • [UI] [ML-63097] Fix broken LLM judge documentation links (#21347, @smoorjani)
  • [Tracing] Fix Run Judge failed with litellm.InternalServerError: Invalid response object. (#21262, @PattaraS)
  • [Tracing / UI] Update Action menu: indentation to avoid confusion (#21266, @PattaraS)
  • [Model Registry] Fix MlflowClient.copy_model_version for the case that copy UC model across workspaces (#21212, @WeichenXu123)
  • [UI] Fix empty description box rendering for sanitized-empty experiment descriptions (#21223, @copilot-swe-agent)
  • [Artifacts] Fix single artifact downloading through HttpArtifactRepository (#12955, @Koenkk)
  • [Tracing] Fix find_last_user_message_index skipping skill content injections (#21119, @alkispoly-db)
  • [Tracing] Fix retrieval context extraction when span outputs are stored as strings (#21213, @smoorjani)
  • [UI] Fix visibility toggle button in chart tooltip not working (#21071, @daniellok-db)
  • [UI] Move gateway experiment filtering to server-side query to fix inconsistent page sizes (#21138, @copilot-swe-agent)
  • [Gateway] Downgrade spurious warning to debug log for gateway endpoints with fallback_config but no FALLBACK models (#21123, @copilot-swe-agent)
  • [Tracing] Fix MCP fn_wrapper to pass None for optional params with UNSET defaults (#21051, @yangbaechu)
  • [Tracking] Add CASCADE to logged_model tables experiment_id foreign keys (#20185, @harupy)
  • [Tracing] Fix MCP fn_wrapper handling of Click UNSET defaults (#20953) (#20962, @yangbaechu)

Documentation updates:

  • [Docs] Update SSO oidc plugin doc: add google identity platform / AWS cognito / Azure Entra ID configuration guide (#20591, @WeichenXu123)
  • [Docs / Tracing] Fix distributed tracing rendering and improve doc (#21070, @B-Step62)
  • [Docs] docs: Add single quotes to install commands with extras to prevent zsh errors (#21227, @mshavliuk)
  • [Docs / Model Registry] Fix outdated docstring claiming models:/ URIs are unsupported in register_model (#21197, @copilot-swe-agent)
  • [Docs] Replace MinIO with RustFS in docker-compose setup (#21099, @jmaggesi)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.10.0 Highlights: Multi-Workspace Support, Multi-Turn Evaluation, and many UI Enhancements!

· 5 min read
MLflow maintainers

MLflow 3.10.0 is a major release that enhances MLflow's AI observability and evaluation capabilities while making these features easier to use, both for new users and for organizations operating at scale. This release brings multi-workspace support, evaluation and simulation for chatbot conversations, cost tracking for your traces, usage tracking for your AI Gateway endpoints, and a number of UI enhancements that make app and agent development more intuitive.

1. Workspace Support in MLflow Tracking Server

MLflow now supports multi-workspace environments. Users can organize experiments, models, and prompts at a coarser level of granularity, logically isolating them within a single tracking server. To enable this feature, pass the --enable-workspaces flag to the mlflow server command, or set the MLFLOW_ENABLE_WORKSPACES environment variable to true.
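Both enablement paths described above can be sketched as follows (the flag and environment variable come from this release; any additional server options are your usual ones):

```shell
# Enable multi-workspace mode with the CLI flag...
mlflow server --enable-workspaces

# ...or with the environment variable:
export MLFLOW_ENABLE_WORKSPACES=true
mlflow server
```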

Learn more about multi-workspace support

2. Multi-turn Evaluation & Conversation Simulation

MLflow now supports multi-turn evaluation, including evaluating existing conversations with session-level scorers and simulating conversations to test new versions of your agent, without the toil of regenerating conversations. Use the session-level scorers introduced in MLflow 3.8.0 and the brand new session UIs to evaluate the quality of your conversational agents and enable automatic scoring to monitor quality as traces are ingested.

Learn more about multi-turn evaluation
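Session-level scorers operate on all traces in a conversation rather than on a single turn. The pure-Python sketch below illustrates that shape — group traces by session ID, then score each session as a whole. The record shapes and scorer are invented for illustration; the real API lives in mlflow.genai and its evaluation harness.

```python
from collections import defaultdict

# Toy trace records (shapes invented for illustration; real traces come from
# MLflow's tracing store).
traces = [
    {"session_id": "s1", "turn": 1, "output": "Hi! How can I help?"},
    {"session_id": "s1", "turn": 2, "output": "Sure, here are the steps..."},
    {"session_id": "s2", "turn": 1, "output": "I don't know."},
]

def group_by_session(traces):
    """Group traces into ordered per-session conversations."""
    sessions = defaultdict(list)
    for t in traces:
        sessions[t["session_id"]].append(t)
    for turns in sessions.values():
        turns.sort(key=lambda t: t["turn"])
    return dict(sessions)

def session_scorer(turns) -> float:
    """Toy session-level scorer: fraction of turns with a substantive reply."""
    return sum("I don't know" not in t["output"] for t in turns) / len(turns)

scores = {sid: session_scorer(turns) for sid, turns in group_by_session(traces).items()}
print(scores)  # → {'s1': 1.0, 's2': 0.0}
```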

3. Trace Cost Tracking

Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. Additionally, costs are aggregated and broken down in the "Overview" tab, giving you granular insights into your LLM spend patterns.

Learn more about trace cost tracking

4. Navigation bar redesign

As we continue to add more features to the MLflow UI, we found that navigation was getting cluttered and overwhelming, with poor separation of features for different workflow types. We've redesigned the navigation bar to be more intuitive and easier to use, with a new sidebar that provides a more relevant set of tabs for both GenAI apps and agent developers, as well as classic model training workflows. The new experience also gives more space to the main content area, making it easier to focus on the task at hand.

5. MLflow Demo Experiment

New to MLflow GenAI? With one click, launch a pre-populated demo and explore LLM tracing, evaluation, and prompt management in action. No configuration, no code required. This feature is available on the MLflow UI's homepage and provides a comprehensive overview of the functionality that MLflow has to offer.

Get started by clicking the button as shown in the video above, or by running mlflow demo in your terminal.

6. Gateway Usage Tracking

Monitor your AI Gateway endpoints with detailed usage analytics. A new "Usage" tab shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end AI observability.

To turn this feature on for your AI Gateway endpoints, make sure to check the "Enable usage tracking" toggle in your endpoint settings, as shown in the video above.

Learn more about Gateway usage tracking

7. In-UI Trace Evaluation

Run custom or pre-built LLM judges directly from the traces and sessions UI, no code required! This enables quick evaluation of individual traces and sessions without context switching to the Python SDK. To use this feature, make sure to set up an AI Gateway endpoint, as you'll need to select an endpoint when running LLM judges.

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Install MLflow 3.10.0 to try these new features:

pip install mlflow==3.10.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.9.0 Highlights: AI Assistant, Dashboards, and Judge Optimization

· 6 min read
MLflow maintainers
MLflow maintainers

MLflow 3.9.0 is a major release focused on AI Observability and Evaluation capabilities, bringing powerful new features for building, monitoring, and optimizing AI agents. This release introduces an AI-powered assistant, comprehensive dashboards for agent performance, a new judge optimization algorithm, judge builder UI, continuous monitoring with LLM judges, and distributed tracing.

1. MLflow Assistant Powered by Claude Code

MLflow Assistant transforms coding agents like Claude Code into experienced AI engineers by your side. Unlike typical chatbots, the assistant is aware of your codebase and context—it's not just a Q&A tool, but a full-fledged AI engineer that can find root causes for issues, set up quality tests, and apply LLMOps best practices to your project.

Key capabilities include:

  • No additional costs: Use your existing Claude Code subscription. MLflow provides the knowledge and integration at no cost.
  • Context-rich assistance: Understands your local codebase, project structure, and provides tailored recommendations—not generic advice.
  • Complete dev-loop: Goes beyond Q&A to fetch MLflow data, read your code, and add tracing, evaluation, and versioning to your project.
  • Fully customizable: Add custom skills, sub-agents, and permissions. Everything runs on your machine with full transparency.

Open the MLflow UI, navigate to the Assistant panel in any experiment page, and follow the setup wizard to get started.

Learn more about MLflow Assistant

2. Dashboards for Agent Performance Metrics

A new "Overview" tab in GenAI experiments provides pre-built charts and visualizations for monitoring agent performance at a glance. Monitor key metrics like latency, request counts, and quality scores without manual configuration. Identify performance trends and anomalies across your agent deployments, and get tool call summaries to understand how your agents are utilizing available tools.

Navigate to any GenAI experiment and click the "Overview" tab to access the dashboard. Charts are automatically populated based on your trace data. Have a specific visualization need? Request additional charts via GitHub Issues.

Learn more about GenAI Dashboards

3. MemAlign: A New Judge Optimizer Algorithm

MemAlign is a new optimization algorithm for LLM-as-a-judge evaluation that learns evaluation guidelines from past feedback and dynamically retrieves relevant examples at runtime. Improve judge accuracy by learning from human feedback patterns, reduce prompt engineering effort with automatic guideline extraction, and adapt judge behavior dynamically based on the input being evaluated.

Use the MemAlignOptimizer to optimize your judges with historical feedback:

import mlflow
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Create a judge
judge = make_judge(
    name="politeness",
    instructions=(
        "Given a user question, evaluate if the chatbot's response is polite and respectful. "
        "Consider the tone, language, and context of the response.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Create the MemAlign optimizer
optimizer = MemAlignOptimizer(reflection_lm="openai:/gpt-5-mini")

# Retrieve traces with human feedback
traces = mlflow.search_traces(return_type="list")

# Align the judge
aligned_judge = judge.align(traces=traces, optimizer=optimizer)

Learn more about MemAlign

4. Configuring and Building a Judge with Judge Builder UI

A new visual interface lets you create and test custom LLM judge prompts without writing code. Iterate quickly on judge criteria and scoring rubrics with immediate feedback, test judges on sample traces before deploying to production, and export validated judges to the Python SDK for programmatic integration.

Navigate to the "Judges" section in the MLflow UI and click "Create Judge." Define your evaluation criteria, scoring rubric, and test your judge against sample traces. Once satisfied, export the configuration to use with the MLflow SDK.

Learn more about Judge Builder

5. Continuous Online Monitoring with MLflow LLM Judges

Automatically run LLM judges on incoming traces without writing any code, enabling continuous quality monitoring of your agents in production. Detect quality issues in real-time as traces flow through your system, leverage pre-defined judges for common evaluations like safety, relevance, groundedness, and correctness, and get actionable assessments attached directly to your traces.

Go to the "Judges" tab in your experiment, select from pre-defined judges or use your custom judges, and configure which traces to evaluate. Assessments are automatically attached to matching traces as they arrive.

Learn more about Agent Evaluation

6. Distributed Tracing for Tracking End-to-end Requests

Track requests across multiple services with context propagation, enabling end-to-end visibility into distributed AI systems. Maintain trace continuity across microservices and external API calls, debug issues that span multiple services with a unified trace view, and understand latency and errors at each step of your distributed pipeline.

Use the get_tracing_context_headers_for_http_request and set_tracing_context_from_http_request_headers functions to inject and extract trace context:

# Service A: Inject context into the headers of the outgoing request
import requests
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root"):
    headers = get_tracing_context_headers_for_http_request()
    requests.post(
        "https://your.service/handle", headers=headers, json={"input": "hello"}
    )
# Service B: Extract context from incoming request
import mlflow
from flask import Flask, request
from mlflow.tracing import set_tracing_context_from_http_request_headers

app = Flask(__name__)

@app.post("/handle")
def handle():
    headers = dict(request.headers)
    with set_tracing_context_from_http_request_headers(headers):
        with mlflow.start_span("server-handler") as span:
            # ... your logic ...
            span.set_attribute("status", "ok")
    return {"ok": True}

Learn more about Distributed Tracing

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Install MLflow 3.9.0 to try these new features:

pip install mlflow==3.9.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.8.1

· One min read
MLflow maintainers
MLflow maintainers

MLflow 3.8.1 includes several bug fixes and documentation updates.

Bug fixes:

  • [Tracking] Skip registering sqlalchemy store when sqlalchemy lib is not installed (#19563, @WeichenXu123)
  • [Models / Scoring] fix(security): prevent command injection via malicious model artifacts (#19583, @ColeMurray)
  • [Prompts] Fix prompt registration with model_config on Databricks (#19617, @TomeHirata)
  • [UI] Fix UI blank page on plain HTTP by replacing crypto.randomUUID with uuid library (#19644, @copilot-swe-agent)

Small bug fixes and documentation updates:

#19539, #19451, #19409, @smoorjani; #19493, @alkispoly-db

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.8.0

· 5 min read
MLflow maintainers
MLflow maintainers

MLflow 3.8.0 includes several major features and improvements.

Major Features

  • ⚙️Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
  • In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
  • ⚖️DeepEval and RAGAS Judges Integration: New get_judge API enables using DeepEval and RAGAS evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani, #19345, @SomtochiUmeh)
  • 🛡️Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
  • Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)

Important Notice

  • Collection of UI Telemetry. From MLflow 3.8.0 onwards, MLflow will collect anonymized data about UI interactions, similar to the telemetry we collect for the Python SDK. If you manage your own server, you can disable UI telemetry by setting the existing environment variables: MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. If you do not manage your own server (e.g. you use a managed service or are not the admin), you can still opt out personally via the new "Settings" tab in the MLflow UI. For more information, please read the documentation on usage tracking.
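For self-managed servers, either environment variable mentioned above disables UI telemetry when set before launching the server:

```shell
# Either variable opts the server out of UI telemetry collection
export MLFLOW_DISABLE_TELEMETRY=true
# or the cross-tool convention:
export DO_NOT_TRACK=true

mlflow server --host 127.0.0.1 --port 5000
```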

Features:

  • [Tracking] Add default passphrase support (#19360, @BenWilson2)
  • [Tracing] Pydantic AI Stream support (#19118, @joelrobin18)
  • [Docs] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
  • [Tracking] Add --max-results option to mlflow experiments search (#19359, @alkispoly-db)
  • [Tracking] Enhance encryption security (#19253, @BenWilson2)
  • [Tracking] Fix and simplify Gateway store interfaces (#19346, @BenWilson2)
  • [Evaluation] Add inference_params support for LLM Judges (#19152, @debu-sinha)
  • [Tracing] Support batch span export to UC Table (#19324, @B-Step62)
  • [Tracking] Add endpoint tags (#19308, @BenWilson2)
  • [Docs / Evaluation] Add MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS to limit concurrent scorer execution (#19248, @debu-sinha)
  • [Evaluation / Tracking] Enable search_datasets in Databricks managed MLflow (#19254, @alkispoly-db)
  • [Prompts] render text prompt previews in markdown (#19200, @ispoljari)
  • [UI] Add linked prompts filter for trace search tab (#19192, @TomeHirata)
  • [Evaluation] Automatically wrap async functions when passed to predict_fn (#19249, @smoorjani)
  • [Evaluation] [3/6][builtin judges] Conversational Role Adherence (#19247, @joelrobin18)
  • [Tracking] [Endpoints] [1/x] Add backend DB tables for Endpoints (#19002, @BenWilson2)
  • [Tracking] [Endpoints] [3/x] Entities base definitions (#19004, @BenWilson2)
  • [Tracking] [Endpoints] [4/x] Abstract store interface (#19005, @BenWilson2)
  • [Tracking] [Endpoints] [5/x] SQL Store backend for Endpoints (#19006, @BenWilson2)
  • [Tracking] [Endpoints] [6/x] Protos and entities interfaces (#19007, @BenWilson2)
  • [Tracking] [Endpoints] [7/x] Add rest store implementation (#19008, @BenWilson2)
  • [Tracking] [Endpoints] [8/x] Add credential cache (#19014, @BenWilson2)
  • [Tracking] [Endpoints] [9/x] Add provider, model, and configuration handling (#19009, @BenWilson2)
  • [Evaluation / UI] Add show/hide visibility control for Evaluation runs chart view (#18797) (#18852, @pradpalnis)
  • [Tracking] Add mlflow experiments get command (#19097, @alkispoly-db)
  • [Server-infra] [ Gateway 1/10 ] Simplify secrets and masked secrets with map types (#19440, @BenWilson2)

Bug fixes:

  • [Tracing / UI] Branch 3.8 patch: Fix GraphQL SearchRuns filter using invalid attribute key in trace comparison (#19526, @WeichenXu123)
  • [Scoring / Tracking] Fix artifact download performance regression (#19520, @copilot-swe-agent)
  • [Tracking] Fix SQLAlchemy alias conflict in _search_runs for dataset filters (#19498, @fredericosantos)
  • [Tracking] Add auth support for GraphQL routes (#19278, @BenWilson2)
  • [] Fix SQL injection vulnerability in UC function execution (#19381, @harupy)
  • [UI] Fix MultiIndex column search crash in dataset schema table (#19461, @copilot-swe-agent)
  • [Tracking] Make datasource failures fail gracefully (#19469, @BenWilson2)
  • [Tracing / Tracking] Fix litellm autolog for versions >= 1.78 (#19459, @harupy)
  • [Model Registry / Tracking] Fix SQLAlchemy engine connection pool leak in model registry and job stores (#19386, @harupy)
  • [UI] [Bug fix] Traces UI: Support filtering on assessments with multiple values (e.g. error and boolean) (#19262, @dbczumar)
  • [Evaluation / Tracing] Fix error initialization in Feedback (#19340, @alkispoly-db)
  • [Models] Switch container build to subprocess for Sagemaker (#19277, @BenWilson2)
  • [Scoring] Fix scorers issue on Strands traces (#18835, @joelrobin18)
  • [Tracking] Stop initializing backend stores in artifacts only mode (#19167, @mprahl)
  • [Evaluation] Parallelize multi-turn session evaluation (#19222, @AveshCSingh)
  • [Tracing] Add safe attribute capture for pydantic_ai (#19219, @BenWilson2)
  • [Model Registry] Fix UC to UC copying regression (#19280, @BenWilson2)
  • [Tracking] Fix artifact path traversal vector (#19260, @BenWilson2)
  • [UI] Fix issue with auth controls on system metrics (#19283, @BenWilson2)
  • [Models] Add context loading for ChatModel (#19250, @BenWilson2)
  • [Tracing] Fix trace decorators usage for LangGraph async callers (#19228, @BenWilson2)
  • [Tracking] Update docker compose to use --artifacts-destination not --default-artifact-root (#19215, @B-Step62)
  • [Build] Reduce clint error message verbosity by consolidating README instructions (#19155, @copilot-swe-agent)

Documentation updates:

  • [Docs] Add specific references for correctness scorers (#19472, @BenWilson2)
  • [Docs] Add documentation for Fluency scorer (#19481, @alkispoly-db)
  • [Docs] Update eval quickstart to put all code into a script (#19444, @achen530)
  • [Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)
  • [Evaluation] Fix non-reproducible code examples in deep-learning.mdx (#19376, @saumilyagupta)
  • [Docs / Evaluation] fix: Confusing documentation for mlflow.genai.evaluate() (#19380, @brandonhawi)
  • [Docs] Deprecate model logging of OpenAI flavor (#19325, @TomeHirata)
  • [Docs] Add rounded corners to video elements in documentation (#19231, @copilot-swe-agent)
  • [Docs] Sync Python/TypeScript tab selections in tracing quickstart docs (#19184, @copilot-swe-agent)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.7.0

· 9 min read
MLflow maintainers
MLflow maintainers

MLflow 3.7.0 includes several major features and improvements for GenAI Observability, Evaluation, and Prompt Management.

Major Features

  • 📝 Experiment Prompts UI: New prompts functionality in the experiment UI allows you to manage and search prompts directly within experiments, with support for filter strings and prompt version search in traces. (#19156, #18919, #18906, @TomeHirata)
  • 💬 Multi-turn Evaluation Support: Enhanced mlflow.genai.evaluate now supports multi-turn conversations, enabling comprehensive assessment of conversational AI applications with DataFrame and list inputs. (#18971, @AveshCSingh)
  • ⚖️ Trace Comparison: New side-by-side comparison view in the Traces UI allows you to analyze and debug LLM application behavior across different runs, making it easier to identify regressions and improvements. (#17138, @joelrobin18)
  • 🌐 Gemini TypeScript SDK: Auto-tracing support for Google's Gemini in TypeScript, expanding MLflow's observability capabilities for JavaScript/TypeScript AI applications. (#18207, @joelrobin18)
  • 🎯 Structured Outputs in Judges: The make_judge API now supports structured outputs, enabling more precise and programmatically consumable evaluation results. (#18529, @TomeHirata)
  • 🔗 VoltAgent Tracing: Added auto-tracing support for VoltAgent, extending MLflow's observability to this AI agent framework. (#19041, @joelrobin18)

Breaking Changes

Features

Bug Fixes

Documentation Updates

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.6.0

· 3 min read
MLflow maintainers
MLflow maintainers

MLflow 3.6.0 includes several major features and improvements for AI Observability, Experiment UI, Agent Evaluation and Deployment.

#1: Full OpenTelemetry Support in MLflow Tracking Server

OpenTelemetry Trace Example

MLflow now offers comprehensive OpenTelemetry integration, allowing you to use OpenTelemetry and MLflow seamlessly together for your observability stack.

  • Ingest OpenTelemetry spans directly into the MLflow tracking server
  • Monitor existing applications that are instrumented with OpenTelemetry
  • Trace AI applications written in arbitrary languages, including Java, Go, Rust, and more
  • Create unified traces that combine MLflow SDK instrumentation with OpenTelemetry auto-instrumentation from third-party libraries

For more details, please check out the blog post.
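For example, an application already instrumented with OpenTelemetry can be pointed at the MLflow tracking server via the standard OTLP exporter environment variables (the /v1/traces path and port here are assumptions to verify against the MLflow docs for your deployment):

```shell
# Send OTLP/HTTP spans from an existing OTel-instrumented app to MLflow
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:5000/v1/traces
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
```

No application code changes are needed; the OTel SDK picks these variables up at startup.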

#2: Session-level View in Trace UI

Session-level View in Trace UI

New chat sessions tab provides a dedicated view for organizing and analyzing related traces at the session level, making it easier to track conversational workflows.

See the Track Users & Sessions guide for more details.

#3: New Supported Frameworks in TypeScript Tracing SDK

Auto-tracing support for the Vercel AI SDK, LangChain.js, Mastra, Anthropic SDK, and Gemini SDK in TypeScript, expanding MLflow's observability capabilities across popular JavaScript/TypeScript frameworks.

For more information, please check out the TypeScript Tracing SDK.

#4: Tracking Judge Cost and Traces

Comprehensive tracking of LLM judge evaluation costs and traces, providing visibility into evaluation expenses and performance with automatic cost calculation and rendering.

See LLM Evaluation Guide for more details.

#5: New experiment tab bar

The experiment tab bar has been fully overhauled to provide more intuitive and discoverable navigation of different features in MLflow.

Upgrade to MLflow 3.6.0 to try it out!

#6: Agent Server for Lightning Agent Deployment

import agent
from mlflow.genai.agent_server import AgentServer

agent_server = AgentServer("ResponsesAgent")
app = agent_server.app

def main():
    agent_server.run(app_import_string="start_server:app")

if __name__ == "__main__":
    main()

Save this as start_server.py, then start the server and send a test request:

python3 start_server.py

curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": true
  }'

New agent server infrastructure for managing and deploying scoring agents with enhanced orchestration capabilities.

See Agent Server Guide for more details.

Breaking Changes and Deprecations

  • Drop numbering suffix (_1, _2, ...) from span names (#18531)
  • Deprecate promptflow, pmdarima, and diviner flavors (#18597, #18577)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.5.1

· 3 min read
MLflow maintainers
MLflow maintainers

MLflow 3.5.1 is a patch release that includes several bug fixes and improvements.

Features:

  • [CLI] Add CLI command to list registered scorers by experiment (#18255, @alkispoly-db)
  • [Deployments] Add configuration option for long-running deployments client requests (#18363, @BenWilson2)
  • [Deployments] Create set_databricks_monitoring_sql_warehouse_id API (#18346, @dbrx-euirim)
  • [Prompts] Show instructions for prompt optimization on prompt registry (#18375, @TomeHirata)

Bug fixes:

Documentation updates:

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.5.0

· 8 min read
MLflow maintainers
MLflow maintainers

MLflow 3.5.0 includes several major features and improvements!

Major Features

  • ⚙️ Job Execution Backend: Introduced a new job execution backend infrastructure for running asynchronous tasks with individual execution pools, job search capabilities, and transient error handling. (#17676, #18012, #18070, #18071, #18112, #18049, @WeichenXu123)
  • 🎯 Flexible Prompt Optimization API: Introduced a new flexible API for prompt optimization with support for model switching and the GEPA algorithm, enabling more efficient prompt tuning with fewer rollouts. See the documentation to get started. (#18183, #18031, @TomeHirata)
  • 🎨 Enhanced UI Onboarding: Improved in-product onboarding experience with trace quickstart drawer and updated homepage guidance to help users discover MLflow's latest features. (#18098, #18187, @B-Step62)
  • 🔐 Security Middleware for Tracking Server: Added a security middleware layer to protect against DNS rebinding, CORS attacks, and other security threats. Read the documentation for configuration details. (#17910, @BenWilson2)

Features

Bug Fixes

Documentation Updates

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.