
MLflow 3.11.1

· 17 min read
MLflow maintainers

MLflow 3.11.1 includes several major features and improvements.

Major New Features:

  • Automatic Issue Identification: Automatically identify quality issues in your agent with AI! Use the new "Detect Issues" button in the traces table to analyze selected traces and surface potential problems across categories like correctness, safety, and performance. Issues are linked directly to traces for easy investigation and debugging. Docs (#21431, #21204, #21165, #21163, #21161, @smoorjani, @serena-ruan)
  • Gateway Budget Alerts & Limits: Control your AI Gateway spending with configurable budget policies! Set spending limits by time window (daily, weekly, or monthly), receive alerts before hitting limits, and prevent runaway costs with automatic request blocking. The new budget management UI lets you track spending, configure webhooks for notifications, and monitor violations across all your gateway endpoints. Docs (#21116, #21534, #21569, #21473, #21108, @TomeHirata, @copilot-swe-agent)
  • Trace Graph View: Visualize complex trace hierarchies with an interactive graph view! Navigate multi-level trace structures, understand parent-child relationships at a glance, and debug complex systems more effectively with a visual representation of your trace topology. Docs (#20607, @joelrobin18)
  • Native OpenTelemetry GenAI Convention Support: MLflow now natively supports the OpenTelemetry GenAI Semantic Conventions for trace export! When exporting traces via OTLP with MLFLOW_ENABLE_OTEL_GENAI_SEMCONV enabled, MLflow automatically translates them to follow the OTel GenAI semantic conventions, enabling seamless integration with OTel-compatible observability platforms while preserving GenAI-specific metadata. Docs (#21494, #21495, @B-Step62)
  • OpenCode Tracing Integration: Debug smarter with OpenCode CLI integration! Track and analyze code execution flows directly from your development workflow, making it easier to identify performance bottlenecks and trace issues back to specific code paths. Docs (#20133, @joelrobin18)
  • Native UV Support for Model Dependencies: Automatic dependency inference now supports UV! MLflow automatically detects UV projects and captures exact, locked dependencies from your lockfile when logging models, ensuring reproducible environments. Docs (#20344, #20935, @debu-sinha)
  • Pickle-Free Model Serialization: Enhance security with pickle-free model formats! MLflow now supports safer model serialization using torch.export and skops formats, with improved controls when MLFLOW_ALLOW_PICKLE_DESERIALIZATION=False. Comprehensive documentation guides you through migrating existing models to pickle-free formats for production deployments. Docs (#21404, #21188, #20774, @WeichenXu123)
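The OTel GenAI export feature above works by renaming MLflow span attributes to their OpenTelemetry GenAI semantic-convention equivalents (e.g. `gen_ai.request.model`, `gen_ai.usage.input_tokens`) at export time. The sketch below illustrates that kind of key translation; the mapping table and function names are invented for illustration and are not MLflow's internal implementation.

```python
# Illustrative attribute translation in the spirit of the OTel GenAI
# semantic-convention exporter. The source keys and mapping are hypothetical;
# the target keys follow the OTel GenAI conventions.
ATTRIBUTE_MAP = {
    "mlflow.model": "gen_ai.request.model",
    "mlflow.usage.input_tokens": "gen_ai.usage.input_tokens",
    "mlflow.usage.output_tokens": "gen_ai.usage.output_tokens",
}

def to_genai_semconv(attributes: dict) -> dict:
    """Rename known keys to the OTel GenAI convention; pass others through."""
    return {ATTRIBUTE_MAP.get(k, k): v for k, v in attributes.items()}

span_attrs = {"mlflow.model": "gpt-4o", "mlflow.usage.input_tokens": 42, "custom": 1}
print(to_genai_semconv(span_attrs))
```

Translating at the exporter boundary keeps MLflow's native trace schema unchanged while letting OTel-compatible backends interpret the spans.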

Breaking Changes:

  • TypeScript SDK Package Renaming: The MLflow TypeScript SDK packages have been renamed to use npm organization scoping. If you're using the TypeScript SDK, update your package.json dependencies and import statements: mlflow-tracing is now @mlflow/core, mlflow-openai is now @mlflow/openai, mlflow-anthropic is now @mlflow/anthropic, and mlflow-gemini is now @mlflow/gemini. All packages are now at version 0.2.0. (#20792, @B-Step62)
  • Remove MLFLOW_ENABLE_INCREMENTAL_SPAN_EXPORT environment variable (#22182, @PattaraS)
  • Remove litellm and gepa from genai extras (#22059, @TomeHirata)
  • Block / and : in Registered Model names (#21458, @Bhuvan-08)
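The last breaking change above rejects `/` and `:` in registered model names. A minimal sketch of that kind of name validation is shown below; the function and error message are illustrative, not MLflow's actual validator.

```python
# Illustrative name validation mirroring the breaking change that blocks
# "/" and ":" in registered model names. Not MLflow's actual code.
FORBIDDEN_CHARS = {"/", ":"}

def validate_model_name(name: str) -> None:
    bad = FORBIDDEN_CHARS & set(name)
    if bad:
        raise ValueError(
            f"Registered model name {name!r} contains forbidden characters: {sorted(bad)}"
        )

validate_model_name("my_model")       # passes silently
# validate_model_name("team/model")   # would raise ValueError
```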

Features:

  • [Evaluation] Allow MetaPromptOptimizer to work without litellm (#22233, @TomeHirata)
  • [Tracking] Update Databricks API calls to use new gRPC APIs instead of py4j APIs (#22205, @WeichenXu123)
  • [Build] Add aiohttp as a core dependency of mlflow (#22189, @TomeHirata)
  • [Evaluation] Extend _get_provider_instance with groq, deepseek, xai, openrouter, ollama, databricks, vertex_ai (#22148, @kriscon-db)
  • [UI] Move native providers to non-LiteLLM in gateway UI (#22203, @TomeHirata)
  • [Tracing / Tracking] Add trace_location parameter to create_experiment (#22075, @dbrx-euirim)
  • [Gateway] Complete Bedrock provider with Converse API support (#21999, @TomeHirata)
  • [Gateway] Add native Vertex AI gateway provider (#21998, @TomeHirata)
  • [Gateway] Add native Databricks gateway provider (#21997, @TomeHirata)
  • [Gateway] Add native Ollama gateway provider (#21995, @TomeHirata)
  • [Gateway] Add native xAI (Grok) gateway provider (#21993, @TomeHirata)
  • [Tracing] Use bulk upsert in log_spans() to eliminate per-span ORM overhead (#21954, @harupy)
  • [Tracing] Add builtin cost_per_token to remove litellm dependency for cost tracking (#22046, @TomeHirata)
  • [Evaluation] Remove LiteLLM hard dependency from the discovery pipeline and judge adapters (#21739, @harupy)
  • [Evaluation] Add pipelined predict-score execution for mlflow.genai.evaluate (#20940, @alkispoly-db)
  • [Tracing / Tracking] Default trace location table_prefix to experiment ID in set_experiment (#21815, @danielseong1)
  • [Tracking] Add default uvicorn log config with timestamps (#21838, @harupy)
  • [Tracing / UI] Add Session ID filter to GenAI traces table filter dropdown (#21794, @daniellok-db)
  • [Evaluation / UI] Add Default Credential Chain auth mode for Bedrock/SageMaker in AI Gateway (#21061, @timsolovev)
  • [UI] Add multi metric bar chart support (#21258, @RenzoMXD)
  • [Tracking] Add TCP keepalive to HTTP sessions to detect stale connections and reduce timeout hangs (#21514, @mobaniha)
  • [Evaluation] Add proxy URL support for make_judge (#21185, @yukimori)
  • [UI] Improve run group filter to use grouping criteria instead of run IDs (#21072, @daniellok-db)
  • [UI] Add tool selector to Tool Calls charts and fix dark mode/sizing (#20865, @B-Step62)
  • [UI] Graph View Traces + OpenAI (#20607, @joelrobin18)
  • [UI] Show run description in chart tooltip (#21580, @KaushalVachhani)
  • [Evaluation / Tracing / UI] Add bulk judge execution from traces table toolbar with status feedback (#21270, @PattaraS)
  • [Gateway] Add Redis-backed BudgetTracker for distributed gateway deployments (#21504, @TomeHirata)
  • [Tracing / Tracking] Add trace location param to set_experiment (#21385, @danielseong1)
  • [Build / Tracking] Add azure extra for Azure Blob Storage support in full Docker image (#21582, @harupy)
  • [UI] Add budget violation indicator to gateway budget list page (#21569, @copilot-swe-agent)
  • [Evaluation] [5/5] Add discover_issues() pipeline and public API (#21431, @smoorjani)
  • [UI] Add Structured Output (JSON Schema) Support to the MLflow Prompts UI (#21394, @kennyvoo)
  • [Tracing] Auto-inject tracing context headers in autologging (#21490, @TomeHirata)
  • [UI] Add budget alert webhooks UI and fix budgets table borders (#21534, @TomeHirata)
  • [Model Registry / Prompts / UI] Add webhooks management UI to settings page (#21483, @TomeHirata)
  • [Tracing] Add OpenCode CLI tracing integration (#20133, @joelrobin18)
  • [Models] Add uv_groups and uv_extras params for uv dependency group control (#20935, @debu-sinha)
  • [Tracing] Add GenAI Semantic Convention translator for OTLP trace export (#21494, @B-Step62)
  • [Tracking] Add polars dataset support to autologging (#21507, @harupy)
  • [Tracing] Add mlflow.tracing.context() API for injecting metadata/tags without wrapper spans (#21318, @B-Step62)
  • [UI] Add budget dates and current spending for gateway budgets (#21473, @TomeHirata)
  • [Tracing / UI] Improve DSPy trace chat view readability (#21296, @B-Step62)
  • [UI] Add Kubernetes request auth provider plugin (#21176, @HumairAK)
  • [Tracking] Add IS NULL/IS NOT NULL support for tags and params in search_runs (#21283, @TomeHirata)
  • [Tracing / UI] Display clickable gateway trace link in trace explorer (#21316, @TomeHirata)
  • [UI] Add session selection support with checkbox, actions, and row alignment (#21324, @B-Step62)
  • [Models] Add UV package manager support for automatic dependency inference (#20344, @debu-sinha)
  • [Evaluation / UI] Add feature flag to control evaluation runs issues panel visibility (#21406, @serena-ruan)
  • [Tracing / UI] Add cached tokens display to Token Usage chart (#21295, @TomeHirata)
  • [UI] Add budget policies management UI for AI Gateway (#21116, @TomeHirata)
  • [UI] Allow multiple judge selection in Run judge on trace modal (#21322, @B-Step62)
  • [Docs / Tracking] Add admin-only authorization to webhook CRUD operations (#21271, @TomeHirata)
  • [Evaluation / Tracking] Add SqlIssue database table for storing experiment issues (#21165, @serena-ruan)
  • [Model Registry / Prompts] Support search_prompt_versions in OSS SQLAlchemy store (#21315, @TomeHirata)
  • [Evaluation / Tracing / UI] Add issue detection button to traces table toolbar with feature flag (#21204, @serena-ruan)
  • [Docs / Tracing / UI] Add inline audio player for input_audio content parts in trace UI (#21302, @TomeHirata)
  • [Evaluation / Tracing] Add IssueReference assessment type to store issue links with traces (#21163, @serena-ruan)
  • [Evaluation / Tracing] Add issue management protos with create, update, get, and search APIs (#21161, @serena-ruan)
  • [UI] Add IS NULL/IS NOT NULL operators for trace tags in search UI (#21280, @TomeHirata)
  • [Docs / Tracing] Add IS NULL/IS NOT NULL support for trace tags in search_traces (#21277, @TomeHirata)
  • [Tracing] Add steer message tracing support for Claude Code (#21265, @harupy)
  • [Models / Tracking] Add support for transformers 5.x (#20728, @KUrushi)
  • [Gateway] Add WEEKS to BudgetDurationUnit enum (#21196, @copilot-swe-agent)
  • [UI] Add try-it page on Gateway usage example modal (#21077, @PattaraS)
  • [Docs / Tracing / Tracking] Add mlflow.otel.autolog() for OTEL-based tracing integrations (Langfuse, Arize/Phoenix) (#20954, @alkispoly-db)
  • [Gateway] Add SQL schema and SQLAlchemy CRUD for gateway budget policies (#21108, @TomeHirata)
  • [UI] Add global gateway logs tab to usage page (#21126, @TomeHirata)
  • [Tracking] [MLflow Demo] Add server availability handling checks (#20349, @BenWilson2)
  • [Tracking] [MLflow Demo] Add scorers demo (#20287, @BenWilson2)
  • [Docs / Tracking] Add Backblaze B2 artifact repository (b2://) (#20731, @jeronimodeleon)
  • [Docs / Tracking] Add support for multipart download with presigned URLs for S3 compatible object storages (#20352, @etirelli)
  • [Tracing] MCP server expansion (#19830, @joelrobin18)
  • [Tracing / UI] Include response body in HTTP error messages with 1000 character limit (#20794, @copilot-swe-agent)
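One feature above (#22046) adds a builtin cost_per_token table so cost tracking no longer requires litellm. The arithmetic behind per-span cost is straightforward; the sketch below uses an invented price table and function names purely for illustration — it is not MLflow's internal API, and the prices are made up.

```python
# Hypothetical per-token pricing table (USD per 1M tokens; the numbers are
# invented for illustration -- real prices come from MLflow's builtin table).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def span_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute a span's cost from its token usage and the model's prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

total = span_cost("gpt-4o", input_tokens=1_000, output_tokens=500)
print(f"${total:.6f}")  # → $0.007500
```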

Bug fixes:

  • [Gateway] Fix DatabricksProvider to use OpenAI-compatible endpoint URLs (#22393, @TomeHirata)
  • [Evaluation] Fix: use EvalResult.scorer_stats for multi-turn scorer stat aggregation (#22364, @copilot-swe-agent)
  • [Scoring / Tracing] Revert "Register InferenceTableSpanProcessor alongside DatabricksUCTableSpanProcessor in model serving (#22332)" (#22362, @smurching)
  • [Scoring / Tracing] Warn when UCSchemaLocation destination is set in Databricks model serving (trace: null) (#22332, @smurching)
  • [Tracing / UI] Support tool_reference content blocks in Anthropic Chat UI parser (#22331, @B-Step62)
  • [Tracing] Fix online scoring failure when trace spans are stored in artifact repo (#20784, @Mr-Neutr0n)
  • [UI] Fix adding a tag with empty value silently failing without user feedback in the Experiments table (#22320, @WeichenXu123)
  • [Docs / Models] Bump minimum uv version requirement from 0.5.0 to 0.6.10 (#22313, @copilot-swe-agent)
  • [Scoring] Fix: exclude Serverless from use_dbconnect_artifact path in spark_udf (#22300, @franciffu723)
  • [UI] Fix assistant crash on unknown CLI message types (#21928, @SuperSonnix71)
  • [Tracing / Tracking] Fix mlflow-skinny: guard numpy-dependent imports in mlflow.types (#22211, @Suraj-kumar00)
  • [Tracing / UI] Fix dropdown showing wrong selection state before endpoints load in issue detection modal (#22236, @serena-ruan)
  • [Tracing] Normalize get_provider_name() to align with model_prices_and_context_window.json (#22223, @TomeHirata)
  • [Tracking / UI] Fix log_image with slash-containing keys: replace # with ~ as path separator (#22172, @copilot-swe-agent)
  • [Evaluation] Fix discovery pipeline _call_llm_via_gateway to handle gateway:/ URIs (#22153, @TomeHirata)
  • [UI] Auto-dismiss and fade-out judge run notifications in trace UI (#22137, @copilot-swe-agent)
  • [Evaluation / Tracking] Add polars version guard in polars_dataset.py to fix import failure with polars<1 (#22085, @TomeHirata)
  • [Tracking] Fix huey_consumer.py path resolution when venv bin dir is not on PATH (#22126, @copilot-swe-agent)
  • [UI] Fix sidebar navigation highlighting for run detail pages (#20860, @daniellok-db)
  • [Tracing] Lowercase model_provider in calculate_cost_by_model_and_token_usage (#22134, @TomeHirata)
  • [Gateway] Fix misleading "Discarded unknown message" log in Anthropic gateway provider (#21942, @copilot-swe-agent)
  • [UI] Fix selected run URL param not updating in eval runs table (#22135, @daniellok-db)
  • [Tracing / Tracking] Fix trace export DB contention by disabling incremental span export for gateway (#21721, @PattaraS)
  • [Tracing / Tracking] Expand session-level assessment filters to return all session traces (#21792, @daniellok-db)
  • [Evaluation] Support inference_params for built-in scorers (#21943, @debu-sinha)
  • [UI] Fix assistant stream killed by unhandled rate_limit_event from Claude Code CLI (#22067, @forrestmurray-db)
  • [UI] Fix gateway UI not showing custom model name during endpoint edit (#22068, @TomeHirata)
  • [Evaluation] Fix Anthropic structured outputs compatibility in gateway adapter (#21922, @harupy)
  • [UI] Remove assessment type dropdown and align terminology (#21379, @B-Step62)
  • [Tracking] Fix NextMethod() S3 dispatch error in R mlflow_get_run_context (#21957, @daniellok-db)
  • [Models / Tracking] Enforce auth on logged model artifact download AJAX endpoint (#21708, @B-Step62)
  • [Scoring] Fix tar path traversal vulnerability in extract_archive_to_dir (#21824, @TomeHirata)
  • [Scoring] Fix Starlette 1.0 compatibility in mlflow/pyfunc/scoring_server/__init__.py (#21908, @copilot-swe-agent)
  • [Tracing] [TS SDK] Port smart preview truncation from Python SDK (#21826, @B-Step62)
  • [UI] Fix trace drawer width using context instead of prop drilling (#21830, @B-Step62)
  • [UI] fix: use both registrations and tags for consistent registered model display (#20671) (#21555, @s-zx)
  • [Tracking] Fix autologging overwriting user's warnings.showwarning handler (#21707, @mango766)
  • [Tracing] Remove trace limit in issue discovery to annotate all affected traces (#21736, @serena-ruan)
  • [Scoring] fix: accept Sequence instead of list in to_chat_completions_input (#21724, @mr-brobot)
  • [Tracking] Set UV_PROJECT_ENVIRONMENT in run_uv_sync to install into the correct Python environment (#21750, @copilot-swe-agent)
  • [Tracing / UI] Fix chat UI rendering for OTel GenAI traces with non-standard attributes (#21215, @B-Step62)
  • [Build] Fix build-system in examples/uv-dependency-management/pyproject.toml (#21752, @copilot-swe-agent)
  • [Tracing] fix: avoid deepcopy in dataclass JSON serialization in TraceJSONEncoder (#21668, @raulblazquezbullon)
  • [Tracing] Support artifact-repo traces in batch_get_traces (#21650, @harupy)
  • [Evaluation / Tracing] Fall back to agentic judge mode when trace inputs/outputs are missing (#21306, @TomeHirata)
  • [UI] Fix chat/session view for LangGraph: deduplicate accumulated messages (#21279, @B-Step62)
  • [UI] Show server error detail in Try It panel for budget limit errors (#21568, @TomeHirata)
  • [Evaluation] Fix conversation simulator default model encoding on Databricks (#21644, @smoorjani)
  • [UI] Delete model definitions when endpoint is deleted from UI (#21649, @TomeHirata)
  • [UI] Hide _issue_discovery_judge feedback from traces UI (#21648, @harupy)
  • [Prompts] Clarify OSS register_prompt tag behavior (#21600, @yangbaechu)
  • [Prompts / Tracing / UI] Make Prompt column clickable in trace view (#21304, @copilot-swe-agent)
  • [UI] Fix dataset link not clickable for external source type (#21342, @smoorjani)
  • [Tracing / Tracking] Add audio content normalization for LangChain messages (#21533, @elliotllliu)
  • [UI] Add tooltips to display full budget and spend amounts in gateway budgets table (#21573, @copilot-swe-agent)
  • [Tracking / UI] downsample rows in SQL, update db index (#20928, @sscheele)
  • [Models] Skip _maybe_save_model for Databricks ACL-protected artifact URIs (#21602, @mohammadsubhani)
  • [UI] Make Try-It UI footer always visible in gateway endpoint modal (#21583, @copilot-swe-agent)
  • [Tracing / Tracking] Fix trace assessment filtering and MSSQL pagination syntax errors (#21273, @copilot-swe-agent)
  • [Tracing] Fix trace sampling to ensure parent-child consistency (#21524, @harupy)
  • [Tracking] Add Azure Government Cloud (usgovcloudapi.net) support to WASBS URI parsing (#21519, @ahringer)
  • [Gateway] Change default MLFLOW_GATEWAY_BUDGET_REFRESH_INTERVAL from 60 to 600 seconds (#21565, @copilot-swe-agent)
  • [Evaluation / Tracking] Fix scorer re-registration raising RESOURCE_ALREADY_EXISTS in auth layer (#21560, @harupy)
  • [Tracking] Harden check when MLFLOW_ALLOW_PICKLE_DESERIALIZATION is disabled (#21404, @WeichenXu123)
  • [Tracing] Fix trace ID collisions when random seed is set to fixed value (#21418, @WeichenXu123)
  • [UI] Remove "Rate Limiting [Coming Soon]" placeholder from gateway UI (#21559, @copilot-swe-agent)
  • [Gateway] Remove policy ID from budget limit exceeded error, show budget reset time instead (#21557, @copilot-swe-agent)
  • [Evaluation / Tracking] Fix Strands autolog tool input format for SpanType.TOOL (#21552, @LeviLong01)
  • [Tracing] Fix AttributeError in OpenAI autolog by excluding run_config from span attributes (#21454, @MarkVasile)
  • [Gateway] Fix singular/plural unit in budget limit exceeded error message (#21538, @copilot-swe-agent)
  • [UI] Invalidate budget windows cache on budget policy create/edit/delete (#21535, @copilot-swe-agent)
  • [Evaluation] Fix field-based make_judge prompt missing feedback_value_type (#21058, @yangbaechu)
  • [Tracing] Set MODEL_PROVIDER across autologging integrations for cost breakdown (#21288, @B-Step62)
  • [Evaluation] Fix gateway provider support in third-party judge integrations (ragas, deepeval, phoenix, trulens) (#21414, @copilot-swe-agent)
  • [Gateway] Update Anthropic gateway to use GA structured outputs API (#21436, @TomeHirata)
  • [Tracking] Adds builtin skops trusted types for LightGBM models (#21412, @WeichenXu123)
  • [Tracing / UI] Fix UI flickering in trace review modal during background refetches (#21290, @daniellok-db)
  • [Tracking] Add wildcard subdomain support to CORS origins validation (#21468, @arnewouters)
  • [UI] Fix refresh button on evaluation runs page to also refresh traces and assessments (#21332, @B-Step62)
  • [Models] Fix skops serialization format detection in _load_pyfunc (#21480, @copilot-swe-agent)
  • [UI] Fix Shift+Enter not creating newlines in assistant chat input (#21341, @smoorjani)
  • [UI] Make retrieved document source URLs clickable in span details view (#21340, @smoorjani)
  • [Evaluation / Tracing] Fix AttributeError when trace is None in genai evaluation (#19616, @omarfarhoud)
  • [Tracking] Fix CrewAI autologging compatibility with crewai >= 1.10 (#21376, @WeichenXu123)
  • [Tracing] Remove span name deduplication suffix from TypeScript SDK (#21382, @B-Step62)
  • [Evaluation] Fix LLM judge authentication failure when basic-auth is enabled (#21323, @PattaraS)
  • [UI] Fix stored XSS via unsafe YAML parsing of MLmodel artifacts (#21435, @harupy)
  • [Tracing / UI] Fix Pydantic AI Chat UI rendering for InstrumentedModel LLM spans (#21410, @B-Step62)
  • [Models] Fix transformers 5.3.0 compatibility for removed pipeline classes (#21426, @harupy)
  • [Tracing / UI] Fix Chat UI not rendering for Google ADK traces (#21274, @B-Step62)
  • [Tracking] Fix image artifact filename mangling caused by URL encoding of % separator (#21269, @harupy)
  • [Tracking] Fix: MLFLOW_ALLOW_PICKLE_DESERIALIZATION=False safety control is ineffective for pyfunc flavor (#21188, @WeichenXu123)
  • [Tracing / UI] Fix Pydantic AI autologging: auto-enable instrumentation and fix Chat UI (#21278, @B-Step62)
  • [Tracing] Fix span type not translated for OTel spans when MLflow SDK is active (#21307, @B-Step62)
  • [UI] Remove redundant "Hide assessments" toggle button (#21378, @B-Step62)
  • [Tracking] Fix Mistral autologging compatibility with mistralai >= 2.0 (#21374, @WeichenXu123)
  • [Tracking] Fix pydantic-ai autologging compatibility with pydantic-ai >= 1.63.0 (#21373, @WeichenXu123)
  • [Tracing / Tracking] Fix Claude Code autologging import collision with local mlflow folders (#21343, @smoorjani)
  • [Prompts] Fix stale prompt cache after prompt deletion (#21381, @yangbaechu)
  • [Tracing / Tracking] Fix flush_trace_async_logging AttributeError with non-default tracer provider (#21105, @cgrierson-smartsheet)
  • [UI] Fix session assessments panel terminology (#21336, @smoorjani)
  • [UI] Improve quality chart readability and styling in overview tab (#21325, @B-Step62)
  • [Tracing] Support uv run in Claude Code tracing hooks (#21327, @copilot-swe-agent)
  • [Tracing / UI] Fix Chat tab not rendering for non-OpenAI model names in OpenAI autolog spans (#21356, @TomeHirata)
  • [UI] Fix false 'endpoint deleted' warning after endpoint rename (#21333, @TomeHirata)
  • [UI] Fix broken image rendering in trace chat collapsed preview (#21291, @harupy)
  • [UI] Fix tag key validation UI contradiction (#21140, @KaushalVachhani)
  • [Tracing] Use correct env key for Claude Code settings environment variables (#21344, @smoorjani)
  • [UI] Fix truncated model names in Cost Breakdown donut chart (#21310, @TomeHirata)
  • [Evaluation / Tracing] Fix ConversationSimulator validation for predict_fn signatures and context fields (#21171, @yangbaechu)
  • [UI] [ML-63097] Fix broken LLM judge documentation links (#21347, @smoorjani)
  • [Tracing / Tracking] Add authentication support to OTLP exporter headers (#21230, @giulio-leone)
  • [Evaluation / Tracking] Fix deletion of assessments associated with a run (#20624, @retrowhiz)
  • [Models] Fix _deduplicate_requirements merging marker-differentiated requirements (#21098, @harupy)
  • [UI] Fix Tags functionality in Recent Experiments table on Home page (#20907, @joelrobin18)
  • [Tracing] Fix MCP fn_wrapper handling of Click UNSET defaults (#20953) (#20962, @yangbaechu)
  • [Evaluation] Enable Databricks LLM fallback for available tools extraction (#21017, @xsh310)
  • [UI] Fix sorting for timestamp columns in ExperimentListTable (#20908, @joelrobin18)
  • [UI] Fix tag value input being cleared when entered before key (#20910, @joelrobin18)
  • [Docs] Fix LiteLLM model URI format in eval quickstart docs (#20941, @copilot-swe-agent)
  • [Tracing] Fix SpanEvent timestamp resolution to use nanoseconds (#20828, @copilot-swe-agent)
  • [Tracking] Escape regex special chars in search_experiments LIKE filter (#16667, @joelrobin18)
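Several fixes above (#21404, #21188) harden behavior when MLFLOW_ALLOW_PICKLE_DESERIALIZATION=False. The underlying risk is that pickle.load can import and invoke arbitrary callables during deserialization. The stdlib sketch below shows the general defensive technique — refusing to resolve any global so only plain data can round-trip. It illustrates the idea only and is not MLflow's actual safety control.

```python
import io
import pickle

# Restricted unpickler: reject every global lookup, so only plain data
# (lists, dicts, strings, numbers) can be deserialized. A sketch of the
# general technique, not MLflow's implementation.
class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()

print(safe_loads(pickle.dumps({"weights": [1.0, 2.0]})))  # plain data is fine
```

Payloads that reference any importable object (a class, a function) trip `find_class` and fail to load, which is why pickle-free formats like torch.export and skops are preferred for untrusted model artifacts.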

Documentation updates:

  • [Docs] docs: clarify uv dependency management vs MLFLOW_LOCK_MODEL_DEPENDENCIES, add uv workspace limitation (#22312, @copilot-swe-agent)
  • [Docs] Document supported provider environment variables for judge models (#22195, @kriscon-db)
  • [Docs] Add relative duration examples for uv --exclude-newer (#22133, @copilot-swe-agent)
  • [Docs] Add secure installs documentation page (#22036, @harupy)
  • [Evaluation] Add documentation for issue detection (#22057, @serena-ruan)
  • [Tracing] Add OpenHands integration doc (#21933, @B-Step62)
  • [Docs / Tracing] Fix MLFLOW_ENABLE_ASYNC_TRACE_LOGGING docs to reflect OSS default behavior (#21731, @copilot-swe-agent)
  • [Docs] Add note for pickle-free model doc (#21732, @WeichenXu123)
  • [Docs] Add experiment note to the pickle-free model format doc page (#21709, @WeichenXu123)
  • [Docs] Add Guide: Deploy MLflow to Google Cloud (#21599, @WeichenXu123)
  • [Docs] Add Guide: Deploy MLflow to Azure cloud (#21128, @WeichenXu123)
  • [Docs / Tracing] Add Goose tracing integration documentation (#21190, @B-Step62)
  • [Docs] Expand Koog integration doc (#21218, @B-Step62)
  • [Docs / Tracing] Add 'Combine with MLflow SDK' section to OTel integration guides (#21298, @TomeHirata)
  • [Docs] docs: add Budget Tracker Strategies guideline to AI Gateway budget page (#21633, @copilot-swe-agent)
  • [Docs] Add tracking URI note to mlflow-skinny README (#21638, @harupy)
  • [Docs] Add Guide: Deploy MLflow to AWS cloud (#20729, @WeichenXu123)
  • [Docs / Models] Deprecate generate_signature_output in favor of input_example (#21556, @shivamshinde123)
  • [Docs] Claude MCP setup instructions to use .mcp.json or CLI (#21609, @copilot-swe-agent)
  • [Docs] [1/3] Document OTel attribute mapping (#21478, @B-Step62)
  • [Docs] docs: Add OpenAI Responses API examples to gateway passthrough documentation (#21545, @copilot-swe-agent)
  • [Docs] Add standalone multimodal content in traces documentation (#21357, @kriscon-db)
  • [Docs] Add documentation page for Budget Alerts & Limits (#21121, @TomeHirata)
  • [Docs / Models] Add documentation for pickle-free model formats (#20774, @WeichenXu123)
  • [Docs / Prompts] Update prompt registry docs to use MLflow 3.x API examples (#21267, @copilot-swe-agent)
  • [Docs] docs: Add single quotes to install commands with extras to prevent zsh errors (#21227, @mshavliuk)
  • [Docs] Add Amazon Nova bedrock model examples for mlflow.metrics.genai (#21063, @ManasVardhan)
  • [Docs] Update SSO oidc plugin doc: add google identity platform / AWS cognito / Azure Entra ID configuration guide (#20591, @WeichenXu123)
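One docs fix above (#21227) adds single quotes to install commands with extras. The reason: zsh expands unquoted square brackets as glob patterns, so an extras install fails with "no matches found" unless the package spec is quoted.

```shell
# zsh treats unquoted [ ] as glob characters; quote the spec to be safe:
pip install 'mlflow[genai]'
```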

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.10.1

· 4 min read
MLflow maintainers

MLflow 3.10.1 is a patch release that contains some minor feature enhancements, bug fixes, and documentation updates.

Bug fixes:

  • [UI] Fix "View full dashboard" link in gateway usage tab when workspace is enabled (#21191, @copilot-swe-agent)
  • [UI] Persist AI Gateway default passphrase security banner dismissal to localStorage (#21292, @copilot-swe-agent)
  • [Evaluation] Demote unused parameters log message from WARNING to DEBUG in instructions judge (#21294, @copilot-swe-agent)
  • [UI] Clear "All" time selector when switching to overview tab (#21371, @daniellok-db)
  • [Prompts / UI] Fix Traces view in Prompts tab not being scrollable (#21282, @TomeHirata)
  • [UI] Fix judge builder instruction textarea (#21299, @daniellok-db)
  • [UI] Fix group mode to aggregate "Additional runs" as "Unassigned" group in charts (#21155, @copilot-swe-agent)
  • [UI] Fix artifact download when workspaces are enabled (#21074, @timsolovev)
  • [Tracing] Fix NOT NULL constraint on assessments.trace_id during trace export (#21348, @dbczumar)
  • [Tracking] Fix 403 Forbidden for artifact list via query param when default_permission=NO_PERMISSIONS (#21220, @copilot-swe-agent)
  • [UI] [ML-63097] Fix broken LLM judge documentation links (#21347, @smoorjani)
  • [Tracing] Fix Run Judge failed with litellm.InternalServerError: Invalid response object. (#21262, @PattaraS)
  • [Tracing / UI] Update Action menu: indentation to avoid confusion (#21266, @PattaraS)
  • [Model Registry] Fix MlflowClient.copy_model_version for the case that copy UC model across workspaces (#21212, @WeichenXu123)
  • [UI] Fix empty description box rendering for sanitized-empty experiment descriptions (#21223, @copilot-swe-agent)
  • [Artifacts] Fix single artifact downloading through HttpArtifactRepository (#12955, @Koenkk)
  • [Tracing] Fix find_last_user_message_index skipping skill content injections (#21119, @alkispoly-db)
  • [Tracing] Fix retrieval context extraction when span outputs are stored as strings (#21213, @smoorjani)
  • [UI] Fix visibility toggle button in chart tooltip not working (#21071, @daniellok-db)
  • [UI] Move gateway experiment filtering to server-side query to fix inconsistent page sizes (#21138, @copilot-swe-agent)
  • [Gateway] Downgrade spurious warning to debug log for gateway endpoints with fallback_config but no FALLBACK models (#21123, @copilot-swe-agent)
  • [Tracing] Fix MCP fn_wrapper to pass None for optional params with UNSET defaults (#21051, @yangbaechu)
  • [Tracking] Add CASCADE to logged_model tables experiment_id foreign keys (#20185, @harupy)
  • [Tracing] Fix MCP fn_wrapper handling of Click UNSET defaults (#20953) (#20962, @yangbaechu)

Documentation updates:

  • [Docs] Update SSO oidc plugin doc: add google identity platform / AWS cognito / Azure Entra ID configuration guide (#20591, @WeichenXu123)
  • [Docs / Tracing] Fix distributed tracing rendering and improve doc (#21070, @B-Step62)
  • [Docs] docs: Add single quotes to install commands with extras to prevent zsh errors (#21227, @mshavliuk)
  • [Docs / Model Registry] Fix outdated docstring claiming models:/ URIs are unsupported in register_model (#21197, @copilot-swe-agent)
  • [Docs] Replace MinIO with RustFS in docker-compose setup (#21099, @jmaggesi)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.10.0 Highlights: Multi-Workspace Support, Multi-Turn Evaluation, and many UI Enhancements!

· 5 min read
MLflow maintainers

MLflow 3.10.0 is a major release that enhances MLflow's AI observability and evaluation capabilities while making these features easier to use, both for new users and for organizations operating at scale. This release brings multi-workspace support, evaluation and simulation for chatbot conversations, cost tracking for your traces, usage tracking for your AI Gateway endpoints, and a number of UI enhancements that make app and agent development more intuitive.

1. Workspace Support in MLflow Tracking Server

MLflow now supports multi-workspace environments. Users can organize experiments, models, and prompts at a coarser level of granularity, logically isolating them within a single tracking server. To enable this feature, pass the --enable-workspaces flag to the mlflow server command, or set the MLFLOW_ENABLE_WORKSPACES environment variable to true.
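Both enablement paths described above can be sketched as follows (the flag and environment variable come from this release; any additional server options are your usual ones):

```shell
# Enable multi-workspace mode with the CLI flag...
mlflow server --enable-workspaces

# ...or with the environment variable:
export MLFLOW_ENABLE_WORKSPACES=true
mlflow server
```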

Learn more about multi-workspace support

2. Multi-turn Evaluation & Conversation Simulation

MLflow now supports multi-turn evaluation, including evaluating existing conversations with session-level scorers and simulating conversations to test new versions of your agent, without the toil of regenerating conversations. Use the session-level scorers introduced in MLflow 3.8.0 and the brand new session UIs to evaluate the quality of your conversational agents and enable automatic scoring to monitor quality as traces are ingested.

Learn more about multi-turn evaluation
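Session-level scorers operate on all traces in a conversation rather than on a single turn. The pure-Python sketch below illustrates that shape — group traces by session ID, then score each session as a whole. The record shapes and scorer are invented for illustration; the real API lives in mlflow.genai and its evaluation harness.

```python
from collections import defaultdict

# Toy trace records (shapes invented for illustration; real traces come from
# MLflow's tracing store).
traces = [
    {"session_id": "s1", "turn": 1, "output": "Hi! How can I help?"},
    {"session_id": "s1", "turn": 2, "output": "Sure, here are the steps..."},
    {"session_id": "s2", "turn": 1, "output": "I don't know."},
]

def group_by_session(traces):
    """Group traces into ordered per-session conversations."""
    sessions = defaultdict(list)
    for t in traces:
        sessions[t["session_id"]].append(t)
    for turns in sessions.values():
        turns.sort(key=lambda t: t["turn"])
    return dict(sessions)

def session_scorer(turns) -> float:
    """Toy session-level scorer: fraction of turns with a substantive reply."""
    return sum("I don't know" not in t["output"] for t in turns) / len(turns)

scores = {sid: session_scorer(turns) for sid, turns in group_by_session(traces).items()}
print(scores)  # → {'s1': 1.0, 's2': 0.0}
```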

3. Trace Cost Tracking

Gain visibility into your LLM spending! MLflow now automatically extracts model information from LLM spans and calculates costs, with a new UI that renders model and cost data directly in your trace views. Additionally, costs are aggregated and broken down in the "Overview" tab, giving you granular insights into your LLM spend patterns.

Learn more about trace cost tracking

4. Navigation bar redesign

As we continue to add more features to the MLflow UI, we found that navigation was getting cluttered and overwhelming, with poor separation of features for different workflow types. We've redesigned the navigation bar to be more intuitive and easier to use, with a new sidebar that provides a more relevant set of tabs for both GenAI apps and agent developers, as well as classic model training workflows. The new experience also gives more space to the main content area, making it easier to focus on the task at hand.

5. MLflow Demo Experiment

New to MLflow GenAI? With one click, launch a pre-populated demo and explore LLM tracing, evaluation, and prompt management in action. No configuration, no code required. This feature is available on the MLflow UI's homepage and provides a comprehensive overview of the functionality that MLflow has to offer.

Get started by clicking the button as shown in the video above, or by running mlflow demo in your terminal.

6. Gateway Usage Tracking

Monitor your AI Gateway endpoints with detailed usage analytics. A new "Usage" tab shows request patterns and metrics, with trace ingestion that links gateway calls back to your experiments for end-to-end AI observability.

To turn this feature on for your AI Gateway endpoints, make sure to check the "Enable usage tracking" toggle in your endpoint settings, as shown in the video above.

Learn more about Gateway usage tracking

7. In-UI Trace Evaluation

Run custom or pre-built LLM judges directly from the traces and sessions UI, no code required! This enables quick evaluation of individual traces and sessions without context switching to the Python SDK. To use this feature, make sure to set up an AI Gateway endpoint, as you'll need to select an endpoint when running LLM judges.

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Install MLflow 3.10.0 to try these new features:

pip install mlflow==3.10.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.9.0 Highlights: AI Assistant, Dashboards, and Judge Optimization

· 6 min read
MLflow maintainers
MLflow maintainers

MLflow 3.9.0 is a major release focused on AI Observability and Evaluation capabilities, bringing powerful new features for building, monitoring, and optimizing AI agents. This release introduces an AI-powered assistant, comprehensive dashboards for agent performance, a new judge optimization algorithm, judge builder UI, continuous monitoring with LLM judges, and distributed tracing.

1. MLflow Assistant Powered by Claude Code

MLflow Assistant transforms coding agents like Claude Code into experienced AI engineers by your side. Unlike typical chatbots, the assistant is aware of your codebase and context—it's not just a Q&A tool, but a full-fledged AI engineer that can find root causes for issues, set up quality tests, and apply LLMOps best practices to your project.

Key capabilities include:

  • No additional costs: Use your existing Claude Code subscription. MLflow provides the knowledge and integration at no cost.
  • Context-rich assistance: Understands your local codebase, project structure, and provides tailored recommendations—not generic advice.
  • Complete dev-loop: Goes beyond Q&A to fetch MLflow data, read your code, and add tracing, evaluation, and versioning to your project.
  • Fully customizable: Add custom skills, sub-agents, and permissions. Everything runs on your machine with full transparency.

Open the MLflow UI, navigate to the Assistant panel in any experiment page, and follow the setup wizard to get started.

Learn more about MLflow Assistant

2. Dashboards for Agent Performance Metrics

A new "Overview" tab in GenAI experiments provides pre-built charts and visualizations for monitoring agent performance at a glance. Monitor key metrics like latency, request counts, and quality scores without manual configuration. Identify performance trends and anomalies across your agent deployments, and get tool call summaries to understand how your agents are utilizing available tools.

Navigate to any GenAI experiment and click the "Overview" tab to access the dashboard. Charts are automatically populated based on your trace data. Have a specific visualization need? Request additional charts via GitHub Issues.

Learn more about GenAI Dashboards

3. MemAlign: A New Judge Optimizer Algorithm

MemAlign is a new optimization algorithm for LLM-as-a-judge evaluation that learns evaluation guidelines from past feedback and dynamically retrieves relevant examples at runtime. Improve judge accuracy by learning from human feedback patterns, reduce prompt engineering effort with automatic guideline extraction, and adapt judge behavior dynamically based on the input being evaluated.

Use the MemAlignOptimizer to optimize your judges with historical feedback:

import mlflow
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Create a judge
judge = make_judge(
    name="politeness",
    instructions=(
        "Given a user question, evaluate if the chatbot's response is polite and respectful. "
        "Consider the tone, language, and context of the response.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Create the MemAlign optimizer
optimizer = MemAlignOptimizer(reflection_lm="openai:/gpt-5-mini")

# Retrieve traces with human feedback
traces = mlflow.search_traces(return_type="list")

# Align the judge
aligned_judge = judge.align(traces=traces, optimizer=optimizer)

Learn more about MemAlign

4. Configuring and Building a Judge with Judge Builder UI

A new visual interface lets you create and test custom LLM judge prompts without writing code. Iterate quickly on judge criteria and scoring rubrics with immediate feedback, test judges on sample traces before deploying to production, and export validated judges to the Python SDK for programmatic integration.

Navigate to the "Judges" section in the MLflow UI and click "Create Judge." Define your evaluation criteria, scoring rubric, and test your judge against sample traces. Once satisfied, export the configuration to use with the MLflow SDK.

Learn more about Judge Builder

5. Continuous Online Monitoring with MLflow LLM Judges

Automatically run LLM judges on incoming traces without writing any code, enabling continuous quality monitoring of your agents in production. Detect quality issues in real-time as traces flow through your system, leverage pre-defined judges for common evaluations like safety, relevance, groundedness, and correctness, and get actionable assessments attached directly to your traces.

Go to the "Judges" tab in your experiment, select from pre-defined judges or use your custom judges, and configure which traces to evaluate. Assessments are automatically attached to matching traces as they arrive.

Learn more about Agent Evaluation

6. Distributed Tracing for Tracking End-to-end Requests

Track requests across multiple services with context propagation, enabling end-to-end visibility into distributed AI systems. Maintain trace continuity across microservices and external API calls, debug issues that span multiple services with a unified trace view, and understand latency and errors at each step of your distributed pipeline.

Use the get_tracing_context_headers_for_http_request and set_tracing_context_from_http_request_headers functions to inject and extract trace context:

# Service A: Inject context into the headers of the outgoing request
import requests
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root"):
    headers = get_tracing_context_headers_for_http_request()
    requests.post(
        "https://your.service/handle", headers=headers, json={"input": "hello"}
    )
# Service B: Extract context from incoming request
import mlflow
from flask import Flask, request
from mlflow.tracing import set_tracing_context_from_http_request_headers

app = Flask(__name__)

@app.post("/handle")
def handle():
    headers = dict(request.headers)
    with set_tracing_context_from_http_request_headers(headers):
        with mlflow.start_span("server-handler") as span:
            # ... your logic ...
            span.set_attribute("status", "ok")
    return {"ok": True}

Learn more about Distributed Tracing

Full Changelog

For a comprehensive list of changes, see the release change log.

What's Next

Get Started

Install MLflow 3.9.0 to try these new features:

pip install mlflow==3.9.0

Share Your Feedback

We'd love to hear about your experience with these new features:

Learn More

MLflow 3.8.1

· One min read
MLflow maintainers
MLflow maintainers

MLflow 3.8.1 includes several bug fixes and documentation updates.

Bug fixes:

  • [Tracking] Skip registering sqlalchemy store when sqlalchemy lib is not installed (#19563, @WeichenXu123)
  • [Models / Scoring] fix(security): prevent command injection via malicious model artifacts (#19583, @ColeMurray)
  • [Prompts] Fix prompt registration with model_config on Databricks (#19617, @TomeHirata)
  • [UI] Fix UI blank page on plain HTTP by replacing crypto.randomUUID with uuid library (#19644, @copilot-swe-agent)

Small bug fixes and documentation updates:

#19539, #19451, #19409, @smoorjani; #19493, @alkispoly-db

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.8.0

· 5 min read
MLflow maintainers
MLflow maintainers

MLflow 3.8.0 includes several major features and improvements.

Major Features

  • ⚙️Prompt Model Configuration: Prompts can now include model configuration, allowing you to associate specific model settings with prompt templates for more reproducible LLM workflows. (#18963, #19174, #19279, @chenmoneygithub)
  • In-Progress Trace Display: The Traces UI now supports displaying spans from in-progress traces with auto-polling, enabling real-time debugging and monitoring of long-running LLM applications. (#19265, @B-Step62)
  • ⚖️DeepEval and RAGAS Judges Integration: New get_judge API enables using DeepEval and RAGAS evaluation metrics as MLflow scorers, providing access to 20+ evaluation metrics including answer relevancy, faithfulness, and hallucination detection. (#18988, @smoorjani, #19345, @SomtochiUmeh)
  • 🛡️Conversational Safety Scorer: New built-in scorer for evaluating safety of multi-turn conversations, analyzing entire conversation histories for hate speech, harassment, violence, and other safety concerns. (#19106, @joelrobin18)
  • Conversational Tool Call Efficiency Scorer: New built-in scorer for evaluating tool call efficiency in multi-turn agent interactions, detecting redundant calls, missing batching opportunities, and poor tool selections. (#19245, @joelrobin18)

Important Notice

  • Collection of UI Telemetry. From MLflow 3.8.0 onwards, MLflow will collect anonymized data about UI interactions, similar to the telemetry we collect for the Python SDK. If you manage your own server, you can disable UI telemetry by setting the existing environment variables: MLFLOW_DISABLE_TELEMETRY=true or DO_NOT_TRACK=true. If you do not manage your own server (e.g. you use a managed service or are not the admin), you can still opt out personally via the new "Settings" tab in the MLflow UI. For more information, please read the documentation on usage tracking.
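For self-managed servers, either environment variable mentioned above disables UI telemetry when set before launching the server:

```shell
# Either variable opts the server out of UI telemetry collection
export MLFLOW_DISABLE_TELEMETRY=true
# or the cross-tool convention:
export DO_NOT_TRACK=true

mlflow server --host 127.0.0.1 --port 5000
```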

Features:

  • [Tracking] Add default passphrase support (#19360, @BenWilson2)
  • [Tracing] Pydantic AI Stream support (#19118, @joelrobin18)
  • [Docs] Deprecate Unity Catalog function integration in AI Gateway (#19457, @harupy)
  • [Tracking] Add --max-results option to mlflow experiments search (#19359, @alkispoly-db)
  • [Tracking] Enhance encryption security (#19253, @BenWilson2)
  • [Tracking] Fix and simplify Gateway store interfaces (#19346, @BenWilson2)
  • [Evaluation] Add inference_params support for LLM Judges (#19152, @debu-sinha)
  • [Tracing] Support batch span export to UC Table (#19324, @B-Step62)
  • [Tracking] Add endpoint tags (#19308, @BenWilson2)
  • [Docs / Evaluation] Add MLFLOW_GENAI_EVAL_MAX_SCORER_WORKERS to limit concurrent scorer execution (#19248, @debu-sinha)
  • [Evaluation / Tracking] Enable search_datasets in Databricks managed MLflow (#19254, @alkispoly-db)
  • [Prompts] render text prompt previews in markdown (#19200, @ispoljari)
  • [UI] Add linked prompts filter for trace search tab (#19192, @TomeHirata)
  • [Evaluation] Automatically wrap async functions when passed to predict_fn (#19249, @smoorjani)
  • [Evaluation] [3/6][builtin judges] Conversational Role Adherence (#19247, @joelrobin18)
  • [Tracking] [Endpoints] [1/x] Add backend DB tables for Endpoints (#19002, @BenWilson2)
  • [Tracking] [Endpoints] [3/x] Entities base definitions (#19004, @BenWilson2)
  • [Tracking] [Endpoints] [4/x] Abstract store interface (#19005, @BenWilson2)
  • [Tracking] [Endpoints] [5/x] SQL Store backend for Endpoints (#19006, @BenWilson2)
  • [Tracking] [Endpoints] [6/x] Protos and entities interfaces (#19007, @BenWilson2)
  • [Tracking] [Endpoints] [7/x] Add rest store implementation (#19008, @BenWilson2)
  • [Tracking] [Endpoints] [8/x] Add credential cache (#19014, @BenWilson2)
  • [Tracking] [Endpoints] [9/x] Add provider, model, and configuration handling (#19009, @BenWilson2)
  • [Evaluation / UI] Add show/hide visibility control for Evaluation runs chart view (#18797) (#18852, @pradpalnis)
  • [Tracking] Add mlflow experiments get command (#19097, @alkispoly-db)
  • [Server-infra] [ Gateway 1/10 ] Simplify secrets and masked secrets with map types (#19440, @BenWilson2)

Bug fixes:

  • [Tracing / UI] Branch 3.8 patch: Fix GraphQL SearchRuns filter using invalid attribute key in trace comparison (#19526, @WeichenXu123)
  • [Scoring / Tracking] Fix artifact download performance regression (#19520, @copilot-swe-agent)
  • [Tracking] Fix SQLAlchemy alias conflict in _search_runs for dataset filters (#19498, @fredericosantos)
  • [Tracking] Add auth support for GraphQL routes (#19278, @BenWilson2)
  • [] Fix SQL injection vulnerability in UC function execution (#19381, @harupy)
  • [UI] Fix MultiIndex column search crash in dataset schema table (#19461, @copilot-swe-agent)
  • [Tracking] Make datasource failures fail gracefully (#19469, @BenWilson2)
  • [Tracing / Tracking] Fix litellm autolog for versions >= 1.78 (#19459, @harupy)
  • [Model Registry / Tracking] Fix SQLAlchemy engine connection pool leak in model registry and job stores (#19386, @harupy)
  • [UI] [Bug fix] Traces UI: Support filtering on assessments with multiple values (e.g. error and boolean) (#19262, @dbczumar)
  • [Evaluation / Tracing] Fix error initialization in Feedback (#19340, @alkispoly-db)
  • [Models] Switch container build to subprocess for Sagemaker (#19277, @BenWilson2)
  • [Scoring] Fix scorers issue on Strands traces (#18835, @joelrobin18)
  • [Tracking] Stop initializing backend stores in artifacts only mode (#19167, @mprahl)
  • [Evaluation] Parallelize multi-turn session evaluation (#19222, @AveshCSingh)
  • [Tracing] Add safe attribute capture for pydantic_ai (#19219, @BenWilson2)
  • [Model Registry] Fix UC to UC copying regression (#19280, @BenWilson2)
  • [Tracking] Fix artifact path traversal vector (#19260, @BenWilson2)
  • [UI] Fix issue with auth controls on system metrics (#19283, @BenWilson2)
  • [Models] Add context loading for ChatModel (#19250, @BenWilson2)
  • [Tracing] Fix trace decorators usage for LangGraph async callers (#19228, @BenWilson2)
  • [Tracking] Update docker compose to use --artifacts-destination not --default-artifact-root (#19215, @B-Step62)
  • [Build] Reduce clint error message verbosity by consolidating README instructions (#19155, @copilot-swe-agent)

Documentation updates:

  • [Docs] Add specific references for correctness scorers (#19472, @BenWilson2)
  • [Docs] Add documentation for Fluency scorer (#19481, @alkispoly-db)
  • [Docs] Update eval quickstart to put all code into a script (#19444, @achen530)
  • [Docs] Add documentation for KnowledgeRetention scorer (#19478, @alkispoly-db)
  • [Evaluation] Fix non-reproducible code examples in deep-learning.mdx (#19376, @saumilyagupta)
  • [Docs / Evaluation] fix: Confusing documentation for mlflow.genai.evaluate() (#19380, @brandonhawi)
  • [Docs] Deprecate model logging of OpenAI flavor (#19325, @TomeHirata)
  • [Docs] Add rounded corners to video elements in documentation (#19231, @copilot-swe-agent)
  • [Docs] Sync Python/TypeScript tab selections in tracing quickstart docs (#19184, @copilot-swe-agent)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.7.0

· 9 min read
MLflow maintainers
MLflow maintainers

MLflow 3.7.0 includes several major features and improvements for GenAI Observability, Evaluation, and Prompt Management.

Major Features

  • 📝 Experiment Prompts UI: New prompts functionality in the experiment UI allows you to manage and search prompts directly within experiments, with support for filter strings and prompt version search in traces. (#19156, #18919, #18906, @TomeHirata)
  • 💬 Multi-turn Evaluation Support: Enhanced mlflow.genai.evaluate now supports multi-turn conversations, enabling comprehensive assessment of conversational AI applications with DataFrame and list inputs. (#18971, @AveshCSingh)
  • ⚖️ Trace Comparison: New side-by-side comparison view in the Traces UI allows you to analyze and debug LLM application behavior across different runs, making it easier to identify regressions and improvements. (#17138, @joelrobin18)
  • 🌐 Gemini TypeScript SDK: Auto-tracing support for Google's Gemini in TypeScript, expanding MLflow's observability capabilities for JavaScript/TypeScript AI applications. (#18207, @joelrobin18)
  • 🎯 Structured Outputs in Judges: The make_judge API now supports structured outputs, enabling more precise and programmatically consumable evaluation results. (#18529, @TomeHirata)
  • 🔗 VoltAgent Tracing: Added auto-tracing support for VoltAgent, extending MLflow's observability to this AI agent framework. (#19041, @joelrobin18)

Breaking Changes

Features

Bug Fixes

Documentation Updates

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.6.0

· 3 min read
MLflow maintainers
MLflow maintainers

MLflow 3.6.0 includes several major features and improvements for AI Observability, Experiment UI, Agent Evaluation and Deployment.

#1: Full OpenTelemetry Support in MLflow Tracking Server

OpenTelemetry Trace Example

MLflow now offers comprehensive OpenTelemetry integration, allowing you to use OpenTelemetry and MLflow seamlessly together for your observability stack.

  • Ingest OpenTelemetry spans directly into the MLflow tracking server
  • Monitor existing applications that are instrumented with OpenTelemetry
  • Trace AI applications written in arbitrary languages, including Java, Go, Rust, and more
  • Create unified traces that combine MLflow SDK instrumentation with OpenTelemetry auto-instrumentation from third-party libraries

For more details, please check out the blog post.
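For example, an application already instrumented with OpenTelemetry can be pointed at the MLflow tracking server via the standard OTLP exporter environment variables (the /v1/traces path and port here are assumptions to verify against the MLflow docs for your deployment):

```shell
# Send OTLP/HTTP spans from an existing OTel-instrumented app to MLflow
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:5000/v1/traces
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
```

No application code changes are needed; the OTel SDK picks these variables up at startup.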

#2: Session-level View in Trace UI

Session-level View in Trace UI

New chat sessions tab provides a dedicated view for organizing and analyzing related traces at the session level, making it easier to track conversational workflows.

See the Track Users & Sessions guide for more details.

#3: New Supported Frameworks in TypeScript Tracing SDK

Auto-tracing support for the Vercel AI SDK, LangChain.js, Mastra, Anthropic SDK, and Gemini SDK in TypeScript, expanding MLflow's observability capabilities across popular JavaScript/TypeScript frameworks.

For more information, please check out the TypeScript Tracing SDK.

#4: Tracking Judge Cost and Traces

Comprehensive tracking of LLM judge evaluation costs and traces, providing visibility into evaluation expenses and performance with automatic cost calculation and rendering.

See LLM Evaluation Guide for more details.

#5: New experiment tab bar

The experiment tab bar has been fully overhauled to provide more intuitive and discoverable navigation of different features in MLflow.

Upgrade to MLflow 3.6.0 to try it out!

#6: Agent Server for Lightning Agent Deployment

import agent
from mlflow.genai.agent_server import AgentServer

agent_server = AgentServer("ResponsesAgent")
app = agent_server.app

def main():
    agent_server.run(app_import_string="start_server:app")

if __name__ == "__main__":
    main()

Save this as start_server.py, then start the server and send a test request:

python3 start_server.py

curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "What is the 14th Fibonacci number?"}],
    "stream": true
  }'

New agent server infrastructure for managing and deploying scoring agents with enhanced orchestration capabilities.

See Agent Server Guide for more details.

Breaking Changes and Deprecations

  • Drop numbering suffix (_1, _2, ...) from span names (#18531)
  • Deprecate promptflow, pmdarima, and diviner flavors (#18597, #18577)

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.5.1

· 3 min read
MLflow maintainers
MLflow maintainers

MLflow 3.5.1 is a patch release that includes several bug fixes and improvements.

Features:

  • [CLI] Add CLI command to list registered scorers by experiment (#18255, @alkispoly-db)
  • [Deployments] Add configuration option for long-running deployments client requests (#18363, @BenWilson2)
  • [Deployments] Create set_databricks_monitoring_sql_warehouse_id API (#18346, @dbrx-euirim)
  • [Prompts] Show instructions for prompt optimization on prompt registry (#18375, @TomeHirata)

Bug fixes:

Documentation updates:

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.

MLflow 3.5.0

· 8 min read
MLflow maintainers
MLflow maintainers

MLflow 3.5.0 includes several major features and improvements!

Major Features

  • ⚙️ Job Execution Backend: Introduced a new job execution backend infrastructure for running asynchronous tasks with individual execution pools, job search capabilities, and transient error handling. (#17676, #18012, #18070, #18071, #18112, #18049, @WeichenXu123)
  • 🎯 Flexible Prompt Optimization API: Introduced a new flexible API for prompt optimization with support for model switching and the GEPA algorithm, enabling more efficient prompt tuning with fewer rollouts. See the documentation to get started. (#18183, #18031, @TomeHirata)
  • 🎨 Enhanced UI Onboarding: Improved in-product onboarding experience with trace quickstart drawer and updated homepage guidance to help users discover MLflow's latest features. (#18098, #18187, @B-Step62)
  • 🔐 Security Middleware for Tracking Server: Added a security middleware layer to protect against DNS rebinding, CORS attacks, and other security threats. Read the documentation for configuration details. (#17910, @BenWilson2)

Features

Bug Fixes

Documentation Updates

For a comprehensive list of changes, see the release change log, and check out the latest documentation on mlflow.org.