
MLflow 3.4.0rc0

· 4 min read
MLflow maintainers

MLflow 3.4.0rc0 is a release candidate for 3.4.0. To install, run the following command:

pip install mlflow==3.4.0rc0

MLflow 3.4.0rc0 includes several major features and improvements.

Major New Features

  • 📊 OpenTelemetry Metrics Export: MLflow now exports span-level statistics as OpenTelemetry metrics, providing enhanced observability and monitoring capabilities for traced applications. (#17325, @dbczumar)
  • 🤖 MCP Server Integration: Introducing the Model Context Protocol (MCP) server for MLflow, enabling AI assistants and LLMs to interact with MLflow programmatically. (#17122, @harupy)
  • 🧑‍⚖️ Custom Judges API: The new make_judge API enables creation of custom evaluation judges for assessing LLM outputs against domain-specific criteria (see the sketch after this list). (#17647, @BenWilson2, @dbczumar, @alkispoly-db, @smoorjani)
  • 📈 Correlations Backend: Implemented backend infrastructure for storing and computing correlations between experiment metrics using NPMI (Normalized Pointwise Mutual Information). (#17309, #17368, @BenWilson2)
  • 🗂️ Evaluation Datasets: MLflow now supports storing and versioning evaluation datasets directly within experiments for reproducible model assessment. (#17447, @BenWilson2)
  • 🔗 Databricks Backend for MLflow Server: The MLflow server can now use Databricks as a backend, enabling seamless integration with Databricks workspaces. (#17411, @nsthorat)
  • 🤖 Claude Autologging: Automatic tracing support for Claude AI interactions, capturing conversations and model responses. (#17305, @smoorjani)
  • 🌊 Strands Agent Tracing: Added comprehensive tracing support for Strands agents, including automatic instrumentation for agent workflows and interactions. (#17151, @joelrobin18)
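
The make_judge API called out above is the centerpiece of the Custom Judges work. Below is a minimal sketch of defining and invoking a custom judge; the import path, argument names, the {{ inputs }}/{{ outputs }} template variables, and the model URI format are assumptions based on the release notes, so confirm them against the MLflow 3.4 documentation before relying on them.

# Minimal sketch of the new Custom Judges API (assumed import path and signature).
from mlflow.genai.judges import make_judge

# Define a judge with domain-specific criteria. The {{ inputs }} / {{ outputs }}
# placeholders are assumed template variables filled in at evaluation time.
support_judge = make_judge(
    name="support_tone",
    instructions=(
        "Evaluate whether the response in {{ outputs }} answers the customer "
        "question in {{ inputs }} politely and without unsupported claims. "
        "Answer 'pass' or 'fail' and explain your reasoning."
    ),
    model="openai:/gpt-4o-mini",  # assumed provider:/model URI format
)

# Invoke the judge on a single example; the resulting assessment can also be
# passed to mlflow.genai.evaluate as a scorer.
feedback = support_judge(
    inputs={"question": "How do I reset my password?"},
    outputs={"response": "Click 'Forgot password' on the sign-in page."},
)
print(feedback)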

Features:

  • [Evaluation] Add ability to pass tags via dataframe in mlflow.genai.evaluate (#17549, @smoorjani)
  • [Evaluation] Add custom judge model support for Safety and RetrievalRelevance builtin scorers (#17526, @dbrx-euirim)
  • [Tracing] Add AI commands as MCP prompts for LLM interaction (#17608, @nsthorat)
  • [Tracing] Add MLFLOW_ENABLE_OTLP_EXPORTER environment variable (#17505, @dbczumar)
  • [Tracing] Support OTel and MLflow dual export (#17187, @dbczumar); see the configuration sketch after this list
  • [Tracing] Make set_destination use ContextVar for thread safety (#17219, @B-Step62)
  • [CLI] Add MLflow commands CLI for exposing prompt commands to LLMs (#17530, @nsthorat)
  • [CLI] Add 'mlflow runs link-traces' command (#17444, @nsthorat)
  • [CLI] Add 'mlflow runs create' command for programmatic run creation (#17417, @nsthorat)
  • [CLI] Add MLflow traces CLI command with comprehensive search and management capabilities (#17302, @nsthorat)
  • [CLI] Add --env-file flag to all MLflow CLI commands (#17509, @nsthorat)
  • [Tracking] Backend for storing scorers in MLflow experiments (#17090, @WeichenXu123)
  • [Model Registry] Allow cross-workspace copying of model versions between WMR and UC (#17458, @arpitjasa-db)
  • [Models] Add automatic Git-based model versioning for GenAI applications (#17076, @harupy)
  • [Models] Improve WheeledModel._download_wheels safety (#17004, @serena-ruan)
  • [Projects] Support resume run for Optuna hyperparameter optimization (#17191, @lu-wang-dl)
  • [Scoring] Add MLFLOW_DEPLOYMENT_CLIENT_HTTP_REQUEST_TIMEOUT environment variable (#17252, @dbczumar)
  • [UI] Add ability to hide/unhide all finished runs in Chart view (#17143, @joelrobin18)
  • [Telemetry] Add MLflow OSS telemetry for invoke_custom_judge_model (#17585, @dbrx-euirim)
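
The OTLP-related entries above can be combined so traces flow to both MLflow and an OpenTelemetry collector. The sketch below assumes MLFLOW_ENABLE_OTLP_EXPORTER (from the note above) is read at startup and pairs it with the standard OpenTelemetry OTEL_EXPORTER_OTLP_TRACES_ENDPOINT variable; the exact interplay and variable set should be confirmed in the MLflow 3.4 tracing documentation.

import os

import mlflow

# Turn on the OTLP exporter alongside MLflow's own trace export. The flag name
# comes from this release's notes; the endpoint variable is the standard
# OpenTelemetry SDK setting, pointed at a locally running collector here.
os.environ["MLFLOW_ENABLE_OTLP_EXPORTER"] = "true"
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4318/v1/traces"

mlflow.set_experiment("otel-dual-export-demo")

@mlflow.trace
def answer(question: str) -> str:
    # Spans produced here land in the MLflow experiment and, with the flag
    # above set, should also be exported to the configured OTLP endpoint.
    return f"echo: {question}"

answer("Is dual export on?")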

Bug fixes:

  • [Evaluation] Implement DSPy LM interface for default Databricks model serving (#17672, @smoorjani)
  • [Evaluation] Fix aggregations incorrectly applied to legacy scorer interface (#17596, @BenWilson2)
  • [Evaluation] Add Unity Catalog table source support for mlflow.evaluate (#17546, @BenWilson2)
  • [Evaluation] Fix custom prompt judge encoding issues with custom judge models (#17584, @dbrx-euirim)
  • [Tracking] Fix OpenAI autolog to properly reconstruct Response objects from streaming events (#17535, @WeichenXu123)
  • [Tracking] Add basic authentication support in TypeScript SDK (#17436, @kevin-lyn)
  • [Tracking] Update scorer endpoints to v3.0 API specification (#17409, @WeichenXu123)
  • [Tracking] Fix scorer status handling in MLflow tracking backend (#17379, @WeichenXu123)
  • [Tracking] Fix missing source-run information in UI (#16682, @WeichenXu123)
  • [Scoring] Fix spark_udf to always use stdin_serve for model serving (#17580, @WeichenXu123)
  • [Scoring] Fix a bug with Spark UDF usage of uv as an environment manager (#17489, @WeichenXu123); see the sketch after this list
  • [Model Registry] Extract source workspace ID from run_link during model version migration (#17600, @arpitjasa-db)
  • [Models] Improve security by reducing write permissions in temporary directory creation (#17544, @BenWilson2)
  • [Server-infra] Fix --env-file flag compatibility with --dev mode (#17615, @nsthorat)
  • [Server-infra] Fix basic authentication with Uvicorn server (#17523, @kevin-lyn)
  • [UI] Fix experiment comparison functionality in UI (#17550, @Flametaa)
  • [UI] Fix compareExperimentsSearch route definitions (#17459, @WeichenXu123)
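
For context on the Spark UDF fixes above, this is roughly the code path they touch: wrapping a logged model as a Spark UDF with uv rebuilding its environment. The model URI and column names below are placeholders, and a running SparkSession plus an already-logged model are assumed.

from pyspark.sql import SparkSession

import mlflow

spark = SparkSession.builder.getOrCreate()

# Wrap a previously logged model as a Spark UDF, using uv to recreate its
# environment. env_manager="uv" is the option whose handling this release
# fixes; "models:/my-model/1" is a placeholder URI.
predict = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/my-model/1",
    env_manager="uv",
)

df = spark.createDataFrame([(1.0, 2.0)], ["feature_a", "feature_b"])
df.withColumn("prediction", predict("feature_a", "feature_b")).show()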

Documentation updates:

Please try it out and report any issues on the issue tracker.