MLflow 3.4.0rc0
4 min read
MLflow 3.4.0rc0 is a release candidate for 3.4.0. To install, run the following command:
pip install mlflow==3.4.0rc0
MLflow 3.4.0rc0 includes several major features and improvements.
Major New Features
- OpenTelemetry Metrics Export: MLflow now exports span-level statistics as OpenTelemetry metrics, providing enhanced observability and monitoring capabilities for traced applications. (#17325, @dbczumar)
- MCP Server Integration: Introducing the Model Context Protocol (MCP) server for MLflow, enabling AI assistants and LLMs to interact with MLflow programmatically. (#17122, @harupy)
- Custom Judges API: The new make_judge API enables creation of custom evaluation judges for assessing LLM outputs with domain-specific criteria (see the sketch after this list). (#17647, @BenWilson2, @dbczumar, @alkispoly-db, @smoorjani)
- Correlations Backend: Implemented backend infrastructure for storing and computing correlations between experiment metrics using NPMI (Normalized Pointwise Mutual Information). (#17309, #17368, @BenWilson2)
- Evaluation Datasets: MLflow now supports storing and versioning evaluation datasets directly within experiments for reproducible model assessment. (#17447, @BenWilson2)
- Databricks Backend for MLflow Server: MLflow server can now use Databricks as a backend, enabling seamless integration with Databricks workspaces. (#17411, @nsthorat)
- Claude Autologging: Automatic tracing support for Claude AI interactions, capturing conversations and model responses. (#17305, @smoorjani)
- Strands Agent Tracing: Added comprehensive tracing support for Strands agents, including automatic instrumentation for agent workflows and interactions. (#17151, @joelrobin18)
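As a companion to the Custom Judges API entry above, here is a minimal, hedged sketch of how a custom judge might be defined with make_judge and plugged into mlflow.genai.evaluate. The parameter names (name, instructions, model), the {{ inputs }}/{{ outputs }} template variables, and the judge-model URI are assumptions based on MLflow's existing scorer conventions rather than the confirmed signature; check the 3.4.0 documentation before relying on them.

```python
# Hedged sketch of the new custom-judge workflow (not the definitive API).
import mlflow
from mlflow.genai.judges import make_judge  # new in 3.4.0 per the notes above

# Assumed signature: a name, natural-language instructions, and a judge model URI.
politeness = make_judge(
    name="politeness",
    instructions=(
        "Given the question in {{ inputs }} and the answer in {{ outputs }}, "
        "decide whether the answer is polite and professional. Answer yes or no."
    ),
    model="openai:/gpt-4o-mini",  # placeholder judge model
)

# A custom judge acts as a scorer, so it can be passed to mlflow.genai.evaluate.
results = mlflow.genai.evaluate(
    data=[
        {
            "inputs": {"question": "What is MLflow?"},
            "outputs": "MLflow is an open source MLOps platform.",
        }
    ],
    scorers=[politeness],
)
```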
Features:
- [Evaluation] Add ability to pass tags via dataframe in mlflow.genai.evaluate (#17549, @smoorjani)
- [Evaluation] Add custom judge model support for Safety and RetrievalRelevance builtin scorers (#17526, @dbrx-euirim)
- [Tracing] Add AI commands as MCP prompts for LLM interaction (#17608, @nsthorat)
- [Tracing] Add MLFLOW_ENABLE_OTLP_EXPORTER environment variable (#17505, @dbczumar)
- [Tracing] Support OTel and MLflow dual export (#17187, @dbczumar) (see the sketch after this list)
- [Tracing] Make set_destination use ContextVar for thread safety (#17219, @B-Step62)
- [CLI] Add MLflow commands CLI for exposing prompt commands to LLMs (#17530, @nsthorat)
- [CLI] Add 'mlflow runs link-traces' command (#17444, @nsthorat)
- [CLI] Add 'mlflow runs create' command for programmatic run creation (#17417, @nsthorat)
- [CLI] Add MLflow traces CLI command with comprehensive search and management capabilities (#17302, @nsthorat)
- [CLI] Add --env-file flag to all MLflow CLI commands (#17509, @nsthorat)
- [Tracking] Backend for storing scorers in MLflow experiments (#17090, @WeichenXu123)
- [Model Registry] Allow cross-workspace copying of model versions between WMR and UC (#17458, @arpitjasa-db)
- [Models] Add automatic Git-based model versioning for GenAI applications (#17076, @harupy)
- [Models] Improve WheeledModel._download_wheels safety (#17004, @serena-ruan)
- [Projects] Support resume run for Optuna hyperparameter optimization (#17191, @lu-wang-dl)
- [Scoring] Add MLFLOW_DEPLOYMENT_CLIENT_HTTP_REQUEST_TIMEOUT environment variable (#17252, @dbczumar)
- [UI] Add ability to hide/unhide all finished runs in Chart view (#17143, @joelrobin18)
- [Telemetry] Add MLflow OSS telemetry for invoke_custom_judge_model (#17585, @dbrx-euirim)
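Tying together the MLFLOW_ENABLE_OTLP_EXPORTER and dual-export entries above, the sketch below shows how an application might opt into sending traces to both the MLflow backend and an OTLP collector. MLFLOW_ENABLE_OTLP_EXPORTER comes from this release; OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is the standard OpenTelemetry exporter variable, and the endpoint URL is a placeholder, so confirm the exact configuration in the tracing docs.

```python
# Hedged sketch: dual export of MLflow traces to an OTLP collector.
import os

# Set exporter configuration before importing mlflow so it is picked up at init.
os.environ["MLFLOW_ENABLE_OTLP_EXPORTER"] = "true"  # new in 3.4.0; value format is an assumption
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"  # placeholder collector

import mlflow

@mlflow.trace  # spans from this call should reach both destinations
def answer(question: str) -> str:
    return f"You asked: {question}"

answer("Where do my spans go?")
```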
Bug fixes:
- [Evaluation] Implement DSPy LM interface for default Databricks model serving (#17672, @smoorjani)
- [Evaluation] Fix aggregations incorrectly applied to legacy scorer interface (#17596, @BenWilson2)
- [Evaluation] Add Unity Catalog table source support for mlflow.evaluate (#17546, @BenWilson2)
- [Evaluation] Fix custom prompt judge encoding issues with custom judge models (#17584, @dbrx-euirim)
- [Tracking] Fix OpenAI autolog to properly reconstruct Response objects from streaming events (#17535, @WeichenXu123)
- [Tracking] Add basic authentication support in TypeScript SDK (#17436, @kevin-lyn)
- [Tracking] Update scorer endpoints to v3.0 API specification (#17409, @WeichenXu123)
- [Tracking] Fix scorer status handling in MLflow tracking backend (#17379, @WeichenXu123)
- [Tracking] Fix missing source-run information in UI (#16682, @WeichenXu123)
- [Scoring] Fix spark_udf to always use stdin_serve for model serving (#17580, @WeichenXu123)
- [Scoring] Fix a bug with Spark UDF usage of uv as an environment manager (#17489, @WeichenXu123) (see the sketch after this list)
- [Model Registry] Extract source workspace ID from run_link during model version migration (#17600, @arpitjasa-db)
- [Models] Improve security by reducing write permissions in temporary directory creation (#17544, @BenWilson2)
- [Server-infra] Fix --env-file flag compatibility with --dev mode (#17615, @nsthorat)
- [Server-infra] Fix basic authentication with Uvicorn server (#17523, @kevin-lyn)
- [UI] Fix experiment comparison functionality in UI (#17550, @Flametaa)
- [UI] Fix compareExperimentsSearch route definitions (#17459, @WeichenXu123)
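For context on the two spark_udf fixes above, this is a hedged sketch of the code path they touch: batch scoring a logged model from Spark with uv as the environment manager. The model URI and DataFrame are placeholders, and env_manager="uv" is taken from the changelog entry rather than verified against the final docs.

```python
# Hedged sketch of batch scoring with mlflow.pyfunc.spark_udf and uv dependency restore.
from pyspark.sql import SparkSession
import mlflow

spark = SparkSession.builder.getOrCreate()

predict = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/my_model/1",  # placeholder model URI
    env_manager="uv",                # restore model dependencies in an isolated uv environment
)

df = spark.createDataFrame([(1.0, 2.0)], ["f1", "f2"])
df.withColumn("prediction", predict("f1", "f2")).show()
```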
Documentation updates:
- [Docs] Add clarification for trace requirements in scorers documentation (#17542, @BenWilson2)
- [Docs] Add documentation for Claude code autotracing (#17521, @smoorjani)
- [Docs] Remove experimental status message for MPU/MPD features (#17486, @BenWilson2)
- [Docs] Remove problematic pages from documentation (#17453, @BenWilson2)
- [Docs] Add documentation for updating signatures on Databricks registered models (#17450, @arpitjasa-db)
- [Docs] Update Scorers API documentation (#17298, @WeichenXu123)
- [Docs] Add comprehensive documentation for scorers (#17258, @B-Step62)
Please try it out and report any issues on the issue tracker.