MLflow 3.9.0 Highlights: AI Assistant, Dashboards, and Judge Optimization
MLflow 3.9.0 is a major release focused on AI Observability and Evaluation capabilities, bringing powerful new features for building, monitoring, and optimizing AI agents. This release introduces an AI-powered assistant, comprehensive dashboards for agent performance, a new judge optimization algorithm, a judge builder UI, continuous monitoring with LLM judges, and distributed tracing.
1. MLflow Assistant Powered by Claude Code
MLflow Assistant transforms coding agents like Claude Code into an experienced AI engineer by your side. Unlike typical chatbots, the assistant is aware of your codebase and context—it's not just a Q&A tool, but a full-fledged AI engineer that can find the root causes of issues, set up quality tests, and apply LLMOps best practices to your project.
Key capabilities include:
- No additional costs: Use your existing Claude Code subscription. MLflow provides the knowledge and integration at no cost.
- Context-rich assistance: Understands your local codebase, project structure, and provides tailored recommendations—not generic advice.
- Complete dev-loop: Goes beyond Q&A to fetch MLflow data, read your code, and add tracing, evaluation, and versioning to your project.
- Fully customizable: Add custom skills, sub-agents, and permissions. Everything runs on your machine with full transparency.
Open the MLflow UI, navigate to the Assistant panel in any experiment page, and follow the setup wizard to get started.
Learn more about MLflow Assistant
2. Dashboards for Agent Performance Metrics
A new "Overview" tab in GenAI experiments provides pre-built charts and visualizations for monitoring agent performance at a glance. Monitor key metrics like latency, request counts, and quality scores without manual configuration. Identify performance trends and anomalies across your agent deployments, and get tool call summaries to understand how your agents are utilizing available tools.
Navigate to any GenAI experiment and click the "Overview" tab to access the dashboard. Charts are automatically populated based on your trace data. Have a specific visualization need? Request additional charts via GitHub Issues.
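The charts don't require any instrumentation beyond standard MLflow tracing. As a minimal sketch (the experiment name and the answer_question function below are hypothetical), any traced call feeds the latency and request-count charts, while quality scores come from judge assessments attached to those traces:

import mlflow

# Traces logged to the experiment are what populate the Overview charts
mlflow.set_experiment("my-genai-experiment")

@mlflow.trace  # records latency, inputs/outputs, and errors for each call
def answer_question(question: str) -> str:
    # ... call your model and tools here ...
    return "example answer"

answer_question("What does the Overview tab show?")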
Learn more about GenAI Dashboards
3. MemAlign: A New Judge Optimizer Algorithm
MemAlign is a new optimization algorithm that learns evaluation guidelines from past feedback and dynamically retrieves relevant examples at runtime. Improve judge accuracy by learning from human feedback patterns, reduce prompt engineering effort with automatic guideline extraction, and adapt judge behavior dynamically based on the input being evaluated.
Use the MemAlignOptimizer to optimize your judges with historical feedback:
import mlflow
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Create a judge
judge = make_judge(
    name="politeness",
    instructions=(
        "Given a user question, evaluate if the chatbot's response is polite and respectful. "
        "Consider the tone, language, and context of the response.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Create the MemAlign optimizer
optimizer = MemAlignOptimizer(reflection_lm="openai:/gpt-5-mini")

# Retrieve traces with human feedback
traces = mlflow.search_traces(return_type="list")

# Align the judge
aligned_judge = judge.align(traces=traces, optimizer=optimizer)
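Once aligned, the judge is used like any other judge. The follow-up below is a hedged sketch (the question and response values are invented); it assumes the aligned judge remains directly callable with inputs and outputs keyword arguments, as judges created by make_judge are:

# Score a single example with the aligned judge (illustrative values)
feedback = aligned_judge(
    inputs={"question": "Where is my order?"},
    outputs="Your order shipped yesterday and should arrive tomorrow.",
)
print(feedback.value)      # True/False, per feedback_value_type=bool
print(feedback.rationale)  # the judge's explanation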
4. Configuring and Building a Judge with Judge Builder UI
A new visual interface lets you create and test custom LLM judge prompts without writing code. Iterate quickly on judge criteria and scoring rubrics with immediate feedback, test judges on sample traces before deploying to production, and export validated judges to the Python SDK for programmatic integration.
Navigate to the "Judges" section in the MLflow UI and click "Create Judge." Define your evaluation criteria, scoring rubric, and test your judge against sample traces. Once satisfied, export the configuration to use with the MLflow SDK.
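An exported judge slots into the same programmatic workflow as one written by hand. The sketch below is illustrative rather than the exact export format: it assumes the Judge Builder configuration maps onto a make_judge definition (the judge name, instructions, and evaluation data are made up), which can then be run offline with mlflow.genai.evaluate:

import mlflow
from mlflow.genai.judges import make_judge

# Hypothetical judge recreated from a Judge Builder export
relevance_judge = make_judge(
    name="answer_relevance",
    instructions=(
        "Decide whether the response actually answers the user's question.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Illustrative evaluation data; in practice, use your own dataset or traces
eval_data = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open source platform for the ML lifecycle.",
    }
]

results = mlflow.genai.evaluate(data=eval_data, scorers=[relevance_judge])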
Learn more about Judge Builder
5. Continuous Online Monitoring with MLflow LLM Judges
Automatically run LLM judges on incoming traces without writing any code, enabling continuous quality monitoring of your agents in production. Detect quality issues in real-time as traces flow through your system, leverage pre-defined judges for common evaluations like safety, relevance, groundedness, and correctness, and get actionable assessments attached directly to your traces.
Go to the "Judges" tab in your experiment, select from pre-defined judges or use your custom judges, and configure which traces to evaluate. Assessments are automatically attached to matching traces as they arrive.
Learn more about Agent Evaluation
6. Distributed Tracing for Tracking End-to-end Requests
Track requests across multiple services with context propagation, enabling end-to-end visibility into distributed AI systems. Maintain trace continuity across microservices and external API calls, debug issues that span multiple services with a unified trace view, and understand latency and errors at each step of your distributed pipeline.
Use the get_tracing_context_headers_for_http_request and set_tracing_context_from_http_request_headers functions to inject and extract trace context:
# Service A: Inject context into the headers of the outgoing request
import requests
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root"):
    headers = get_tracing_context_headers_for_http_request()
    requests.post(
        "https://your.service/handle", headers=headers, json={"input": "hello"}
    )

# Service B: Extract context from incoming request
import mlflow
from flask import Flask, request
from mlflow.tracing import set_tracing_context_from_http_request_headers

app = Flask(__name__)

@app.post("/handle")
def handle():
    headers = dict(request.headers)
    with set_tracing_context_from_http_request_headers(headers):
        with mlflow.start_span("server-handler") as span:
            # ... your logic ...
            span.set_attribute("status", "ok")
    return {"ok": True}
Learn more about Distributed Tracing
Full Changelog
For a comprehensive list of changes, see the release change log.
What's Next
Get Started
Install MLflow 3.9.0 to try these new features:
pip install mlflow==3.9.0
Share Your Feedback
We'd love to hear about your experience with these new features:
- GitHub Issues - Report bugs or request features
- MLflow Roadmap - See what's coming next and share your ideas
- ⭐ Star us on GitHub - Show your support for the project
Learn More
- Join our upcoming webinar to see these features in action
- Check out the MLflow documentation for detailed guides
