MLflow 3.9.0 Highlights: AI Assistant, Dashboards, and Judge Optimization
MLflow 3.9.0 is a major release focused on AI Observability and Evaluation capabilities, bringing powerful new features for building, monitoring, and optimizing AI agents. This release introduces an AI-powered assistant, comprehensive dashboards for agent performance, a new judge optimization algorithm, a judge builder UI, continuous monitoring with LLM judges, and distributed tracing.
1. MLflow Assistant Powered by Claude Code
MLflow Assistant transforms coding agents like Claude Code into an experienced AI engineer by your side. Unlike typical chatbots, the assistant is aware of your codebase and context—it's not just a Q&A tool, but a full-fledged AI engineer that can find the root causes of issues, set up quality tests, and apply LLMOps best practices to your project.
Key capabilities include:
- No additional costs: Use your existing Claude Code subscription. MLflow provides the knowledge and integration at no cost.
- Context-rich assistance: Understands your local codebase, project structure, and provides tailored recommendations—not generic advice.
- Complete dev-loop: Goes beyond Q&A to fetch MLflow data, read your code, and add tracing, evaluation, and versioning to your project.
- Fully customizable: Add custom skills, sub-agents, and permissions. Everything runs on your machine with full transparency.
Open the MLflow UI, navigate to the Assistant panel in any experiment page, and follow the setup wizard to get started.
Learn more about MLflow Assistant
2. Dashboards for Agent Performance Metrics
A new "Overview" tab in GenAI experiments provides pre-built charts and visualizations for monitoring agent performance at a glance. Monitor key metrics like latency, request counts, and quality scores without manual configuration. Identify performance trends and anomalies across your agent deployments, and get tool call summaries to understand how your agents are utilizing available tools.
Navigate to any GenAI experiment and click the "Overview" tab to access the dashboard. Charts are automatically populated based on your trace data. Have a specific visualization need? Request additional charts via GitHub Issues.
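The charts don't require any instrumentation beyond standard MLflow tracing. As a minimal sketch (the experiment name and the answer_question function below are hypothetical), any traced call feeds the latency and request-count charts, while quality scores come from judge assessments attached to those traces:

import mlflow

# Traces logged to the experiment are what populate the Overview charts
mlflow.set_experiment("my-genai-experiment")

@mlflow.trace  # records latency, inputs/outputs, and errors for each call
def answer_question(question: str) -> str:
    # ... call your model and tools here ...
    return "example answer"

answer_question("What does the Overview tab show?")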
Learn more about GenAI Dashboards
3. MemAlign: A New Judge Optimizer Algorithm
MemAlign is a new optimization algorithm that learns evaluation guidelines from past feedback and dynamically retrieves relevant examples at runtime. Improve judge accuracy by learning from human feedback patterns, reduce prompt engineering effort with automatic guideline extraction, and adapt judge behavior dynamically based on the input being evaluated.
Use the MemAlignOptimizer to optimize your judges with historical feedback:
import mlflow
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Create a judge
judge = make_judge(
    name="politeness",
    instructions=(
        "Given a user question, evaluate if the chatbot's response is polite and respectful. "
        "Consider the tone, language, and context of the response.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Create the MemAlign optimizer
optimizer = MemAlignOptimizer(reflection_lm="openai:/gpt-5-mini")

# Retrieve traces with human feedback
traces = mlflow.search_traces(return_type="list")

# Align the judge
aligned_judge = judge.align(traces=traces, optimizer=optimizer)
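Once aligned, the judge is used like any other judge. The follow-up below is a hedged sketch (the question and response values are invented); it assumes the aligned judge remains directly callable with inputs and outputs keyword arguments, as judges created by make_judge are:

# Score a single example with the aligned judge (illustrative values)
feedback = aligned_judge(
    inputs={"question": "Where is my order?"},
    outputs="Your order shipped yesterday and should arrive tomorrow.",
)
print(feedback.value)      # True/False, per feedback_value_type=bool
print(feedback.rationale)  # the judge's explanation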
4. Configuring and Building a Judge with Judge Builder UI
A new visual interface lets you create and test custom LLM judge prompts without writing code. Iterate quickly on judge criteria and scoring rubrics with immediate feedback, test judges on sample traces before deploying to production, and export validated judges to the Python SDK for programmatic integration.
Navigate to the "Judges" section in the MLflow UI and click "Create Judge." Define your evaluation criteria, scoring rubric, and test your judge against sample traces. Once satisfied, export the configuration to use with the MLflow SDK.
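An exported judge slots into the same programmatic workflow as one written by hand. The sketch below is illustrative rather than the exact export format: it assumes the Judge Builder configuration maps onto a make_judge definition (the judge name, instructions, and evaluation data are made up), which can then be run offline with mlflow.genai.evaluate:

import mlflow
from mlflow.genai.judges import make_judge

# Hypothetical judge recreated from a Judge Builder export
relevance_judge = make_judge(
    name="answer_relevance",
    instructions=(
        "Decide whether the response actually answers the user's question.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Illustrative evaluation data; in practice, use your own dataset or traces
eval_data = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open source platform for the ML lifecycle.",
    }
]

results = mlflow.genai.evaluate(data=eval_data, scorers=[relevance_judge])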
Learn more about Judge Builder
5. Continuous Online Monitoring with MLflow LLM Judges
Automatically run LLM judges on incoming traces without writing any code, enabling continuous quality monitoring of your agents in production. Detect quality issues in real-time as traces flow through your system, leverage pre-defined judges for common evaluations like safety, relevance, groundedness, and correctness, and get actionable assessments attached directly to your traces.
Go to the "Judges" tab in your experiment, select from pre-defined judges or use your custom judges, and configure which traces to evaluate. Assessments are automatically attached to matching traces as they arrive.
Learn more about Agent Evaluation
6. Distributed Tracing for Tracking End-to-end Requests
Track requests across multiple services with context propagation, enabling end-to-end visibility into distributed AI systems. Maintain trace continuity across microservices and external API calls, debug issues that span multiple services with a unified trace view, and understand latency and errors at each step of your distributed pipeline.
Use the get_tracing_context_headers_for_http_request and set_tracing_context_from_http_request_headers functions to inject and extract trace context:
# Service A: Inject context into the headers of the outgoing request
import requests
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root"):
    headers = get_tracing_context_headers_for_http_request()
    requests.post(
        "https://your.service/handle", headers=headers, json={"input": "hello"}
    )

# Service B: Extract context from incoming request
import mlflow
from flask import Flask, request
from mlflow.tracing import set_tracing_context_from_http_request_headers

app = Flask(__name__)

@app.post("/handle")
def handle():
    headers = dict(request.headers)
    with set_tracing_context_from_http_request_headers(headers):
        with mlflow.start_span("server-handler") as span:
            # ... your logic ...
            span.set_attribute("status", "ok")
    return {"ok": True}
Learn more about Distributed Tracing
Full Changelog
For a comprehensive list of changes, see the release change log.
What's Next
Get Started
Install MLflow 3.9.0 to try these new features:
pip install mlflow==3.9.0
Share Your Feedback
We'd love to hear about your experience with these new features:
- GitHub Issues - Report bugs or request features
- MLflow Roadmap - See what's coming next and share your ideas
- ⭐ Star us on GitHub - Show your support for the project
Learn More
- Join our upcoming webinar to see these features in action
- Check out the MLflow documentation for detailed guides
