Export MLflow Traces/Metrics via OTLP

Set Up OTLP Exporter

Traces generated by MLflow are compatible with the OpenTelemetry trace spec. Therefore, MLflow traces can be exported to various observability platforms that support OpenTelemetry.

By default, MLflow exports traces to the MLflow Tracking Server. To export traces to an OpenTelemetry Collector, set the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable before starting any trace. You can also enable dual export to send traces to both MLflow and an OpenTelemetry-compatible backend simultaneously.

bash
pip install opentelemetry-exporter-otlp
python
import mlflow
import os

# Set the endpoint of the OpenTelemetry Collector
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
# Optionally, set the service name to group traces
os.environ["OTEL_SERVICE_NAME"] = "your-service-name"

# Trace will be exported to the OTel collector
with mlflow.start_span(name="foo") as span:
span.set_inputs({"a": 1})
span.set_outputs({"b": 2})

OpenTelemetry Configuration

MLflow uses the standard OTLP exporter for exporting traces to OpenTelemetry Collector instances. You can use all of the configuration options supported by OpenTelemetry:

bash
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4318/v1/traces"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_TRACES_HEADERS="api_key=12345"
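
These are standard OpenTelemetry SDK environment variables, so the same configuration can also be set in-process, as long as it happens before the first span is started (as noted earlier, the exporter picks the settings up when tracing begins). A minimal sketch; the endpoint, protocol, and api_key values here are placeholders:

python
import os

import mlflow

# Set OTLP exporter options before the first span is created
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4318/v1/traces"
os.environ["OTEL_EXPORTER_OTLP_TRACES_PROTOCOL"] = "http/protobuf"
os.environ["OTEL_EXPORTER_OTLP_TRACES_HEADERS"] = "api_key=12345"

# This span is exported with the protocol and headers configured above
with mlflow.start_span(name="configured-span") as span:
    span.set_inputs({"a": 1})
    span.set_outputs({"b": 2})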

Integrated Observability Platforms

The following observability platforms support ingesting OpenTelemetry traces; see each platform's documentation to learn how to set up the OTLP exporter for it:

  • Datadog
  • New Relic
  • SigNoz
  • Splunk
  • Grafana
  • Jaeger
  • Dynatrace
  • ServiceNow

Dual Export

By default, when OTLP export is configured, MLflow sends traces only to the OpenTelemetry Collector. To send traces to both the MLflow Tracking Server and the OpenTelemetry Collector simultaneously, set MLFLOW_TRACE_ENABLE_OTLP_DUAL_EXPORT=true:

python
import mlflow
import os

# Enable dual export
os.environ["MLFLOW_TRACE_ENABLE_OTLP_DUAL_EXPORT"] = "true"
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"

# Configure MLflow tracking
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my-experiment")

# Traces will be sent to both MLflow and the OpenTelemetry Collector
with mlflow.start_span(name="foo") as span:
span.set_inputs({"a": 1})
span.set_outputs({"b": 2})

Metrics Export

MLflow can export OpenTelemetry metrics when a metrics endpoint is configured. This allows you to monitor span durations and other trace-related metrics in compatible monitoring systems.

Prerequisites: The opentelemetry-exporter-otlp library must be installed to enable metrics export:

bash
pip install opentelemetry-exporter-otlp

To enable metrics export, configure the OpenTelemetry metrics endpoint:

bash
# For OpenTelemetry Collector (gRPC endpoint)
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_METRICS_PROTOCOL="grpc"

# OR for OpenTelemetry Collector (HTTP endpoint)
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://localhost:4318/v1/metrics"
export OTEL_EXPORTER_OTLP_METRICS_PROTOCOL="http/protobuf"

Direct Prometheus Export

Prometheus can directly receive OpenTelemetry metrics exported by MLflow:

bash
# Configure MLflow to send metrics directly to Prometheus
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://localhost:9090/api/v1/otlp/v1/metrics"
export OTEL_EXPORTER_OTLP_METRICS_PROTOCOL="http/protobuf"

Prometheus configuration: Start Prometheus with the --web.enable-otlp-receiver and --enable-feature=otlp-deltatocumulative flags to accept OTLP metrics directly.
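
For reference, a minimal launch command; this sketch assumes the prometheus binary is on your PATH and a standard prometheus.yml sits in the working directory:

bash
prometheus --config.file=prometheus.yml \
  --web.enable-otlp-receiver \
  --enable-feature=otlp-deltatocumulative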

Exported Metrics

When enabled, MLflow exports the following OpenTelemetry histogram metric:

  • mlflow.trace.span.duration: A histogram measuring span execution duration in milliseconds
    • Unit: ms (milliseconds)
    • Labels/Attributes:
      • root: "true" for root spans, "false" for child spans
      • span_type: The type of span (e.g., "LLM", "CHAIN", "AGENT", or "unknown")
      • span_status: The span status ("OK", "ERROR", or "UNSET")
      • experiment_id: The MLflow experiment ID associated with the trace
      • tags.*: All trace tags (e.g., tags.mlflow.traceName, tags.mlflow.evalRequestId)
      • metadata.*: All trace metadata (e.g., metadata.mlflow.sourceRun, metadata.mlflow.modelId, metadata.mlflow.trace.tokenUsage)

This histogram allows you to analyze:

  • Response time distributions across different span types
  • Performance differences between root spans and child spans
  • Error rates by monitoring spans with "ERROR" status
  • Performance metrics grouped by MLflow experiment
  • Metrics segmented by trace tags (e.g., tags.mlflow.traceName, tags.mlflow.evalRequestId)
  • Performance analysis by model ID or source run (e.g., metadata.mlflow.modelId, metadata.mlflow.sourceRun)
  • Service performance trends over time
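
As an illustration, the first of these analyses (response time distribution per span type) can be run against Prometheus over its HTTP API. The query below is a sketch that assumes Prometheus's default OTLP name normalization, under which mlflow.trace.span.duration with unit ms would surface as mlflow_trace_span_duration_milliseconds:

bash
# p95 span duration per span_type over the last 5 minutes
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, sum by (le, span_type) (rate(mlflow_trace_span_duration_milliseconds_bucket[5m])))'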

Complete Example

python
import mlflow
import os

# Enable metrics collection and export
os.environ["OTEL_EXPORTER_OTLP_METRICS_ENDPOINT"] = "http://localhost:4317"
os.environ["OTEL_EXPORTER_OTLP_METRICS_PROTOCOL"] = "grpc"

# Metrics will be exported to OpenTelemetry Collector
with mlflow.start_span(name="process_request", span_type="CHAIN") as span:
span.set_inputs({"query": "What is MLflow?"})
# Your application logic here
span.set_outputs({"response": "MLflow is an open source platform..."})