
Safety Judge

The Safety judge evaluates content, whether generated by the application or provided by a user, for harmful, unethical, or otherwise inappropriate material. It returns a pass/fail assessment along with a detailed rationale explaining any safety concerns.

Prerequisites for running the examples

  1. Install MLflow and required packages

    bash
    pip install --upgrade mlflow
  2. Create an MLflow experiment by following the setup your environment quickstart (a minimal example follows this list).

  3. (Optional, if using OpenAI models) Use the native OpenAI SDK to connect to OpenAI-hosted models. Select a model from the available OpenAI models.

    python
    import mlflow
    import os
    import openai

    # Ensure your OPENAI_API_KEY is set in your environment
    # os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>" # Uncomment and set if not globally configured

    # Enable auto-tracing for OpenAI
    mlflow.openai.autolog()

    # Create an OpenAI client
    client = openai.OpenAI()

    # Select an LLM
    model_name = "gpt-4o-mini"
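
For step 2, here is a minimal sketch of creating (or activating) an experiment programmatically; the experiment name "safety-judge-demo" is an illustrative placeholder:

python
import mlflow

# Create the experiment if it does not already exist and set it as the
# active experiment ("safety-judge-demo" is a placeholder name)
mlflow.set_experiment("safety-judge-demo")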

Usage examples

The Safety judge can be invoked directly for a single assessment or used with MLflow's evaluation framework for batch evaluation.

python
from mlflow.genai.scorers import Safety

# Assess the safety of a single output
assessment = Safety()(
    outputs="MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, and deployment."
)
print(assessment)
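
For batch evaluation, the judge can be passed as a scorer to mlflow.genai.evaluate(). A minimal sketch, assuming a small in-memory dataset of pre-generated outputs (the records and field values are illustrative):

python
import mlflow
from mlflow.genai.scorers import Safety

# Illustrative evaluation records; each "outputs" value is the text the
# Safety judge will assess
eval_dataset = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open-source platform for managing the ML lifecycle.",
    },
    {
        "inputs": {"question": "How do I track experiments?"},
        "outputs": "Use mlflow.start_run() to create a run, then log parameters and metrics.",
    },
]

# Evaluate all records with the Safety judge
results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[Safety()],
)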

Select the LLM that powers the judge

You can change the judge model by using the model argument in the judge definition. The model must be specified in the format <provider>:/<model-name>, where <provider> is a LiteLLM-compatible model provider.

For a list of supported models, see selecting judge models.
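
For example, assuming OpenAI as the provider (the specific model name below is an illustrative choice):

python
from mlflow.genai.scorers import Safety

# Configure the judge to use a specific LiteLLM-compatible provider and model
safety_judge = Safety(model="openai:/gpt-4o-mini")

assessment = safety_judge(
    outputs="MLflow Tracking lets you log parameters, metrics, and artifacts."
)
print(assessment)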

Next steps