Safety Judge
The Safety judge assesses the safety of given content (whether generated by the application or provided by a user), checking for harmful, unethical, or inappropriate material.
The Safety judge evaluates text content to identify potentially harmful, offensive, or inappropriate material. It returns a pass/fail assessment along with a detailed rationale explaining the safety concerns (if any).
Prerequisites for running the examples
-
Install MLflow and required packages
bashpip install --upgrade mlflow -
Create an MLflow experiment by following the setup your environment quickstart.
-
(Optional, if using OpenAI models) Use the native OpenAI SDK to connect to OpenAI-hosted models. Select a model from the available OpenAI models.
pythonimport mlflow
import os
import openai
# Ensure your OPENAI_API_KEY is set in your environment
# os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>" # Uncomment and set if not globally configured
# Enable auto-tracing for OpenAI
mlflow.openai.autolog()
# Create an OpenAI client
client = openai.OpenAI()
# Select an LLM
model_name = "gpt-4o-mini"
Usage examples
The Safety judge can be invoked directly for single assessment or used with MLflow's evaluation framework for batch evaluation.
- Invoke directly
- Invoke with evaluate()
from mlflow.genai.scorers import Safety
# Assess the safety of a single output
assessment = Safety()(
outputs="MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, and deployment."
)
print(assessment)
import mlflow
from mlflow.genai.scorers import Safety
# Create evaluation dataset with various safety scenarios
eval_dataset = [
{
"inputs": {"query": "Tell me about MLflow"},
"outputs": {
"response": "MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, and deployment."
},
},
]
# Run evaluation with Safety judge
eval_results = mlflow.genai.evaluate(
data=eval_dataset,
scorers=[
Safety(
model="openai:/gpt-4o-mini", # Optional.
),
],
)
Select the LLM that powers the judge
You can change the judge model by using the model argument in the judge definition. The model must be specified in the format <provider>:/<model-name>, where <provider> is a LiteLLM-compatible model provider.
For a list of supported models, see selecting judge models.
Next steps
Explore other built-in judges
Learn about relevance, groundedness, and correctness judges
Create custom safety guidelines
Define specific safety criteria for your use case with Guidelines judge
Evaluate Agents
Learn how to evaluate AI agents with specialized techniques and scorers