Tracing Amazon Bedrock with MLflow
MLflow supports automatic tracing for Amazon Bedrock, a fully managed service on AWS that provides high-performing foundation models from leading AI providers such as Anthropic, Cohere, Meta, Mistral AI, and more. By enabling auto tracing for Amazon Bedrock with the mlflow.bedrock.autolog()
function, MLflow captures traces for LLM invocations and logs them to the active MLflow Experiment.
import mlflow
mlflow.bedrock.autolog()
MLflow tracing automatically captures the following information about Amazon Bedrock calls:
Prompts and completion responses
Latencies
Model name
Additional metadata such as temperature and max_tokens, if specified
Function calls, if returned in the response
Any exception, if raised (see the sketch after this list)
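For example, a failed invocation still produces a trace whose span records the exception. Below is a minimal sketch, assuming auto tracing is already enabled and a boto3 "bedrock-runtime" client named bedrock exists (as in the Basic Example below); the model ID is an intentionally invalid, hypothetical value.
# Minimal sketch: the call is expected to fail because the model ID is invalid,
# but MLflow still logs a trace that records the raised exception.
try:
    bedrock.converse(
        modelId="invalid-model-id",  # hypothetical, triggers an error
        messages=[{"role": "user", "content": [{"text": "Hello"}]}],
    )
except Exception as e:
    print(f"Bedrock call failed: {e}")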
Basic Example
import boto3
import mlflow

# Enable auto-tracing for Amazon Bedrock
mlflow.bedrock.autolog()
mlflow.set_experiment("Bedrock")

# Create a boto3 client for invoking the Bedrock API
bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name="<REPLACE_WITH_YOUR_AWS_REGION>",
)

# MLflow will log a trace for the Bedrock API call in the "Bedrock" experiment created above
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Describe the purpose of a 'hello world' program in one line."}
            ],
        }
    ],
    inferenceConfig={
        "maxTokens": 512,
        "temperature": 0.1,
        "topP": 0.9,
    },
)
The logged trace, associated with the Bedrock experiment, can be seen in the MLflow UI.
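You can also fetch the logged trace programmatically. Below is a minimal sketch, assuming a recent MLflow version that provides mlflow.search_traces() (which returns a pandas DataFrame with one row per trace) and the "Bedrock" experiment created above.
# Minimal sketch: query the traces logged to the "Bedrock" experiment.
experiment = mlflow.get_experiment_by_name("Bedrock")
traces = mlflow.search_traces(experiment_ids=[experiment.experiment_id], max_results=5)
print(traces.head())  # one row per trace, including status and timing information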
Raw Inputs and Outputs
By default, MLflow renders a rich, chat-like view of the input and output messages in the Chat tab. To view the raw input and output payloads, including configuration parameters, click the Inputs / Outputs tab in the UI.
Note
The Chat panel is only supported for the converse and converse_stream APIs. For the other APIs, MLflow only displays the Inputs / Outputs tab.
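For instance, a call made through the lower-level invoke_model API is traced, but only the raw JSON payloads appear in the Inputs / Outputs tab. Below is a minimal sketch, assuming the same bedrock client as above and the Anthropic messages body format that Claude models expect on Bedrock.
import json

# Minimal sketch: invoke_model sends a raw JSON body, so the resulting trace
# shows only the Inputs / Outputs tab (no Chat panel).
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [
                {
                    "role": "user",
                    "content": "Describe the purpose of a 'hello world' program in one line.",
                }
            ],
        }
    ),
)
print(json.loads(response["body"].read()))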
Streaming
MLflow supports tracing streaming calls to Amazon Bedrock APIs. The generated trace shows the aggregated output message in the Chat tab, while the individual chunks are displayed in the Events tab.
response = bedrock.converse_stream(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Describe the purpose of a 'hello world' program in one line."}
            ],
        }
    ],
    inferenceConfig={
        "maxTokens": 300,
        "temperature": 0.1,
        "topP": 0.9,
    },
)

for chunk in response["stream"]:
    print(chunk)
Attention
MLflow does not create a span immediately when the streaming response is returned. Instead, it creates the span when the streaming chunks are consumed, for example, by the for-loop in the code snippet above.
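If you want the aggregated text yourself while consuming the stream (which also triggers span creation), a minimal sketch is shown below. It assumes a fresh converse_stream response (a stream can only be iterated once) and the contentBlockDelta event shape of the Converse Stream API.
# Minimal sketch: iterate over the stream and collect the text deltas
# from contentBlockDelta events into a single string.
chunks = []
for event in response["stream"]:
    delta = event.get("contentBlockDelta", {}).get("delta", {})
    if "text" in delta:
        chunks.append(delta["text"])

print("".join(chunks))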
Function Calling Agent
MLflow Tracing automatically captures function calling metadata when calling Amazon Bedrock APIs. The function definitions and any tool-call instructions returned in the response are highlighted in the Chat tab of the trace UI.
Combining this with the manual tracing feature, you can define a function-calling agent (ReAct) and trace its execution. The entire agent implementation might look complicated, but the tracing part is straightforward: (1) add the @mlflow.trace decorator to the functions you want to trace and (2) enable auto-tracing for Amazon Bedrock with mlflow.bedrock.autolog(). MLflow takes care of complexities such as resolving call chains and recording execution metadata.
import boto3
import mlflow
from mlflow.entities import SpanType

# Enable auto-tracing for Amazon Bedrock
mlflow.bedrock.autolog()
mlflow.set_experiment("Bedrock")

# Create a boto3 client for invoking the Bedrock API
bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name="<REPLACE_WITH_YOUR_AWS_REGION>",
)

model_id = "anthropic.claude-3-5-sonnet-20241022-v2:0"


# Define the tool function. Decorate it with `@mlflow.trace` to create a span for its execution.
@mlflow.trace(span_type=SpanType.TOOL)
def get_weather(city: str) -> str:
    """Get the current weather in a given location"""
    return "sunny" if city == "San Francisco, CA" else "unknown"
# Define the tool configuration passed to Bedrock
tools = [
    {
        "toolSpec": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "The city and state, e.g., San Francisco, CA",
                        },
                    },
                    "required": ["city"],
                }
            },
        }
    }
]
tool_functions = {"get_weather": get_weather}
# Define a simple tool calling agent
@mlflow.trace(span_type=SpanType.AGENT)
def run_tool_agent(question: str) -> str:
    messages = [{"role": "user", "content": [{"text": question}]}]

    # Invoke the model with the given question and available tools
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        toolConfig={"tools": tools},
    )
    assistant_message = response["output"]["message"]
    messages.append(assistant_message)

    # If the model requests a tool call, invoke the function with the specified arguments
    tool_use = next(
        (c["toolUse"] for c in assistant_message["content"] if "toolUse" in c), None
    )
    if tool_use:
        tool_func = tool_functions[tool_use["name"]]
        tool_result = tool_func(**tool_use["input"])
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "toolResult": {
                            "toolUseId": tool_use["toolUseId"],
                            "content": [{"text": tool_result}],
                        }
                    }
                ],
            }
        )

        # Send the tool results to the model and get a new response
        response = bedrock.converse(
            modelId=model_id,
            messages=messages,
            toolConfig={"tools": tools},
        )

    return response["output"]["message"]["content"][0]["text"]
# Run the tool calling agent
question = "What's the weather like in San Francisco today?"
answer = run_tool_agent(question)
Executing the code above will create a single trace that includes all of the LLM invocations and tool calls.
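To verify this programmatically, the minimal sketch below prints the span hierarchy of the trace created by the agent run. It assumes a recent MLflow version that provides mlflow.get_last_active_trace_id() and mlflow.get_trace(); older versions expose a different helper for retrieving the last trace.
# Minimal sketch (API availability depends on your MLflow version):
# fetch the trace created by the agent run above and list its spans.
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
for span in trace.data.spans:
    print(span.span_type, span.name)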