# Tracing FireworksAI

FireworksAI is an inference and customization engine for open source AI. It provides day-zero access to the latest state-of-the-art open source models and lets developers build lightning-fast AI applications.
MLflow Tracing provides automatic tracing for FireworksAI through its OpenAI SDK compatibility. Because FireworksAI is OpenAI SDK compatible, you can use the `mlflow.openai.autolog()` function to enable automatic tracing. MLflow will capture traces for LLM invocations and log them to the active MLflow Experiment.
MLflow automatically captures the following information about FireworksAI calls:

- Prompts and completion responses
- Latencies
- Model name
- Additional metadata such as `temperature` and `max_completion_tokens`, if specified
- Tool use, if returned in the response
- Any exception if raised
## Supported APIs

Since FireworksAI is OpenAI SDK compatible, all APIs supported by MLflow's OpenAI integration work seamlessly with FireworksAI. See the [model library](https://fireworks.ai/models) for a list of available models on FireworksAI.
| Normal | Tool Use | Structured Outputs | Streaming | Async |
|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ |
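The table lists structured output support, but this page does not include a dedicated example for it. Below is a minimal sketch using the OpenAI SDK's JSON-mode `response_format` parameter; support for JSON mode varies by model, so treat this as an assumption to verify against the model library rather than guaranteed behavior.

```python
import json
import os

import mlflow
import openai

mlflow.openai.autolog()

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Ask for a JSON object response. JSON mode is part of the OpenAI-compatible
# API surface, but not every model honors it, so verify for your model.
response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with keys 'city' and 'country' for the capital of France.",
        }
    ],
    response_format={"type": "json_object"},
)

print(json.loads(response.choices[0].message.content))
```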
## Quick Start
```python
import mlflow
import openai
import os

# Enable auto-tracing
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("FireworksAI")

# Create an OpenAI client configured for FireworksAI
openai_client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Use the client as usual - traces will be automatically captured
response = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",  # For other models see: https://fireworks.ai/models
    messages=[
        {"role": "user", "content": "Why is open source better than closed source?"}
    ],
)
```
## Chat Completion API Examples
The examples below cover basic usage, streaming, async calls, and tool use.

### Basic Example
```python
import openai
import mlflow
import os

# Enable auto-tracing
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
# If running locally you can start a server with: `mlflow server --host 127.0.0.1 --port 5000`
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("FireworksAI")

# Configure OpenAI client for FireworksAI
openai_client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]

# To use different models check out the model library at: https://fireworks.ai/models
response = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=messages,
    max_completion_tokens=100,
)
```
### Streaming

MLflow Tracing supports streaming outputs from FireworksAI endpoints through the OpenAI SDK. With the same auto-tracing setup, MLflow automatically traces the streaming response and renders the concatenated output in the span UI. The individual chunks in the response stream can be found in the Events tab as well.
```python
import openai
import mlflow
import os

# Enable trace logging
mlflow.openai.autolog()

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

stream = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=[
        {"role": "user", "content": "How fast would a glass of water freeze on Titan?"}
    ],
    stream=True,  # Enable streaming response
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
### Async

MLflow Tracing supports FireworksAI's asynchronous APIs through the OpenAI SDK. The usage is the same as the synchronous API.
```python
import openai
import mlflow
import os

# Enable trace logging
mlflow.openai.autolog()

client = openai.AsyncOpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Note: `await` requires an async context, e.g. a notebook cell or an async function
response = await client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=[{"role": "user", "content": "What is the best open source LLM?"}],
    # Async streaming is also supported
    # stream=True
)
```
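If you are running this from a plain Python script rather than a notebook, top-level `await` will not work. A minimal sketch is to wrap the call in a coroutine and drive it with `asyncio.run`; the `main` function name here is just illustrative:

```python
import asyncio
import os

import mlflow
import openai

mlflow.openai.autolog()

client = openai.AsyncOpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)


async def main():
    # The awaited call is traced exactly like the synchronous version
    response = await client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-v3-0324",
        messages=[{"role": "user", "content": "What is the best open source LLM?"}],
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```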
### Tool Use

MLflow Tracing automatically captures tool use responses from FireworksAI models. The tool call instructions in the response are highlighted in the trace UI. Moreover, you can annotate the tool function with the `@mlflow.trace` decorator to create a span for the tool execution.

The following example implements a simple tool-calling agent using FireworksAI and MLflow Tracing:
```python
import json
import os

from openai import OpenAI

import mlflow
from mlflow.entities import SpanType

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)


# Define the tool function. Decorate it with `@mlflow.trace` to create a span for its execution.
@mlflow.trace(span_type=SpanType.TOOL)
def get_weather(city: str) -> str:
    if city == "Tokyo":
        return "sunny"
    elif city == "Paris":
        return "rainy"
    return "unknown"


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }
]

_tool_functions = {"get_weather": get_weather}


# Define a simple tool calling agent
@mlflow.trace(span_type=SpanType.AGENT)
def run_tool_agent(question: str):
    messages = [{"role": "user", "content": question}]

    # Invoke the model with the given question and available tools
    response = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p1-8b-instruct",
        messages=messages,
        tools=tools,
    )
    ai_msg = response.choices[0].message
    messages.append(ai_msg)

    # If the model requests tool call(s), invoke the function with the specified arguments
    if tool_calls := ai_msg.tool_calls:
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            if tool_func := _tool_functions.get(function_name):
                args = json.loads(tool_call.function.arguments)
                tool_result = tool_func(**args)
            else:
                raise RuntimeError("An invalid tool is returned from the assistant!")

            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result,
                }
            )

        # Send the tool results to the model and get a new response
        response = client.chat.completions.create(
            model="accounts/fireworks/models/llama-v3p1-8b-instruct",
            messages=messages,
        )

    return response.choices[0].message.content


# Run the tool calling agent
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)
```
## Token Usage

MLflow supports token usage tracking for FireworksAI. The token usage for each LLM call is logged in the `mlflow.chat.tokenUsage` span attribute, and the total usage across the trace is available in the `token_usage` field of the trace info object.
```python
import mlflow

mlflow.openai.autolog()

# Run the tool calling agent defined in the previous section
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)

# Get the trace object just created
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Print the token usage
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f"  Input tokens: {total_usage['input_tokens']}")
print(f"  Output tokens: {total_usage['output_tokens']}")
print(f"  Total tokens: {total_usage['total_tokens']}")

# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")
```
```text
== Total token usage: ==
  Input tokens: 20
  Output tokens: 283
  Total tokens: 303

== Detailed usage for each LLM call: ==
Completions:
  Input tokens: 20
  Output tokens: 283
  Total tokens: 303
```
## Disable auto-tracing

Auto-tracing for FireworksAI can be disabled globally by calling `mlflow.openai.autolog(disable=True)` or `mlflow.autolog(disable=True)`.
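For example:

```python
import mlflow

# Disable auto-tracing for the OpenAI integration (this covers FireworksAI)
mlflow.openai.autolog(disable=True)

# Calling autolog again re-enables tracing
mlflow.openai.autolog()
```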