
Tracing FireworksAI


FireworksAI is an inference and customization engine for open source AI. It provides day-zero access to the latest SOTA open source models and lets developers build lightning-fast AI applications.

MLflow Tracing provides automatic tracing for FireworksAI through its OpenAI SDK compatibility. Because FireworksAI is OpenAI SDK compatible, you can use the mlflow.openai.autolog() function to enable automatic tracing. MLflow will capture traces for LLM invocations and log them to the active MLflow Experiment.

MLflow automatically captures the following information about FireworksAI calls:

  • Prompts and completion responses
  • Latencies
  • Model name
  • Additional metadata such as temperature and max_completion_tokens, if specified
  • Tool use, if returned in the response
  • Any exception, if raised
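
After a traced call (see the Quick Start below), you can retrieve the trace and inspect these captured fields programmatically. Here is a minimal sketch using the same trace APIs as the Token Usage section below:

python
import mlflow

# Fetch the most recently created trace and walk its spans
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

for span in trace.data.spans:
    # Each span records the prompt, the completion, and metadata as attributes
    print(span.name, span.inputs, span.outputs)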

Supported APIs

Since FireworksAI is OpenAI SDK compatible, all APIs supported by MLflow's OpenAI integration work seamlessly with FireworksAI. See the model library for a list of available models on FireworksAI.

Supported call types: Normal, Tool Use, Structured Outputs, Streaming, and Async (see the streaming and async sketch after the Chat Completion example below).

Quick Start

python
import mlflow
import openai
import os

# Enable auto-tracing
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("FireworksAI")

# Create an OpenAI client configured for FireworksAI
openai_client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Use the client as usual - traces will be automatically captured
response = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",  # For other models see: https://fireworks.ai/models
    messages=[
        {"role": "user", "content": "Why is open source better than closed source?"}
    ],
)
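
The response object is a standard OpenAI ChatCompletion, so you can read the answer as usual while the trace is logged in the background:

python
# Inspect the model's answer; the corresponding trace appears in the MLflow UI
print(response.choices[0].message.content)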

Chat Completion API Examples

python
import openai
import mlflow
import os

# Enable auto-tracing
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
# If running locally you can start a server with: `mlflow server --host 127.0.0.1 --port 5000`
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("FireworksAI")

# Configure OpenAI client for FireworksAI
openai_client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]

# To use different models check out the model library at: https://fireworks.ai/models
response = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=messages,
    max_completion_tokens=100,
)
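
The same client covers the other supported call types as well. As a minimal sketch (reusing the openai_client defined above), streaming and async calls are traced the same way:

python
import asyncio
import os

import openai

# Streaming: chunks are printed as they arrive; MLflow still records one trace
stream = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

# Async: the async OpenAI client is traced just like the sync one
async_client = openai.AsyncOpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)


async def main():
    response = await async_client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-v3-0324",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    )
    print(response.choices[0].message.content)


asyncio.run(main())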

Token Usage

MLflow supports token usage tracking for FireworksAI. The token usage for each LLM call is logged in the mlflow.chat.tokenUsage span attribute, and the total usage across the trace is available in the token_usage field of the trace info object.

python
import json
import mlflow

mlflow.openai.autolog()

# Make a traced LLM call (reusing the client from the previous section)
question = "What's the weather like in Paris today?"
response = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=[{"role": "user", "content": question}],
)

# Get the trace object just created
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Print the token usage
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f" Input tokens: {total_usage['input_tokens']}")
print(f" Output tokens: {total_usage['output_tokens']}")
print(f" Total tokens: {total_usage['total_tokens']}")

# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")
bash
== Total token usage: ==
Input tokens: 20
Output tokens: 283
Total tokens: 303

== Detailed usage for each LLM call: ==
Completions:
Input tokens: 20
Output tokens: 283
Total tokens: 303

Disable auto-tracing

Auto-tracing for FireworksAI can be disabled globally by calling mlflow.openai.autolog(disable=True) or mlflow.autolog(disable=True).
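
For example, to turn off tracing for this integration only while leaving other autologging intact:

python
import mlflow

# Disable tracing for the OpenAI-compatible integration (covers FireworksAI)
mlflow.openai.autolog(disable=True)

# Or disable all MLflow autologging at once
# mlflow.autolog(disable=True)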