
Tracing FireworksAI

FireworksAI Tracing via autolog

FireworksAI is an inference and customization engine for open-source AI. It provides day-zero access to the latest state-of-the-art open-source models and lets developers build lightning-fast AI applications.

MLflow Tracing provides automatic tracing for FireworksAI through its OpenAI SDK compatibility. Because FireworksAI is OpenAI SDK compatible, you can use the mlflow.openai.autolog() function to enable automatic tracing. MLflow will capture traces for LLM invocations and log them to the active MLflow Experiment.

MLflow automatically captures the following information about FireworksAI calls:

  • Prompts and completion responses
  • Latencies
  • Model name
  • Additional metadata such as temperature and max_completion_tokens, if specified
  • Tool use, if returned in the response
  • Any exception if raised
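
After a traced call, you can inspect what was captured programmatically. Below is a minimal sketch using the trace APIs shown later on this page; run it after the Getting Started example below:

python
import mlflow

# Fetch the most recent trace and inspect what autolog recorded
trace = mlflow.get_trace(trace_id=mlflow.get_last_active_trace_id())
for span in trace.data.spans:
    print(span.name)     # e.g. the chat completion span
    print(span.inputs)   # prompt messages and request parameters
    print(span.outputs)  # completion response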

Getting Started

1. Install Dependencies

bash
pip install mlflow openai
2. Start MLflow Server

If you have a local Python environment (3.10 or newer), you can start the MLflow server locally with the mlflow CLI.

bash
mlflow server
3. Enable Tracing and Make API Calls

Enable tracing with mlflow.openai.autolog() and make API calls as usual.

python
import mlflow
import openai
import os

# Enable auto-tracing for FireworksAI (uses OpenAI SDK compatibility)
mlflow.openai.autolog()

# Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("FireworksAI")

# Create an OpenAI client configured for FireworksAI
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Use the client as usual - traces will be automatically captured
response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",  # For other models see: https://fireworks.ai/models
    messages=[
        {"role": "user", "content": "Why is open source better than closed source?"}
    ],
)
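
The response object is a standard OpenAI chat completion, so you can read the generated text as usual:

python
# Print the generated answer
print(response.choices[0].message.content)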
4. View Traces in MLflow UI

Browse to the MLflow UI at http://localhost:5000 (or your MLflow server URL) and you should see the traces for the FireworksAI API calls.

[Screenshot: FireworksAI tracing in the MLflow UI]

→ View Next Steps to learn about more MLflow features such as user feedback tracking, prompt management, and evaluation.

Supported APIs

Since FireworksAI is OpenAI SDK compatible, all APIs supported by MLflow's OpenAI integration work seamlessly with FireworksAI. See the model library for a list of available models on FireworksAI.

Supported modes: Normal, Tool Use, Structured Outputs, Streaming, Async.

Chat Completion API Examples

python
import openai
import mlflow
import os

# Enable auto-tracing
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
# If running locally you can start a server with: `mlflow server --host 127.0.0.1 --port 5000`
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("FireworksAI")

# Configure OpenAI client for FireworksAI
openai_client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]

# To use different models check out the model library at: https://fireworks.ai/models
response = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=messages,
    max_completion_tokens=100,
)
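
The same traced client covers the other supported modes. As a sketch of the streaming case (assuming the same client and model as above), MLflow's OpenAI autolog also records streamed calls:

python
# Streaming call through the same traced client
stream = openai_client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3-0324",
    messages=[{"role": "user", "content": "Name three open-source LLMs."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. the final one) may carry no content delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")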

Token Usage

MLflow supports token usage tracking for FireworksAI. The token usage for each LLM call will be logged in the mlflow.chat.tokenUsage attribute. The total token usage throughout the trace will be available in the token_usage field of the trace info object.
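
The snippet below reuses a run_tool_agent helper from MLflow's tool-calling example, which is not shown on this page. A minimal stand-in, assuming the openai_client configured above (this helper is illustrative, not part of the MLflow or FireworksAI API), could look like:

python
# Hypothetical stand-in for the tool-calling agent referenced below;
# a single traced chat completion is enough to produce token usage.
def run_tool_agent(question: str) -> str:
    response = openai_client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-v3-0324",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content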

python
import mlflow

mlflow.openai.autolog()

# Run the tool-calling agent (see the stand-in above)
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)

# Get the trace object just created
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Print the total token usage across the trace
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f"  Input tokens: {total_usage['input_tokens']}")
print(f"  Output tokens: {total_usage['output_tokens']}")
print(f"  Total tokens: {total_usage['total_tokens']}")

# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")
text
== Total token usage: ==
  Input tokens: 20
  Output tokens: 283
  Total tokens: 303

== Detailed usage for each LLM call: ==
Completions:
  Input tokens: 20
  Output tokens: 283
  Total tokens: 303

Disable auto-tracing

Auto-tracing for FireworksAI can be disabled globally by calling mlflow.openai.autolog(disable=True) or mlflow.autolog(disable=True).
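
For example:

python
import mlflow

# Disable tracing for the OpenAI/FireworksAI integration only
mlflow.openai.autolog(disable=True)

# Or disable all MLflow autologging integrations globally
mlflow.autolog(disable=True)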
