Tracing Ollama

MLflow Tracing provides automatic tracing capability for Ollama models through the OpenAI SDK integration. Because Ollama exposes an OpenAI-compatible API, you can simply use mlflow.openai.autolog() to trace Ollama calls.

Ollama Tracing via autolog

MLflow tracing automatically captures the following information about Ollama calls (see the inspection sketch after the list):

  • Prompts and completion responses
  • Latencies
  • Token usage
  • Model name
  • Additional metadata such as temperature and max_tokens, if specified
  • Function calls, if returned in the response
  • Any exception raised during the request
  • and more...
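
Once a call has been traced (see Getting Started below), these fields can also be inspected programmatically. A minimal sketch, assuming a recent MLflow release where mlflow.get_last_active_trace_id() and mlflow.get_trace() are available (exact field names can vary slightly across versions):

```python
import mlflow

# Fetch the most recent trace produced in this process
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)

# Trace-level metadata, e.g. "OK", or an error status if an exception was raised
print(trace.info.status)

# Each span carries the prompts, completions, and attributes captured above
for span in trace.data.spans:
    print(span.name, span.inputs, span.outputs)
```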

Getting Started

1. Install dependencies

```bash
pip install mlflow openai
```

2. Start MLflow server

If you have a local Python environment (Python >= 3.10), you can start the MLflow server locally using the mlflow CLI command:

```bash
mlflow server
```

3. Run Ollama server

Ensure your Ollama server is running and the model you want to use has been pulled:

```bash
ollama run llama3.2:1b
```

4. Enable tracing and call Ollama

```python
import mlflow
from openai import OpenAI

# Enable auto-tracing for OpenAI (works with Ollama's OpenAI-compatible API)
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Ollama")

# Initialize the OpenAI client with the Ollama API endpoint
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="dummy",  # Ollama does not check the key, but the client requires one
)

response = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    temperature=0.1,
    max_tokens=100,
)
print(response.choices[0].message.content)
```

5. View traces in MLflow UI

Browse to your MLflow UI (for example, http://localhost:5000) and open the Ollama experiment to see traces for the calls above.

(Screenshot: Ollama traces in the MLflow UI)

→ View Next Steps to learn about more MLflow features like user feedback tracking, prompt management, and evaluation.

Supported APIs

MLflow supports automatic tracing for the following Ollama APIs through the OpenAI integration:

| Chat Completion | Function Calling | Streaming | Async |
|-----------------|------------------|-----------|-------|
| ✅ | ✅ | ✅ (*1) | ✅ (*2) |

(*1) Streaming support requires MLflow 2.15.0 or later.
(*2) Async support requires MLflow 2.21.0 or later.

To request support for additional APIs, please open a feature request on GitHub.
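
As a concrete example of the Function Calling column above, the sketch below sends a tool definition through the same OpenAI client. The get_weather tool is hypothetical and purely for illustration, tool support varies by Ollama model, and the sketch assumes mlflow.openai.autolog() is enabled as in Getting Started:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="dummy")

# Hypothetical tool definition, for illustration only
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the call appears here and in the trace
print(response.choices[0].message.tool_calls)
```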

Streaming and Async Support

MLflow supports tracing for streaming and async Ollama APIs. Visit the OpenAI Tracing documentation for example code snippets for tracing streaming and async calls through the OpenAI SDK.
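
For reference, a minimal sketch of both patterns against the Ollama endpoint, assuming the MLflow version requirements noted above:

```python
import asyncio

from openai import AsyncOpenAI, OpenAI

import mlflow

mlflow.openai.autolog()

# Streaming (MLflow >= 2.15.0): streamed chunks are captured in a single trace
client = OpenAI(base_url="http://localhost:11434/v1", api_key="dummy")
stream = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")


# Async (MLflow >= 2.21.0): the async client is traced the same way
async def main():
    aclient = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="dummy")
    response = await aclient.chat.completions.create(
        model="llama3.2:1b",
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```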

Combine with Manual Tracing

To control tracing behavior more precisely, MLflow provides a Manual Tracing SDK for creating spans around your own code. Manual tracing can be used alongside auto-tracing to build a custom trace while keeping the convenience of auto-tracing, as the sketch below shows. For more details, please refer to the Combine with Manual Tracing section in the OpenAI Tracing documentation.
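
For instance, wrapping your own function in the @mlflow.trace decorator creates a parent span, and the autologged Ollama call is nested under it. A minimal sketch (the run_qa helper is hypothetical):

```python
import mlflow
from openai import OpenAI

mlflow.openai.autolog()
client = OpenAI(base_url="http://localhost:11434/v1", api_key="dummy")


@mlflow.trace  # Parent span; the autologged LLM call becomes a child span
def run_qa(question: str) -> str:
    response = client.chat.completions.create(
        model="llama3.2:1b",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


print(run_qa("Why is the sky blue?"))
```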

Token usage

MLflow >= 3.2.0 supports token usage tracking for Ollama models through the OpenAI SDK integration. The token usage for each LLM call will be logged in the mlflow.chat.tokenUsage attribute. The total token usage throughout the trace will be available in the token_usage field of the trace info object. See the Token Usage documentation for more details.
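
A minimal sketch of reading both levels, assuming MLflow >= 3.2.0 and an autologged call earlier in the same process:

```python
import mlflow

# Fetch the most recent trace produced in this process
trace = mlflow.get_trace(mlflow.get_last_active_trace_id())

# Total token usage aggregated across the trace
print(trace.info.token_usage)

# Per-call usage logged on each LLM span under mlflow.chat.tokenUsage
for span in trace.data.spans:
    usage = span.get_attribute("mlflow.chat.tokenUsage")
    if usage:
        print(span.name, usage)
```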

Disable auto-tracing

Auto tracing for Ollama (through OpenAI SDK) can be disabled globally by calling mlflow.openai.autolog(disable=True) or mlflow.autolog(disable=True).
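
For example:

```python
import mlflow

# Disable tracing for the OpenAI (and therefore Ollama) integration only
mlflow.openai.autolog(disable=True)

# Or disable all autologging integrations globally
mlflow.autolog(disable=True)
```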
