Tracing OpenAI Agent Responses Using MLflow


MLflow is an open-source platform for managing and tracking machine learning experiments. When used with the OpenAI Agents SDK, MLflow automatically:

Logs all agent interactions and API calls
Captures tool usage, input/output messages, and intermediate decisions
Tracks runs for debugging, performance analysis, and reproducibility

This is especially useful when you are building multi-agent systems in which different agents cooperate or call functions dynamically.

In this tutorial, we will walk through two key examples: a simple handoff between agents, and the use of agent guardrails, tracing their behavior with MLflow throughout.


Set up

Library

pip install openai-agents mlflow pydantic python-dotenv

OpenAI API key

To get an OpenAI API key, go to https://platform.openai.com/settings/organization/api-keys and generate a new key. If you are a new user, you may need to add billing details and make a minimum payment of $5 to activate API access.

Once the key is generated, create a .env file and enter the following:

OPENAI_API_KEY = <YOUR_API_KEY>

Replace <YOUR_API_KEY> with the key you generated.
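
Before running the scripts, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library. The helper name require_api_key is hypothetical (it is not part of any SDK), and it assumes that load_dotenv() or your shell has already placed OPENAI_API_KEY in the environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from env, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the OpenAI or MLflow SDKs.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the unedited placeholder from the .env template as "missing" too.
    if not key or key == "<YOUR_API_KEY>":
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file.")
    return key


if __name__ == "__main__":
    # Assumes load_dotenv() has already run, or the variable is exported.
    try:
        require_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

Failing early with a readable message is usually easier to debug than an authentication error raised in the middle of an agent run.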

Multi-agent system (multi_agent_demo.py)

In this script (multi_agent_demo.py), we build a simple multi-agent assistant using the OpenAI Agents SDK, designed to route user queries to either a coding specialist or a cooking specialist. We enable mlflow.openai.autolog(), which automatically traces and logs all agent interactions with the OpenAI API, including inputs, outputs, and agent handoffs, making the system easy to monitor and debug. MLflow is configured to use a local file-based tracking URI (./mlruns) and logs all activity under the experiment name "Agent-Coding-Cooking".

import asyncio
import mlflow
from agents import Agent, Runner
from dotenv import load_dotenv

load_dotenv()                                     # Load OPENAI_API_KEY from .env

mlflow.openai.autolog()                           # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")               # Store traces in a local folder
mlflow.set_experiment("Agent-Coding-Cooking")

coding_agent = Agent(name="Coding agent",
                     instructions="You only answer coding questions.")

cooking_agent = Agent(name="Cooking agent",
                      instructions="You only answer cooking questions.")

triage_agent = Agent(
    name="Triage agent",
    instructions="If the request is about code, handoff to coding_agent; "
                 "if about cooking, handoff to cooking_agent.",
    handoffs=[coding_agent, cooking_agent],
)

async def main():
    res = await Runner.run(triage_agent,
                           input="How do I boil pasta al dente?")
    print(res.final_output)

if __name__ == "__main__":
    asyncio.run(main())

MLflow UI

To open the MLflow UI and view all the logged agent interactions, run the following command in a new terminal:

mlflow ui

This starts the MLflow tracking server and prints the URL and port where the UI is accessible, which is http://localhost:5000 by default.

We can view the entire interaction flow in the Tracing section: the user's initial input, how the assistant routed the request to the appropriate agent, and finally the response generated by that agent. This end-to-end trace provides valuable insight into decision making, handoffs, and outputs, which helps you debug and optimize your agent workflows.
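
To make the shape of such a trace concrete, here is a toy model of a span hierarchy, written with the standard library only. It is a sketch, not MLflow's data model: the Span class, the render helper, and every span name are hypothetical stand-ins, and the spans you see in the UI for the pasta query will be named and nested differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Toy span: one recorded step in a trace (not MLflow's real class)."""
    name: str
    children: List["Span"] = field(default_factory=list)


def render(span: Span, depth: int = 0) -> List[str]:
    """Return the span tree as indented lines, parents before children."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical span names for the pasta query; real names will differ.
trace = Span("AgentRunner.run", [
    Span("Triage agent", [Span("LLM call"), Span("Handoff -> Cooking agent")]),
    Span("Cooking agent", [Span("LLM call")]),
])

if __name__ == "__main__":
    print("\n".join(render(trace)))
```

Reading the tree top to bottom mirrors what the trace view conveys: the triage step comes first, the handoff is recorded as its own step, and the specialist agent's call follows.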

Tracing guardrails

In this example, we implement a guardrail-protected customer support agent using the OpenAI Agents SDK with MLflow tracing. The agent is designed to help users with general questions but is restricted from answering medical questions. A dedicated guardrail agent checks for such inputs and, if one is detected, blocks the request. MLflow captures the entire flow, including the guardrail activation, its reasoning, and the agent response, providing complete traceability and insight into the safety mechanism.

import mlflow, asyncio
from pydantic import BaseModel
from agents import (
    Agent, Runner,
    GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    input_guardrail, RunContextWrapper)

from dotenv import load_dotenv
load_dotenv()

mlflow.openai.autolog()                 # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")     # Store traces in a local folder
mlflow.set_experiment("Agent-Guardrails")

class MedicalSymptoms(BaseModel):
    medical_symptoms: bool   # True if the input asks about medical symptoms
    reasoning: str           # The guardrail agent's explanation


guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking you about medical symptoms.",
    output_type=MedicalSymptoms,
)


@input_guardrail
async def medical_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.medical_symptoms,
    )


agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[medical_guardrail],
)


async def main():
    try:
        await Runner.run(agent, "Should I take aspirin if I'm having a headache?")
        print("Guardrail didn't trip - this is unexpected")

    except InputGuardrailTripwireTriggered:
        print("Medical guardrail tripped")


if __name__ == "__main__":
    asyncio.run(main())

This script defines a customer support agent with an input guardrail that detects medical questions. It uses a separate guardrail_agent to evaluate whether the user's input is a request for medical advice. If such input is detected, the guardrail trips and prevents the main agent from responding. The entire process, including the guardrail check and its result, is automatically logged and traced by MLflow.
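
The tripwire control flow can also be sketched without any API calls. The version below is a simplified stand-in, not the SDK's implementation: GuardrailResult and TripwireTriggered are hypothetical substitutes for GuardrailFunctionOutput and InputGuardrailTripwireTriggered, a keyword list replaces the LLM-based guardrail agent, and the check here always finishes before the main agent answers, which may differ from how the SDK schedules guardrails.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class GuardrailResult:
    """Simplified stand-in for the SDK's GuardrailFunctionOutput."""
    reasoning: str
    tripwire_triggered: bool


class TripwireTriggered(Exception):
    """Simplified stand-in for InputGuardrailTripwireTriggered."""


# Assumption: a keyword list replaces the LLM-based guardrail agent.
MEDICAL_TERMS = ("aspirin", "headache", "symptom", "medication")


async def medical_guardrail(user_input: str) -> GuardrailResult:
    """Flag the input if it contains any of the medical keywords."""
    hits = [t for t in MEDICAL_TERMS if t in user_input.lower()]
    reasoning = f"matched medical terms: {hits}" if hits else "no medical terms found"
    return GuardrailResult(reasoning=reasoning, tripwire_triggered=bool(hits))


async def run_with_guardrail(user_input: str) -> str:
    # In this sketch the check completes first; the agent answers only if it passes.
    check = await medical_guardrail(user_input)
    if check.tripwire_triggered:
        raise TripwireTriggered(check.reasoning)
    return f"Support agent answers: {user_input!r}"


async def demo(user_input: str) -> str:
    """Run one query and report either the answer or the tripped guardrail."""
    try:
        return await run_with_guardrail(user_input)
    except TripwireTriggered as exc:
        return f"Medical guardrail tripped ({exc})"


if __name__ == "__main__":
    print(asyncio.run(demo("Should I take aspirin if I'm having a headache?")))
    print(asyncio.run(demo("How do I reset my password?")))
```

Carrying the reasoning on the exception keeps the explanation for the block next to the block itself, which is the same information the trace exposes for the real guardrail agent.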

MLflow UI

To open the MLflow UI and view all the logged agent interactions, run the following command in a new terminal:

mlflow ui

In this example, we asked the agent, "Should I take aspirin if I'm having a headache?", which triggered the guardrail. In the MLflow UI, we can clearly see that the input was flagged, along with the reasoning the guardrail agent gave for blocking the request.


I am a Civil Engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in data science, especially neural networks and their applications in various fields.
