I’ve been building an email tool for agents and came across some interesting (at least to me) design considerations, so I’m sharing them here. I’m sure all of these will be less relevant or obsolete as models continue to improve, and some of these might be bad ideas right now.
OpenAI tool spec for the `botmailroom_send_email` function
{
  "type": "function",
  "function": {
    "name": "botmailroom_send_email",
    "description": "Send an email",
    "parameters": {
      "type": "object",
      "properties": {
        "from_address": {
          "type": "string",
          "description": "The address to send the email from. Must be a valid address"
        },
        "subject": {
          "type": "string",
          "description": "The subject of the email. Must be provided unless replying or forwarding an existing email, i.e. existing_email_id must be provided if this is not provided"
        },
        "to_addresses": {
          "type": "array",
          "description": "The list of email addresses to send the email to",
          "items": {
            "type": "string"
          }
        },
        "content": {
          "type": "string",
          "description": "The content of the email. Can be in plain text or HTML. If HTML, it will be rendered in the email client, so be sure to only use html that is compatible with email clients"
        },
        "message_type": {
          "type": "string",
          "description": "The type of message to send: 'new' for a new message, 'reply' for replying to an existing email, or 'forward' for forwarding an existing email. If `existing_email_id` is provided, `message_type` must be `reply` or `forward`",
          "enum": [
            "new",
            "reply",
            "forward"
          ]
        },
        "existing_email_id": {
          "type": "string",
          "description": "The ID of the email to reply to or forward. Only used if message_type is 'reply' or 'forward'. The ID should be a valid UUID"
        },
        "attachment_ids": {
          "type": "array",
          "description": "The IDs of the attachments to include in the email. Attachment must be added to the attachments pool first using the add_attachment_to_pool function. Each ID should be a valid UUID.",
          "items": {
            "type": "string"
          }
        }
      },
      "required": [
        "from_address",
        "to_addresses",
        "content",
        "message_type"
      ]
    }
  }
}
Parameters and Default Values
With any system there are always tradeoffs between complexity and flexibility. In API design, one strategy for dealing with this tradeoff is to provide reasonable defaults for as many parameters as possible and let the user specify a custom value when they need to, but LLMs force this idea to the extreme. Unlike a developer, who can slowly incorporate more and more parameters into deterministic code as needed, the LLM functionally has to start from scratch with every request. If you provide a tool spec with a lot of parameters that are not required, in my experience the LLM will often make mistakes, particularly when the validity of one parameter is conditional on the value of another (e.g. setting `message_type` to `new` while also providing an `existing_email_id` in the above tool spec).
One strategy is to catch errors in your client before even submitting a request to the server, then pass the details of the error back to the LLM so it can fix the tool call. Note that this can be costly from a token and latency perspective.
try:
    output = getattr(self, tool_name)(**tool_args)
except BotMailRoomValidationError as e:
    if catch_validation_errors:
        return str(e)
    else:
        raise e
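A minimal sketch of what such client-side validation might look like for the conditional rules in the spec above. The function and the error class definition here are illustrative, not the actual BotMailRoom client code:

```python
class BotMailRoomValidationError(Exception):
    """Raised when tool arguments violate the tool spec's constraints.

    Stand-in for the real client's error class, for illustration.
    """

def validate_send_email_args(args: dict) -> None:
    """Check the cross-parameter rules from the send email tool spec."""
    message_type = args.get("message_type")
    # existing_email_id is only meaningful for replies and forwards
    if args.get("existing_email_id") is not None and message_type == "new":
        raise BotMailRoomValidationError(
            "existing_email_id is only valid when message_type is "
            "'reply' or 'forward'"
        )
    # subject is required unless replying to or forwarding an existing email
    if message_type == "new" and not args.get("subject"):
        raise BotMailRoomValidationError(
            "subject is required when message_type is 'new'"
        )
```

Returning the error string to the LLM (as in the snippet above) gives it a concrete, local correction to make instead of a server round trip.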
Another strategy is to consolidate parameters and then use client code to handle some of the additional complexity. For example, the send email endpoint takes in both `html` and `plain_text` content, but the tool spec only has a single `content` parameter. The client code can then take the `content` string, figure out whether it is HTML or plain text, and pass the appropriate value to the API.
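One way to sketch that client-side dispatch. The heuristic and the returned parameter names are assumptions for illustration, not the actual client implementation:

```python
import re

def split_content(content: str) -> dict:
    """Map the single `content` tool parameter onto the two
    parameters the API actually expects (`html` / `plain_text`).

    Uses a simple heuristic: treat the content as HTML if it
    contains anything that looks like a tag.
    """
    looks_like_html = bool(re.search(r"<[a-zA-Z][^>]*>", content))
    if looks_like_html:
        return {"html": content, "plain_text": None}
    return {"html": None, "plain_text": content}
```

The LLM only ever sees one parameter; the deterministic client code absorbs the branching.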
You can also just hide many of the available API parameters from the tool spec altogether, constraining what the LLM can do with your system in exchange for additional reliability. The send email endpoint has 14 possible parameters, but the tool spec only exposes 7.
Chaining API calls
Many APIs are designed in a way that relies on an initial call to find the relevant identifier for a resource and then another call to conduct actions on that resource. Similar to the parameter-count problem, while this may be fine for a developer who can incorporate these chained calls in a deterministic way, the LLM has to start from scratch with every request. Increasing the number of tool calls the LLM has to make increases the chance that one of those calls will have an error that causes the workflow to fail. In the send email endpoint, I originally required the caller to pass in an `inbox_id` to specify which inbox to send the email from. Even though the description of the `inbox_id` parameter explicitly stated that it needed to be a valid UUID, the LLM would constantly pass an email address into that parameter instead of first querying the get inboxes endpoint to get the `inbox_id`. Changing the parameter to `from_address` solved this problem entirely (since there is a uniqueness constraint on `email_address` for inboxes, determining the relevant inbox on the backend was trivial). While this may just have been a case of bad API design in the first place, the increase in tool call reliability was so immediate that I thought it was worth calling out.
Simplify the Developer Workflow
A developer can write their own spec for the tool call and the logic necessary to call the tool, but providing an easy way to handle both in your client code reduces the friction of getting started and lets you mitigate some of the problems above. Stripe did this recently with their agent toolkit, and the BotMailRoom clients offer a similar experience.
import json

# get the tool specs, filtered to just the ones this workflow needs
tools = botmailroom_client.get_tools(
    tools_to_include=["botmailroom_send_email"]
)

output = await openai_client.chat.completions.create(
    model="gpt-4o",
    messages=chat,
    tools=tools,
)

if output.choices[0].message.tool_calls:
    for tool_call in output.choices[0].message.tool_calls:
        if tool_call.function.name.startswith("botmailroom_"):
            # the model returns arguments as a JSON string
            arguments = json.loads(tool_call.function.arguments)
            # execute tool call
            tool_output = botmailroom_client.execute_tool(
                tool_call.function.name,
                arguments,
                enforce_str_output=True,
            )
Some additional notes:
- Not all tools are necessary for each workflow. Allowing an easy way to filter tool specs for the workflow can help reduce the complexity the LLM has to handle - see the `tools_to_include` parameter in the above code.
- The `enforce_str_output` parameter ensures that the tool output is a string, which makes it easy to include in the LLM chat history. You do want to allow the developer to use the object output, though, if some other part of the workflow needs it.
- If you're using tools from different providers, prefixing the tool call name with your service name (e.g. `botmailroom_`) makes it easy to quickly use the right client code to execute the tool call.
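That prefix-based routing can be sketched as a small dispatcher. The client registry and handler signatures here are placeholders, not actual library APIs:

```python
from typing import Any, Callable

def route_tool_call(
    tool_name: str,
    arguments: dict,
    clients: dict[str, Callable[[str, dict], Any]],
) -> Any:
    """Dispatch a tool call to the right provider client based on the
    tool name's prefix (e.g. 'botmailroom_', 'stripe_')."""
    for prefix, execute in clients.items():
        if tool_name.startswith(prefix):
            return execute(tool_name, arguments)
    raise ValueError(f"no client registered for tool {tool_name!r}")
```

With one namespaced prefix per provider, adding a new tool provider is a one-line registry change rather than another branch in the agent loop.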