Structured Output with Tool Calling 🛠️
While the industry rapidly adopted structured output capabilities, something even more powerful emerged alongside it: tool calling. This evolution represents a fundamental shift from simple text generation to AI systems that can interact with the real world.
Tool calling (also known as function calling) enables Large Language Models to interact with external systems, APIs, databases, and tools by invoking predefined functions during conversations. Rather than limiting AI to text responses, models can execute code, retrieve real-time data, and perform actions across your entire infrastructure stack.
Some LLMs, particularly Anthropic's Claude family of foundation models, bypassed structured output entirely and offer only tool calling. Understanding these mechanisms is critical because, for those model families, tool calling is the only path to predictable, parseable data that integrates reliably with your data pipelines.
Getting Started with the Foundation 🏗️
Environment preparation follows our established patterns. We'll leverage the Databricks SDK with OpenAI compatibility for that familiar developer experience while maintaining enterprise-grade security and governance.
import json
from IPython.display import Markdown
from databricks.sdk import WorkspaceClient
from pydantic import BaseModel, Field
# Initialize our workspace client - this is our gateway to Databricks goodness
ws = WorkspaceClient()
oai = ws.serving_endpoints.get_open_ai_client()
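Optionally, a quick smoke test confirms the client is wired up before we start adding tools. A minimal sketch; any served chat model will do here:
# Quick smoke test - confirm the endpoint responds before we layer on tools
ping = oai.chat.completions.create(
    model='databricks-claude-sonnet-4',
    messages=[{'role': 'user', 'content': 'Say hello in one word.'}]
)
print(ping.choices[0].message.content)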
The Magic Question: Can We Use Pydantic BaseModels as Tools? 🤔
Spoiler Alert: Absolutely We Can! 🎉
This is where architectural elegance meets practical implementation. Tool calling parameters are defined using JSON Schema specifications, and Pydantic BaseModels generate JSON Schema natively. This alignment creates what we call "elegant programmer efficiency" at its finest.
The beauty lies in the seamless integration between these technologies. Pydantic handles schema generation, validation, and type safety while tool calling provides the execution framework. This combination eliminates the traditional disconnect between schema definition and runtime validation that plagues many AI integration projects.
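To see the alignment concretely, here's a minimal sketch of what Pydantic v2's model_json_schema() emits for a one-field model (output abbreviated and reformatted; the demo class name is ours):
# Pydantic generates JSON Schema natively - exactly the shape tool definitions expect
class SchemaDemo(BaseModel):
    response: str

print(json.dumps(SchemaDemo.model_json_schema(), indent=2))
# {
#   "properties": {"response": {"title": "Response", "type": "string"}},
#   "required": ["response"],
#   "title": "SchemaDemo",
#   "type": "object"
# }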
Here's the practical implementation:
class ModelResponse(BaseModel):
response: str
# This model supports tools but doesn't natively support structured output
model = 'databricks-claude-sonnet-4'
# Here's the magic - we're using Pydantic's schema generation as our tool definition
str_output_tool = {
'type': 'function',
'function': {
'name': 'str_output_tool',
        'description': "Formats the model's textual response into a structured output",
        'parameters': ModelResponse.model_json_schema()  # The secret sauce! 🔥
}
}
ai_response = oai.chat.completions.create(
model=model,
tools=[str_output_tool],
messages=[
{'role': 'user', 'content':
'What is the meaning of life? '
'Beautify your response in Markdown with cute emojis. '
'Keep it to 3 sections with 2 bullets or less. '
'Prefer conversational text.'
}
]
)
# Handle the tool call response
if ai_response.choices[0].finish_reason == 'tool_calls':
for tool_call in ai_response.choices[0].message.tool_calls:
if tool_call.function.name == 'str_output_tool':
response = ModelResponse.model_validate_json(tool_call.function.arguments)
print(f"Response type: {type(response)}")
display(Markdown(response.response))
The model receives our tool definition (essentially our Pydantic schema), understands the required response structure, and returns JSON that perfectly matches our expected format. This approach provides the flexibility of natural language interaction with the reliability of structured data contracts.
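If you're curious what the raw payload looks like before validation, it's roughly this (illustrative values; the actual text varies run to run):
# Peek at the raw tool call the model returned
tool_call = ai_response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # str_output_tool
print(tool_call.function.arguments)  # '{"response": "# The Meaning of Life\n\n..."}'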
Cranking Up the Consistency 🎯
While the basic implementation works, production environments demand enhanced reliability and predictability. Let's implement three critical improvements that transform this from a proof-of-concept into a production-ready solution:
1. Forcing Tool Choice 🎲
Deterministic behavior requires explicit control. Since we have one tool and always want the model to use it, we eliminate ambiguity through forced tool selection.
2. Enhanced Prompting 📝
Clear instructions directly correlate with better results. Explicit guidance reduces model uncertainty and improves consistency across different inputs and contexts.
3. Rich Field Descriptions with Pydantic Field 📋
Pydantic's field description capabilities provide detailed context that helps models understand exactly what each parameter represents:
class ModelResponse(BaseModel):
# Adding rich descriptions helps the model understand exactly what we want
response: str = Field(
default='', # Always provide defaults - it's a good habit!
        description="The model's response formatted according to user requirements"
)
model = 'databricks-claude-sonnet-4'
str_output_tool = {
'type': 'function',
'function': {
'name': 'str_output_tool',
        'description': "Formats the model's textual response into a structured output",
'parameters': ModelResponse.model_json_schema()
}
}
ai_response = oai.chat.completions.create(
model=model,
tools=[str_output_tool],
    # 🎯 Forcing the tool choice - no more guessing games!
tool_choice={
'type': 'function',
'function': {'name': 'str_output_tool'}
},
messages=[
{'role': 'user', 'content':
'What is the meaning of life? '
'Beautify your response in Markdown with cute emojis. '
'Keep it to 3 sections with 2 bullets or less. '
'Prefer conversational text.\n\n'
            # 📝 Crystal clear instructions
            'When responding, always use the following tool:\n'
            "- **str_output_tool**: formats the model's response into a structured output.\n"
            "  - Use the model's response as the response parameter."
}
]
)
# Now we can confidently assume the tool call structure
response = ModelResponse.model_validate_json(
ai_response.choices[0].message.tool_calls[0].function.arguments
)
print(f"Response type: {type(response)}")
display(Markdown(response.response))
Notice the cleaner implementation: we no longer check for tool call existence or iterate through responses. We know exactly what structure we're receiving, enabling more robust error handling and cleaner application logic.
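That said, confidence is not a guarantee, and in production a defensive wrapper is cheap insurance. A minimal sketch, assuming we fall back to the plain message content when the tool call is missing or malformed:
from pydantic import ValidationError

try:
    response = ModelResponse.model_validate_json(
        ai_response.choices[0].message.tool_calls[0].function.arguments
    )
except (ValidationError, IndexError, TypeError) as e:
    # No tool call or invalid JSON - fall back to the raw content
    response = ModelResponse(response=ai_response.choices[0].message.content or f'Error: {e}')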
Bringing Tool Calling to Databricks SQL 📊
While Databricks doesn't provide native tool calling support in ai_query() (it returns only the model's direct content), we can implement this functionality using a PySpark UDF combined with the OpenAI client. This approach maintains SQL simplicity while adding tool calling capabilities.
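For contrast, here's what plain ai_query usage looks like; it hands back free text, with no tool definitions anywhere in the call (a sketch, assuming a hypothetical my_prompts table with a prompts column):
# Built-in ai_query for contrast: raw text back, no tool calling
df = spark.sql("""
    SELECT prompts,
           ai_query('databricks-claude-sonnet-4', prompts) AS ai_response
    FROM my_prompts
""")
display(df)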
Authentication and configuration setup:
from openai import OpenAI
from databricks.sdk import WorkspaceClient
ws = WorkspaceClient()
# Store your PAT token securely in Databricks secrets
# ws.secrets.put_secret('auth', 'db-pat', string_value='your-token-here')
FMAPI_BASE_URL = f'https://{spark.conf.get("spark.databricks.workspaceUrl")}/serving-endpoints'
api_key = dbutils.secrets.get('auth', 'db-pat')
We replicate our established logic but package it for SQL consumption:
# Testing our approach functionally first
oai_ = OpenAI(
base_url=FMAPI_BASE_URL,
api_key=api_key
)
ai_response = oai_.chat.completions.create(
model=model,
tools=[str_output_tool],
tool_choice={
'type': 'function',
'function': {'name': 'str_output_tool'}
},
messages=[
{'role': 'user', 'content':
'What is the meaning of life? '
'Beautify your response in Markdown with cute emojis. '
'Keep it to 3 sections with 2 bullets or less. '
'Prefer conversational text.\n\n'
'When responding, always use the following tool:\n'
            "- **str_output_tool**: formats the model's response into a structured output.\n"
            "  - Use the model's response as the response parameter."
}
]
)
response = ModelResponse.model_validate_json(
ai_response.choices[0].message.tool_calls[0].function.arguments
)
print(f"Response type: {type(response)}")
display(Markdown(response.response))
Building Our PySpark UDF 🏗️
Creating a UDF that brings this functionality to SQL requires dynamic schema generation. This approach eliminates manual schema definition while ensuring type safety:
import pandas as pd
from pyspark.sql.functions import pandas_udf, col, schema_of_json, from_json, lit
from pyspark.sql.types import StringType, StructType
# Configuration
FMAPI_BASE_URL = f'https://{spark.conf.get("spark.databricks.workspaceUrl")}/serving-endpoints'
FMAPI_API_KEY = dbutils.secrets.get('auth', 'db-pat')
model = 'databricks-claude-sonnet-4'
class ModelResponse(BaseModel):
response: str = Field(
default='',
        description="The model's response formatted according to requirements"
)
str_output_tool = {
'type': 'function',
'function': {
'name': 'str_output_tool',
        'description': "Formats the model's textual response into a structured output",
'parameters': ModelResponse.model_json_schema()
}
}
# 🎯 Dynamic schema generation - this is the cool part!
model_preview_json = ModelResponse.model_validate({}).model_dump_json()
model_return_type = spark.sql(
f"SELECT SCHEMA_OF_JSON('{model_preview_json}') AS return_type"
).collect()[0]['return_type']
pyspark_schema = StructType.fromDDL(model_return_type)
print(f"Generated schema: {pyspark_schema}")
The UDF implementation starts with string return types for simplicity, then parses back to structured data:
@pandas_udf(StringType())
def tc_ai_query(prompts: pd.Series) -> pd.Series:
"""
Tool-calling AI query UDF that returns structured responses.
This UDF takes a series of prompts and returns structured JSON responses
using the tool calling mechanism we defined above.
"""
from openai import OpenAI
# Initialize the OpenAI client within the UDF context
oai_ = OpenAI(
base_url=FMAPI_BASE_URL,
api_key=FMAPI_API_KEY
)
def get_response(prompt: str) -> str:
"""Process a single prompt and return the structured response as JSON string."""
try:
ai_response = oai_.chat.completions.create(
model=model,
tools=[str_output_tool],
tool_choice={
'type': 'function',
'function': {'name': 'str_output_tool'}
},
messages=[
{'role': 'user', 'content':
f'{prompt}\n\n'
'When responding, always use the following tool:\n'
                    "- **str_output_tool**: formats the model's response into a structured output.\n"
                    "  - Use the model's response as the response parameter."
}
]
)
return ai_response.choices[0].message.tool_calls[0].function.arguments
except Exception as e:
# Return a valid JSON structure even on error
return json.dumps({"response": f"Error: {str(e)}"})
return prompts.apply(get_response)
Putting It All Together 🚀
Testing our implementation with real data demonstrates practical usage:
# Create a test DataFrame
test_data = pd.DataFrame([
{'prompts': 'What is the meaning of life?'},
{'prompts': 'What is Databricks?'},
{'prompts': 'What type of coffee is the most popular globally?'}
])
sdf = spark.createDataFrame(test_data)
# Apply our UDF and parse the results
result_df = sdf.select(
col('prompts'),
from_json(
tc_ai_query(col('prompts')),
schema_of_json(lit(model_preview_json))
).alias('ai_response')
)
display(result_df)
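And to make good on the SQL promise, the pandas UDF registers for pure-SQL use as well. A minimal sketch reusing the DDL string we generated earlier (the prompt_table view name is our own):
# Register the UDF and expose the test DataFrame to SQL
spark.udf.register('tc_ai_query', tc_ai_query)
sdf.createOrReplaceTempView('prompt_table')

structured = spark.sql(f"""
    SELECT prompts,
           from_json(tc_ai_query(prompts), '{model_return_type}') AS ai_response
    FROM prompt_table
""")
display(structured)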
What We've Accomplished 🎉
This approach delivers several powerful capabilities that distinguish it from simpler AI integration patterns:
🎯 Consistency: Every response conforms to an identical structure, making downstream processing completely predictable. Unlike traditional AI integrations that require complex parsing logic, this approach guarantees data contract compliance.
🔧 Type Safety: Pydantic integration ensures data models are validated and typed correctly at runtime. This validation prevents the data quality issues that often plague AI-powered pipelines in production environments.
⚡ SQL Integration: Structured AI responses integrate directly into SQL workflows without requiring custom transformation logic. This capability lets data analysts and engineers leverage AI using familiar tools and patterns.
🛡️ Error Handling: The UDF implementation gracefully handles errors while maintaining the expected output structure. This resilience prevents cascade failures that can destabilize downstream data processing jobs.
📈 Scalability: DataFrame-scale processing handles multiple prompts in parallel, leveraging Databricks' distributed computing capabilities. This enables AI integration at enterprise data volumes without performance degradation.
The architectural beauty lies in bridging the gap between tool calling flexibility and data engineering structure requirements. You achieve conversational AI capabilities while maintaining the predictable, structured outputs that enable reliable data workflows. This combination represents a significant advancement over traditional approaches that force you to choose between AI flexibility and data reliability.
References & Further Reading π
Now go forth and build amazing structured AI workflows that combine the power of tool calling with the reliability your production systems demand! 🚀