
🤖 Generative AI with Databricks: Structured Output Mastery

Building Reliable AI Applications with Schema-Validated Responses


🎯 What We're Solving Here

Picture this: you're building a production AI system, and your LLM keeps returning unpredictable responses. Sometimes it's valid JSON, sometimes it's wrapped in markdown, sometimes it's just plain text with creative formatting. Sound familiar? 😅

Structured output changes the game entirely. Instead of crossing your fingers and hoping for parseable responses, you define exactly what you want upfront using schemas. Think of it as a contract between you and the AI model, one that eliminates the guesswork and delivers consistent, production-ready data every time.

🔑 Core Concepts & Real-World Impact

Structured Output Definition: AI responses that conform to predefined schemas (JSON, XML, or custom formats) with specific fields, types, and validation rules.

Why This Matters:

Data pipeline reliability becomes non-negotiable when your ETL jobs depend on AI responses. Without structured output, a single malformed response can cascade into downstream failures, alerting systems, and unhappy stakeholders. With schema validation, your pipelines run predictably, maintaining SLA commitments and operational stability.

Type safety transforms your development experience completely. Your IDE knows exactly what fields to expect, providing intelligent autocomplete and catching errors at compile time rather than runtime. This isn't just about convenience; it's about building robust applications that fail fast and fail clearly during development rather than in production.

Downstream integration becomes seamless when APIs, databases, and analytics tools receive consistent data formats. No more custom parsing logic for each consumer, no more "works in my environment" debugging sessions. This consistency enables true microservices architecture where components can trust their data contracts implicitly.

Real Example: Instead of getting "The sentiment is probably positive with about 85% confidence", you get:

{
  "sentiment": "positive",
  "confidence": 0.85,
  "reasoning": "Multiple positive indicators detected"
}
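The Pydantic contract behind a response like that is only a few lines. A minimal sketch (the class name SentimentResult is illustrative, mirroring the JSON above):

from pydantic import BaseModel, Field

class SentimentResult(BaseModel):  # hypothetical name, matches the JSON shape above
    sentiment: str = ""
    confidence: float = Field(default=0.0, ge=0.0, le=1.0)  # bounded to [0, 1]
    reasoning: str = ""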

🚀 The Databricks Challenge (And Our Solution)

Here's the thing that'll save you hours of frustration: OpenAI's beta.chat.completions.parse doesn't play nicely with Databricks' AI Gateway. Why? Because Databricks abstracts response formats to work with multiple LLM providers: some use JSON Schema, others have proprietary formats.

But don't worry! With some clever engineering, we can recreate that same smooth developer experience while leveraging all of Databricks' optimizations. The approach we'll show you maintains the simplicity of OpenAI's structured output while providing the enterprise security, cost control, and governance that Databricks delivers. 🎉

βš™οΈ Model Compatibility Matrix

✅ Fully Supported: OpenAI (o4-mini, GPT-4), Google (Gemini-2.5-pro), most Llama models
❌ Not Supported: Anthropic models (Claude Opus/Sonnet); use prompting strategies and/or tool calling instead.

Structured Output with Tool Calling

Tool (function) calling offers another route to structured responses, especially for models without native response-format support. The next section, Structured Output with Tool Calling, explores this in depth.


πŸ› οΈ Implementation Deep Dive

Setting Up Your Environment

%pip install databricks-sdk[openai]
import json
from IPython.display import Markdown
from databricks.sdk import WorkspaceClient
from pydantic import BaseModel, Field

# Initialize Databricks workspace client
ws = WorkspaceClient()
oai = ws.serving_endpoints.get_open_ai_client()

The Magic Helper Functions

These utility functions bridge the gap between Pydantic models and Databricks' response format requirements:

def to_response_format(model: type[BaseModel], name: str = 'response'):
    """Convert Pydantic model to OpenAI-compatible response format"""
    return {
        'type': 'json_schema',
        'json_schema': {
            'name': name,
            'schema': model.model_json_schema()
        }
    }

def preview_response_format(response_format: dict):
    """Generate zero-valued preview for SQL schema inference"""
    zeros = {
        "string": "",
        "number": 0.0,
        "integer": 0,
        "boolean": False,
        "array": [],
        "object": {},
        "null": None
    }

    schema = response_format["json_schema"]["schema"]

    def build(node):
        if not isinstance(node, dict):
            return ""

        node_type = node.get("type", "string")        
        if node_type == "object" and "properties" in node:
            return {k: build(v) for k, v in node["properties"].items()}
        if node_type == "array" and "items" in node:
            return [build(node["items"])]

        return zeros.get(node_type, "")

    return build(schema)

The preview_response_format function is particularly clever: it generates zero-valued examples that Spark can use for schema inference, eliminating the manual process of defining SQL column types. This automation prevents schema mismatches that typically plague AI integration projects.
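For instance, a hypothetical three-field model previews like this, reusing the two helpers above (output shown as a comment):

class ExampleAnalysis(BaseModel):  # hypothetical model for illustration
    summary: str = ""
    score: float = 0.0
    tags: list[str] = [""]

print(json.dumps(preview_response_format(to_response_format(ExampleAnalysis))))
# {"summary": "", "score": 0.0, "tags": [""]}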

Your First Structured Response

class ModelResponse(BaseModel):
    response: str

# Generate the response format
response_format = to_response_format(ModelResponse)
response_format_preview = preview_response_format(response_format)

ai_response = oai.chat.completions.create(
    model='databricks-meta-llama-3-3-70b-instruct',
    messages=[
        {'role': 'system', 'content': 'Ensure special characters are properly escaped as per JSON specifications.'},
        {'role': 'user', 'content': 
            'Why is Databricks the Data Intelligence Platform? Return this in Markdown format with emojis.'}
    ],
    response_format=response_format
)

try:
    mdl_response = ModelResponse.model_validate_json(ai_response.choices[0].message.content)
    display(Markdown(mdl_response.response))
except Exception as e:
    print(f"Validation error: {e}")

Tip

Always include system prompts about JSON escaping; it prevents malformed responses that break your parsing! This simple addition eliminates a common source of production failures. 🔧


πŸ—„οΈ SQL Integration: Making It Work with AI_QUERY

The real power comes when you integrate structured output with Databricks SQL. Here's how to make AI_QUERY work seamlessly with your schemas:

SELECT FROM_JSON(
    AI_QUERY(
        'openai-o4-mini',
        'Explain why Databricks is the Data Intelligence Platform in 3 parts.',
        responseFormat => '{"type": "json_schema", "json_schema": {"name": "response", "schema": {"properties": {"response": {"title": "Response", "type": "string"}}, "required": ["response"], "title": "ModelResponse", "type": "object"}}}'
    ), 
    SCHEMA_OF_JSON('{"response": ""}')
) AS ai_response

The Foldable Function Challenge 🤔

Here's a crucial technical detail that'll save you debugging time: AI_QUERY's responseFormat argument must be a foldable expression, meaning Spark resolves it to a constant at query compile time. This architectural choice has important implications:

  • ❌ You cannot dynamically generate response formats within the same query
  • βœ… You can use Python to generate the SQL strings beforehand
  • βœ… You can create reusable UDFs with predefined schemas

This constraint actually encourages better practices by forcing you to define your data contracts upfront rather than generating them dynamically. The result is more maintainable, testable, and predictable AI applications.
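In practice, "generating the SQL strings beforehand" is plain Python string building: serialize the schema once, then interpolate it as a constant. A minimal sketch reusing the helpers and ModelResponse from above, mirroring the SQL shown earlier:

# Serialize the response format and its zero-valued preview once, outside the query
response_format_json = json.dumps(to_response_format(ModelResponse))
preview_json = json.dumps(preview_response_format(to_response_format(ModelResponse)))

df = spark.sql(f"""
SELECT FROM_JSON(
    AI_QUERY(
        'openai-o4-mini',
        'Explain why Databricks is the Data Intelligence Platform in 3 parts.',
        responseFormat => '{response_format_json}'
    ),
    SCHEMA_OF_JSON('{preview_json}')
) AS ai_response
""")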


🏭 Production-Ready Solution Templates for Structured Outputs

Building Dynamic UDFs

Let's create a more sophisticated model and build reusable functions that scale across your organization:

class AdvancedModelResponse(BaseModel):
    response: str = ""  # Always define defaults!
    confidence: float = 0.0
    keywords: list[str] = [""]  # Single empty string for array schema

# Generate the SQL components
response_format = to_response_format(AdvancedModelResponse)
preview = preview_response_format(response_format)
json_preview = json.dumps(preview)

# Calculate the return type schema
return_type = spark.sql(f"SELECT SCHEMA_OF_JSON('{json_preview}') AS return_type").collect()[0]['return_type']
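# return_type is a DDL type string Spark can consume directly, e.g. something like:
# 'STRUCT<response: STRING, confidence: DOUBLE, keywords: ARRAY<STRING>>'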

# Create the UDF
udf_query = f'''
CREATE OR REPLACE TEMPORARY FUNCTION STRUCTURED_AI_QUERY(
  request STRING
) RETURNS {return_type}
RETURN FROM_JSON(
    AI_QUERY(
        endpoint => 'openai-o4-mini',
        request => request,
        responseFormat => '{json.dumps(response_format)}'
    ), '{return_type}');
'''

spark.sql(udf_query)

Now you can use it cleanly in SQL:

SELECT STRUCTURED_AI_QUERY('Tell me about machine learning ethics') AS analysis

This pattern creates reusable, type-safe AI functions that your entire data team can leverage. Unlike ad-hoc AI integrations that require custom code for each use case, this approach provides a consistent interface that scales across projects and teams.
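Because the UDF returns a typed STRUCT, downstream queries can select individual fields directly. A sketch against a hypothetical reviews table (table and column names are illustrative):

# 'reviews', 'review_id', and 'review_text' are hypothetical, for illustration only
df = spark.sql("""
    SELECT review_id, analysis.response, analysis.confidence, analysis.keywords
    FROM (
        SELECT review_id, STRUCTURED_AI_QUERY(review_text) AS analysis
        FROM reviews
    )
""")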


🎨 The StructuredModel Class: Your New Best Friend

This class eliminates all the boilerplate and gives you a clean, Pythonic interface that abstracts away the complexity while maintaining full control:

import json
from openai import OpenAI
from typing import Optional


class StructuredModel:

    def __init__(self, 
        endpoint: str, 
        response_format: type[BaseModel],
        client: Optional[OpenAI] = None
    ):
        self.endpoint = endpoint
        self.response_format = response_format
        if client:
            self.client = client
        else:
            from databricks.sdk import WorkspaceClient
            ws = WorkspaceClient()
            self.client = ws.serving_endpoints.get_open_ai_client()

    @staticmethod
    def to_response_format(model: type[BaseModel], name: str = 'response'):
        return {
            'type': 'json_schema',
            'json_schema': {
                'name': name,
                'schema': model.model_json_schema()
            }
        }

    def create(self, messages: list[dict]):
        response = self.client.chat.completions.create(
            model=self.endpoint,
            messages=messages,
            response_format=self.to_response_format(self.response_format)
        )
        return [
            self.response_format.model_validate_json(choice.message.content)
            for choice in response.choices
        ]

    def __call__(self, messages: list[dict]):
        return self.create(messages)

Real-World Usage Example

# Define your response structure
class ProductAnalysis(BaseModel):
    sentiment: str = "neutral"
    confidence: float = 0.0
    key_themes: list[str] = [""]
    actionable_insights: list[str] = [""]

# Create your AI interface
analyzer = StructuredModel(
    endpoint='openai-o4-mini',
    response_format=ProductAnalysis
)

# Get structured results
result = analyzer([{
    'role': 'user', 
    'content': 'Analyze this customer feedback: The new dashboard is intuitive but loading times are terrible!'
}])

print(json.dumps(result[0].model_dump(), indent=2))

Expected Output:

{
  "sentiment": "mixed",
  "confidence": 0.85,
  "key_themes": ["user_interface", "performance_issues", "usability"],
  "actionable_insights": [
    "Investigate and optimize dashboard loading performance",
    "Maintain current UI design approach",
    "Consider adding loading indicators for better UX"
  ]
}


πŸ—οΈ The UDF Factory Pattern: Scale with Confidence

For enterprise environments, the StructuredAIUDFFactory provides a scalable way to create type-safe AI functions that can be shared across teams and projects:

import warnings
from pyspark.sql import SparkSession
from typing import Optional


class StructuredAIUDFFactory:

    def __init__(self,
        endpoint: str,
        response_format: type[BaseModel],
        session: Optional[SparkSession] = None,
        function_name: str = 'STR_AI_QUERY'
    ):
        warnings.warn('Always define zero-valued defaults for your BaseModels')
        self.endpoint = endpoint
        self.response_format = response_format
        self.spark = session or spark
        self.function_name = function_name

        try:
            self.response_format.model_validate({})
        except Exception as e:
            warnings.warn(
                'Failed response_format model validation; did you define ' 
                'default zero values for fields in your BaseModel?'
            )
            raise e

    @staticmethod
    def to_response_format(model: type[BaseModel], name: str = 'response'):
        return {
            'type': 'json_schema',
            'json_schema': {
                'name': name,
                'schema': model.model_json_schema()
            }
        }

    @property
    def query(self):
        response_format_json = json.dumps(
            self.to_response_format(self.response_format)
        )
        model_preview_json = json.dumps(
            self.response_format.model_validate({}).model_dump()
        )
        model_return_type = self.spark.sql(f"SELECT SCHEMA_OF_JSON('{model_preview_json}') AS return_type").collect()[0]['return_type']
        return f'''
        CREATE OR REPLACE TEMPORARY FUNCTION {self.function_name}(
        request STRING
        ) RETURNS {model_return_type}
        RETURN FROM_JSON(
            AI_QUERY(
                endpoint => '{self.endpoint}',
                request => request,
                responseFormat => '{response_format_json}'
            ), '{model_return_type}');
        '''

    def create(self):
        try:
            self.spark.sql(self.query)
            warnings.warn(f'UDF: {self.function_name} only exists during this session!')

            print(f'successfully created temporary UDF: {self.function_name}')
        except Exception as e:
            print(f'failed to create temporary UDF: {self.function_name}')

            raise e

Usage:

# Create a specialized sentiment analysis UDF
sentiment_udf = StructuredAIUDFFactory(
    endpoint="databricks-meta-llama-3-1-405b-instruct",
    response_format=ProductAnalysis,
    function_name="ANALYZE_SENTIMENT"
)
sentiment_udf.create()

This factory pattern enables governance at scale by standardizing how AI functions are created and deployed. Teams can create specialized functions for their domain while maintaining consistency in implementation patterns and security controls. The result is a library of reusable AI capabilities that can be discovered and leveraged across your organization.
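As a sketch of what such a library might look like, the same factory can stamp out one UDF per use case (the function names and endpoint below are illustrative assumptions):

# Register a small library of use-case-specific UDFs; names and endpoint are assumptions
for function_name, model in [
    ('ANALYZE_SENTIMENT', ProductAnalysis),
    ('SUMMARIZE_RESPONSE', AdvancedModelResponse),
]:
    StructuredAIUDFFactory(
        endpoint='openai-o4-mini',
        response_format=model,
        function_name=function_name,
    ).create()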


🎯 Key Takeaways & Best Practices

Do This:

  • βœ… Always define default zero values in your Pydantic modelsβ€”this ensures your schemas work correctly with Spark's type inference system. Without defaults, SQL integration becomes significantly more complex and error-prone.

  • βœ… Use system prompts to enforce JSON formatting consistently across all your AI interactions. This simple practice prevents the majority of parsing failures that plague production AI systems.

  • βœ… Validate your schemas before deploying to production environments. Create comprehensive test cases that cover edge cases, ensuring your structured output patterns work reliably under various conditions.

  • βœ… Leverage the factory pattern for reusable AI functions rather than creating one-off implementations. This approach promotes code reuse, simplifies maintenance, and enables better governance across your AI applications.

Avoid This:

  • ❌ Trying to use dynamic schemas with foldable functions will lead to runtime errors and frustrating debugging sessions. Databricks' compilation model requires static schema definitions.

  • ❌ Forgetting to handle validation errors gracefully can cause cascade failures in production pipelines. Always implement robust error handling with meaningful fallback strategies.

  • ❌ Using complex nested models without proper defaults creates schema inference problems that are difficult to debug. Keep your models as flat and simple as possible while meeting your requirements.

  • ❌ Ignoring model compatibility requirements leads to runtime failures when deploying across different LLM providers. Always validate compatibility during development.

🚀 Production Tips:

  • πŸš€ Create separate UDFs for different use cases (sentiment, extraction, classification) to maintain clear separation of concerns. This modular approach simplifies testing, monitoring, and maintenance while enabling fine-grained access controls.

  • πŸš€ Monitor your AI function performance and accuracy using Databricks' built-in observability tools. Track metrics like response time, error rates, and schema validation failures to ensure production reliability.

  • πŸš€ Version your Pydantic models alongside your code using standard software engineering practices. Schema evolution requires careful change management to prevent breaking downstream consumers.

  • πŸš€ Use meaningful function names that describe their purpose and expected inputs. This documentation-as-code approach improves discoverability and reduces onboarding time for new team members.


🎉 Wrapping Up

Structured output transforms unreliable AI text generation into predictable, type-safe data operations that integrate seamlessly with enterprise data architectures. With these patterns, you're not just calling AI models; you're building robust data intelligence systems that scale with your organization's needs.

The combination of Pydantic models, helper functions, and factory patterns gives you enterprise-grade reliability while maintaining developer productivity. These patterns have been battle-tested in production environments where reliability and governance are non-negotiable requirements.

Now go forth and build amazing AI-powered data applications that your operations team will actually trust in production! 🚀

Remember: The key to success with structured output is thinking like a data engineer. Define your contracts upfront, validate everything, and build for scale from day one. This discipline separates hobbyist AI experiments from production-ready data intelligence systems.