AI Functions in SQL - Revolutionizing Data Analysis

AI Functions in SQL - Revolutionizing Data Analysis

The adoption of Artificial Intelligence (AI) in enterprise environments is often associated with complex projects, specialized teams, and custom developments. However, Databricks has simplified this process by integrating AI Functions directly into SQL, allowing analysts and data engineers to apply AI capabilities to their data without leaving the lakehouse.

With these functions, tasks such as translation, sentiment analysis, classification, or summary generation become as simple as executing a SQL query.

What are AI Functions?

AI Functions are built-in functions in Databricks that enable the use of AI models directly from SQL, Notebooks, and production pipelines.

This means that AI runs in the same place where data lives, eliminating the need to move information to external systems and reducing friction in the workflow.

AI Functions in SQL Diagram

Main advantages:

Accessibility

AI Functions democratize access to artificial intelligence by eliminating traditional technical barriers.

  • Native SQL: They run directly as standard SQL functions without additional APIs
  • Zero-shot learning: They don’t require prior training or model configuration
  • Universal compatibility: They work in SQL warehouses, Python/Scala notebooks, and Delta Live Tables pipelines
  • Intuitive syntax: SELECT ai_summarize(text) FROM table is as simple as any built-in SQL function

Types of AI Functions

Databricks has designed an AI Functions ecosystem that is strategically divided into two complementary categories, each optimized for different usage patterns and levels of customization. This dual architecture allows both rapid implementation of common use cases and maximum flexibility for specific business requirements.

There are two main categories:

1. Task-specific functions

These are ready-to-use functions, designed to solve common business problems without the need to configure models. Some of the most used are:

  • ai_analyze_sentiment() → Detects sentiment (positive, negative, neutral) in text.
  • ai_classify() → Classifies text into predefined categories.
  • ai_fix_grammar() → Corrects grammatical errors in text.
  • ai_summarize() → Generates automatic summaries.
  • ai_translate() → Translates to different languages.
  • ai_forecast() → Performs forecasts on time series.

These functions are recommended for users looking for fast and consistent results.

2. General purpose function: ai_query()

It is the most flexible function, as it allows performing custom queries to language models. With it you can use:

  • Foundation models hosted by Databricks, such as Llama 3 or Gemma.
  • Own models deployed in Mosaic AI Model Serving.
  • External models connected via compatible endpoints.

Practical examples

AI Functions shine when applied to real use cases. Below, we explore implementations that demonstrate how these functions can transform everyday business operations, from feedback analysis to content automation.

Sentiment analysis on customer reviews

Case: Process customer feedback to identify satisfaction patterns.

-- Basic sentiment analysis
SELECT
    review_id,
    review_text,
    ai_analyze_sentiment(review_text) AS sentiment,
    ai_summarize(review_text) AS summary
FROM customer_reviews
WHERE created_date >= current_date() - 7
ORDER BY review_id;

-- Example result:
-- sentiment can be: 'positive', 'negative', 'neutral', 'mixed'

Automatic content classification

Case: Automatically categorize support tickets by department and urgency.

-- Ticket classification using ai_classify
SELECT
    ticket_id,
    description,
    ai_classify(description, array('technical', 'billing', 'general_inquiry')) AS department,
    ai_classify(description, array('low', 'medium', 'high', 'urgent')) AS priority
FROM support_tickets
WHERE status = 'open';

Automatic product translation

Case: Translate product descriptions for international markets.

-- Translation to multiple languages
SELECT
    product_id,
    description_en,
    ai_translate(description_en, 'Spanish') AS descripcion_es,
    ai_translate(description_en, 'French') AS description_fr,
    ai_translate(description_en, 'German') AS beschreibung_de
FROM product_catalog;

Custom queries with ai_query()

Case: Structured information extraction using foundation models.

-- Using Llama 3.3 model for custom analysis
SELECT
    customer_id,
    feedback_text,
    ai_query(
        "databricks-meta-llama-3-3-70b-instruct",
        "Extract the main complaint and suggested solution from this customer feedback: " || feedback_text
    ) AS analysis
FROM customer_feedback
LIMIT 10;

Grammar correction in content

Case: Improve the quality of user-generated content.

-- Automatic grammar correction
SELECT
    post_id,
    original_content,
    ai_fix_grammar(original_content) AS corrected_content
FROM user_posts
WHERE language = 'en';

Case: Find similar documents using vector search.

-- Vector index search (requires configured Mosaic AI Vector Search)
SELECT
    vector_search(
        index => 'catalog.schema.document_index',
        query => 'artificial intelligence machine learning',
        num_results => 5
    ) AS similar_documents;

Content generation with ai_gen

Case: Automatically create product descriptions.

-- Content generation
SELECT
    product_name,
    category,
    ai_gen("Write a compelling 2-sentence product description for: " || product_name || " in category: " || category) AS generated_description
FROM products
WHERE description IS NULL;

Enterprise use cases

AI Functions allow addressing a wide range of needs:

Customer Experience

Review analysis, spam detection or automatic classification of support tickets.

Finance

Generation of forecasts on sales or economic metrics.

Legal and compliance

Automatic summaries of contracts and extensive documents.

Operations

Extraction of key entities from unstructured records.

Scalability and production operation

One of Databricks’ greatest strengths is that these functions are not only designed for testing, but also for production environments.

They can be integrated into:

  • Lakeflow declarative pipelines.
  • Batch and streaming processing.
  • Production workflows with metrics monitoring.

Additionally, Databricks offers usage and cost dashboards, facilitating the management of expenses associated with model inference.

Conclusion

AI Functions in SQL represent a decisive step towards the democratization of AI in the data world. By integrating advanced language model capabilities directly into the Databricks ecosystem, any analyst or engineer can enrich their analysis without leaving the SQL environment.

This not only accelerates experimentation, but also enables real production use cases with scalability and cost control. Ultimately, it is a tool that brings the power of AI to the day-to-day operations of organizations.

Resources

  • #Sql
  • #AI
  • #Analytics
  • #Database
  • #Ai-functions
Share: