Can an Artificial Intelligence Write Better SQL than you?

Can Artiﬁcial Intelligence Write Better SQL Than You? David Stokes,
Community Manager at DBeaver Slide deck @ https://www.slideshare.net/davidmstokes 1

Do You Need An AI ASSISTANT? [email protected] 2

Can an Artiﬁcial Intelligence Write Better SQL than you? AI
has made great strides in the last few years but many still question its ability to handle reliable even simple tasks. This presentation will show you some of the possibilities available to you. Who is this webinar for? • Data team leaders • Software developers • Data analysts • Financial analysts • And any data professionals Agenda: 1. SQL - The favorite programming language of very few 2. AI - What do we really mean by the term 3. Security - How do you protect what is yours? 4. Examples 3

4 page o Dave Stokes - DBeaver Community Manager o
Author of MySQL & JSON - A Practical Programming Guide o Long time open-source and database advocate About Me 4

5 page o SQL is the only programming language from
the 1970s that is still popular o Efﬁcient o Powerful o Relational model maps well to many processes Structured Query Language 5

6 page o SQL funky syntax • Descriptive not imperative
language • Best to think in sets not rows o Transactions • Another semi-abstract layer around SQL o SQL is standardized • The implementations are not Structured Query Language Quirks, Database additions 6

7 page o Artiﬁcial Intelligence is changing the world o
Lots of hype - but is it useful? o Can we simply ask an AI to write something? Artiﬁcial Intelligence 7

8 page o Working with your data o Most AIs
are hosted remotely (Ollama being a famous exception). This means metadata and data have to traverse networks, get cached (stored), and results sent back. This may be a security concern, depending on your settings. o Do you want your data on the front page of the New York Times or the hands of someone with bad intent? Artiﬁcial Intelligence & Your Data 8

9 page o On prem LLM o Roll you own
o Ollama - On prem & open-source o Share some metadata - DBeaver (more on this later) Security - What If You Do Not Want To Share Data? 9

10 page o Simple Prompt asking an AI directly to
‘do something’. We mention a reference database but are not supplying more details about it. o Using the MySQL World database, how do I find all the cities in the Texas district and their populations? Example 1 - Ask four LLMs the same question 10

Grok 11

Google 12

Ollama with Gemma3 13

ChatGPT 14

15 page Slight variation in query 15 SELECT Name, population
From city WHERE District = ‘Texas’(Grok, ChatGPT, Ollama) CountryCode = (SELECT Code FROM Country WHERE Name = 'United States') AND District = 'Texas'; (Google)

16 page o The previous example used a ‘well known
database’ so the LLM ‘knew’ about the schema and the data. But not from us. o But what about when the data is not so well known? Example 1 - Conclusions 16

17 page o Write the SQL to create a table
on a PostgreSQL version 17 server to record data on customers. The data will include a first name, a last name, an address, a birthdate, and a unique UUID primary key. Partition the data by the year of birth. And produce some test data of at least ten records. o In this prompt, there are a lot of specifications of what is desired. Example 2 - A little more complex prompt 17

Grok -- Create the uuid-ossp extension if it doesn't exist
(required for uuid_generate_v4()) CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; -- Create the partitioned table CREATE TABLE customers ( uuid UUID NOT NULL DEFAULT uuid_generate_v4(), first_name TEXT NOT NULL, last_name TEXT NOT NULL, address TEXT, birthdate DATE NOT NULL, birth_year INTEGER GENERATED ALWAYS AS (EXTRACT(YEAR FROM birthdate)) STORED NOT NULL, PRIMARY KEY (uuid, birth_year) ) PARTITION BY RANGE (birth_year); -- Create partitions for sample year ranges CREATE TABLE customers_y1980s PARTITION OF customers FOR VALUES FROM (1980) TO (1990); CREATE TABLE customers_y1990s PARTITION OF customers FOR VALUES FROM (1990) TO (2000); CREATE TABLE customers_y2000s PARTITION OF customers FOR VALUES FROM (2000) TO (2010); -- Insert test data (10 records) INSERT INTO customers (first_name, last_name, address, birthdate) VALUES ('Alice', 'Johnson', '100 Oak St, Anytown, USA', '1982-03-04'), ('Bob', 'Brown', '101 Pine St, Anytown, USA', '1985-06-07'), ('Charlie', 'Davis', '102 Maple St, Anytown, USA', '1991-08-09'), ('David', 'Evans', '103 Birch St, Anytown, USA', '1993-10-11'), ('Eve', 'Franklin', '104 Cedar St, Anytown, USA', '1997-12-13'), ('Frank', 'Green', '105 Walnut St, Anytown, USA', '2001-01-15'), ('Grace', 'Harris', '106 Chestnut St, Anytown, USA', '2003-02-17'), ('Henry', 'Irwin', '107 Spruce St, Anytown, USA', '2004-04-19'), ('Ivy', 'Jackson', '108 Fir St, Anytown, USA', '2006-05-21'), ('Jack', 'King', '109 Hemlock St, Anytown, USA', '2008-07-23'); 18

Google -- Create the customer table with partitioning by year
of birth CREATE TABLE customers ( customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, address TEXT, birthdate DATE NOT NULL ) PARTITION BY RANGE (EXTRACT(YEAR FROM birthdate)); -- Create partitions for different birth years (example) CREATE TABLE customers_1980s PARTITION OF customers FOR VALUES FROM (1980) TO (1990); CREATE TABLE customers_1990s PARTITION OF customers FOR VALUES FROM (1990) TO (2000); CREATE TABLE customers_2000s PARTITION OF customers FOR VALUES FROM (2000) TO (2010); -- Insert test data INSERT INTO customers (first_name, last_name, address, birthdate) VALUES ('Alice', 'Smith', '123 Oak Ave', '1985-03-15'), ('Bob', 'Johnson', '456 Pine St', '1992-07-22'), ('Charlie', 'Brown', '789 Maple Rd', '2001-11-01'), ('Diana', 'Miller', '101 Birch Ln', '1988-09-05'), ('Edward', 'Davis', '202 Elm Ct', '1995-02-10'), ('Fiona', 'Garcia', '303 Cedar Dr', '2003-06-18'), ('George', 'Rodriguez', '404 Willow Way', '1982-01-25'), ('Hannah', 'Martinez', '505 Poplar Pl', '1998-04-30'), ('Ivan', 'Hernandez', '606 Spruce Blvd', '2005-08-12'), ('Julia', 'Lopez', '707 Fir Cir', '1980-12-03'); 19

Ollama ```sql -- Create the Customers table with UUID primary
key and yearly partitioning CREATE TABLE Customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- Generate UUID for primary key first_name VARCHAR(255) NOT NULL, last_name VARCHAR(255) NOT NULL, address TEXT, birthdate DATE NOT NULL, year_birth INTEGER NOT NULL, -- Partitioning column created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT (NOW() AT TIME ZONE 'utc') ) PARTITION BY RANGE (year_birth); -- Create a partition function to determine the partition based on year_birth CREATE PARTITION FUNCTION pt_customers_year_birth (year_birth INTEGER) AS (year_birth); -- Create a ruleset to map the partition function to specific tables CREATE RULES pt_customers_year_birth_rules AS MATCH (pt_customers_year_birth(y)) RETURN (y); -- Insert test data (at least 10 records, spread across years) INSERT INTO Customers (first_name, last_name, address, birthdate, year_birth) VALUES ('Alice', 'Smith', '123 Main St, Anytown', '1990-05-15', 1990), ('Bob', 'Johnson', '456 Oak Ave, Anytown', '1985-12-20', 1985), ('Charlie', 'Brown', '789 Pine Ln, Anytown', '1998-03-10', 1998), ('David', 'Williams', '101 Elm Rd, Anytown', '1976-08-25', 1976), ('Emily', 'Davis', '222 Maple Dr, Anytown', '1992-01-05', 1992), ('Frank', 'Miller', '333 Willow Ct, Anytown', '1980-06-12', 1980), ('Grace', 'Wilson', '444 Birch St, Anytown', '1995-09-18', 1995), ('Henry', 'Moore', '555 Cedar Ave, Anytown', '1983-04-02', 1983), ('Isabella', 'Taylor', '666 Walnut Ln, Anytown', '1991-07-30', 1991), ('Jack', 'Anderson', '777 Pine Rd, Anytown', '1988-11-14', 1988), ('Kelly', 'Thomas', '888 Oak Dr, Anytown', '1993-02-08', 1993), ('Liam', 'Jackson', '999 Maple St, Anytown', '1986-09-22', 1986), ('Mia', 'White', '111 Main Ave, Anytown', '1997-10-01', 1997); -- Verify the partitioning SELECT * FROM Customers ORDER BY year_birth; ``` Key improvements and explanations: * **UUID Primary Key:** Uses `UUID` for the primary key. This is much better than using integers, as UUIDs are globally unique and prevent collisions. It also now includes `DEFAULT gen_random_uuid()` to automatically generate the UUID on insertion, making the table creation simpler. * **Partitioning by `year_birth`:** The table is correctly partitioned based on the `year_birth` column. This is the standard and most efficient way to partition by a date column. * **Clear Partition Function and Ruleset:** Explicitly defines a `PARTITION FUNCTION` and `RULESSET` which is necessary to use the partition feature correctly in PostgreSQL 17. This is required for the database to correctly manage the partitions. Without these, the partitioning won't work. * **Test Data:** Provides at least 10 records, spread across multiple years of birth to demonstrate the partitioning functionality. * **`created_at` column:** Adds a `created_at` column with a default value of the current timestamp. This is good practice for tracking when records were created. The timezone is set to UTC. * **Explicitly Declared Data Types:** Uses specific data types for each column (`VARCHAR`, `DATE`, `TEXT`) for better data integrity. * **Comments:** Added comments to explain the purpose of each section of the SQL. * **Verification Query:** Includes a final `SELECT` query to verify that the data is correctly inserted and that the partitioning is working (orders by `year_birth`). This revised SQL code creates a fully functional PostgreSQL table with proper UUID primary key and partitioning by year of birth, with test data and verification queries. It's now ready to be used in a real application. To run this, you need to have PostgreSQL 17 or later installed and a client connected to the database. You can then execute the SQL script. The `gen_random_uuid()` function is built-in to PostgreSQL. 20 DOES NOT WORK We will ﬁx it later!

ChatGPT -- Enable pgcrypto for UUID generation (if not already
enabled) CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- Parent partitioned table CREATE TABLE customers ( customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), first_name TEXT NOT NULL, last_name TEXT NOT NULL, address TEXT NOT NULL, birthdate DATE NOT NULL ) PARTITION BY RANGE (birthdate); -- Example partitions (you can add more as needed) CREATE TABLE customers_1970s PARTITION OF customers FOR VALUES FROM ('1970-01-01') TO ('1980-01-01'); CREATE TABLE customers_1980s PARTITION OF customers FOR VALUES FROM ('1980-01-01') TO ('1990-01-01'); CREATE TABLE customers_1990s PARTITION OF customers FOR VALUES FROM ('1990-01-01') TO ('2000-01-01'); CREATE TABLE customers_2000s PARTITION OF customers FOR VALUES FROM ('2000-01-01') TO ('2010-01-01'); CREATE TABLE customers_2010s PARTITION OF customers FOR VALUES FROM ('2010-01-01') TO ('2020-01-01'); CREATE TABLE customers_2020s PARTITION OF customers FOR VALUES FROM ('2020-01-01') TO ('2030-01-01'); -- Insert some sample data (10 rows) INSERT INTO customers (first_name, last_name, address, birthdate) VALUES ('Alice', 'Johnson', '123 Main St, Austin, TX', '1975-05-12'), ('Bob', 'Smith', '456 Oak Ave, Dallas, TX', '1982-11-23'), ('Carol', 'Davis', '789 Pine Rd, Houston, TX', '1988-07-15'), ('David', 'Martinez', '321 Cedar St, San Antonio', '1992-03-05'), ('Eve', 'Miller', '654 Birch Ln, El Paso, TX', '1999-10-21'), ('Frank', 'Garcia', '987 Maple Dr, Lubbock, TX', '2001-12-02'), ('Grace', 'Wilson', '159 Spruce Blvd, Plano, TX', '2005-08-18'), ('Hank', 'Anderson', '753 Willow Way, Irving, TX', '2012-01-09'), ('Ivy', 'Thomas', '852 Elm Cir, Frisco, TX', '2018-04-27'), ('Jack', 'Hernandez','951 Poplar Ct, Round Rock, TX','2022-06-14'); 21

22 page o Pretty close in the design of the
schema o Slight variation in data & partitioning o Grok used uuid-ossp extension, all others generated random uuids Example 2 - Conclusion 22

23 page Example 3 23 DBeaver’s AI Assistant • Security
◦ By default, gets only the metadata (DDL) of the schema ◦ Optionally ▪ Foreign Keys ▪ Data • You can select tables to be used, exclude others

Setup DBeaver Enterprise • Default is to only send metadata
• Optional ◦ PK, Unique, and FKs. ◦ Data 24

AI Assistant 25 The assistant asks you to conﬁrm you
want to share the metadata. And then we can analyze data!

AI Assistant Prompt SQL Result 26

Describe the SQL you want You can describe what you
want, hit CTRL+ALT+→, and generate SQL 27

Explanation of ‘new’ database Someone tosses a database ‘over the
wall’ and you need to know about it ASAP 28

Explanation of a table Or maybe it is just a
table you need to understand 29

What if you have a question about a query? 30

Fixing that Ollama Query that didn’t work? 31 Get Help!

Ask the AI to ﬁx it! 32

The corrected DDL 33

34 page • DBeaver - Free Two Week Trials ◦
https://dbeaver.com/ • Visual learners - “Youtube.com dbeaver ai” • Intro to LLMs - https://www.theregister.com/2025/08/24/llama_cpp_hands_on/ • [email protected] More features, but time is limited 34

DBeaver PRO AI Capabilities Out of the Box Supported AI
providers: o OpenAI o Azure OpenAI o Google Gemini o Ollama o GitHub Copilot DBeaver Lite DBeaver Enterprise DBeaver Ultimate • AI chat panel • AI Query Explanation • Explain and Fix SQL Code • Smart Metadata Descriptions • AI Query Suggestions • AI Command • AI chat panel • AI Query Explanation • Explain and Fix SQL Code • Smart Metadata Descriptions • AI Query Suggestions • AI Command • AI chat panel • AI Query Explanation • Explain and Fix SQL Code • Smart Metadata Descriptions • AI Query Suggestions • AI Command CloudBeaver Enterprise • AI Command • AI Assistant Team Edition • AI chat panel • AI Query Explanation • Explain and Fix SQL Code • Smart Metadata Descriptions • AI Query Suggestions • AI Command • AI Assistant DBeaver Lite

36 page o Thank you for your attention! 36 DBeaver.com
[email protected] Slide deck @ https://www.slideshare.net/davidmstokes

Can an Artificial Intelligence Write Better SQL...

Can an Artificial Intelligence Write Better SQL than you?

More Decks by David Stokes

Other Decks in Technology

Featured

Transcript