Slide 1

Slide 1 text

Finding Bugs in Database Systems via Query Partitioning Manuel Rigger Zhendong Su ETH Zurich, Switzerland OOPSLA 2020 @RiggerManuel @ast_eth https://people.inf.ethz.ch/suz/

Slide 2

Slide 2 text

2 Database Management Systems (DBMSs) PostgreSQL

Slide 3

Slide 3 text

3 PostgreSQL Database Management Systems (DBMSs) MySQL is the most popular open-source DBMS

Slide 4

Slide 4 text

4 Database Management Systems (DBMSs) PostgreSQL TiDB and CockroachDB are highly-popular, commercially developed NewSQL DBMSs

Slide 5

Slide 5 text

5 PostgreSQL Database Management Systems (DBMSs) DuckDB is an emerging embedded DBMS for analytical query workloads

Slide 6

Slide 6 text

6 Database Management Systems (DBMSs) PostgreSQL We found 193 bugs in these systems, 142 of which have been fixed

Slide 7

Slide 7 text

7 Goal: Find Logic Bugs Logic bugs: DBMS returns an incorrect result set

Slide 8

Slide 8 text

8 Motivating Example c0 0 c0 -0 t0 t1 SELECT * FROM t0, t1 WHERE t0.c0 = t1.c0; ?

Slide 9

Slide 9 text

9 Motivating Example c0 0 c0 -0 t0 t1 SELECT * FROM t0, t1 WHERE t0.c0 = t1.c0; t0.c0 t0.c1 0 -0 The DBMS is expected to fetch the row since 0 = -0 should evaluate to TRUE ✓

Slide 10

Slide 10 text

10 Motivating Example c0 0 c0 -0 t0 t1 SELECT * FROM t0, t1 WHERE t0.c0 = t1.c0; t0.c0 t0.c1 The latest version of MySQL that we tested failed to fetch the row  https://bugs.mysql.com/bug.php?id=99122

Slide 11

Slide 11 text

11 Motivating Example c0 0 c0 -0 t0 t1 SELECT * FROM t0, t1 WHERE t0.c0 = t1.c0; t0.c0 t0.c1 Challenge: Finding a test oracle that can automatically validate the result set Incorrect result! https://bugs.mysql.com/bug.php?id=99122

Slide 12

Slide 12 text

12 Background: Differential Testing Query Generator

Slide 13

Slide 13 text

13 Background: Differential Testing Query Generator RS1 RS2 RS3

Slide 14

Slide 14 text

14 Background: Differential Testing Query Generator RS1 = RS2 = RS3 ? RS1 RS2 RS3

Slide 15

Slide 15 text

15 Background: Differential Testing DBMS- specific SQL Common SQL Core

Slide 16

Slide 16 text

16 Background: Differential Testing DBMS- specific SQL Common SQL Core “We are unable to use Postgres as an oracle because CockroachDB has slightly different semantics and SQL support, and generating queries that execute identically on both is tricky […].” – Cockroach Labs

Slide 17

Slide 17 text

17 Core Idea Novel test oracle: Partition a query into multiple partitioning queries, from which the original query’s result set can be derived

Slide 18

Slide 18 text

18 Query Partitioning Query Generator Q RS(Q)

Slide 19

Slide 19 text

19 Query Partitioning Query Generator Q RS(Q) Q denotes the (randomly-generated) original query

Slide 20

Slide 20 text

20 Query Partitioning Query Generator Q RS(Q) RS(Q) denotes the original query’s result set

Slide 21

Slide 21 text

21 Query Partitioning Query Generator Q RS(Q) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n Partition the result set

Slide 22

Slide 22 text

22 Query Partitioning Query Generator Q RS(Q) RS(Q’) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n ♢ Q’

Slide 23

Slide 23 text

23 Query Partitioning Query Generator Q RS(Q) RS(Q’) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n ♢ Q’ Combine the results so that RS(Q)=RS(Q’)

Slide 24

Slide 24 text

24 Query Partitioning Query Generator Q RS(Q) RS(Q’) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n ♢ Q’ ✓ =

Slide 25

Slide 25 text

25 Query Partitioning Query Generator Q RS(Q) RS(Q’) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n ♢ Q’ ≠

Slide 26

Slide 26 text

26 Query Partitioning Query Generator Q RS(Q) RS(Q’) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n ♢ Q’ Partitioning queries

Slide 27

Slide 27 text

27 Query Partitioning Query Generator Q RS(Q) RS(Q’) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n ♢ Q’ Partitions

Slide 28

Slide 28 text

28 Query Partitioning Query Generator Q RS(Q) RS(Q’) RS(Q’1 ) RS(Q’2 ) RS(Q’3 ) RS(Q’n ) Q’1 Q’2 Q’3 Q’n ♢ Q’ Composition operator

Slide 29

Slide 29 text

29 How to Realize This Idea? Key challenge: find a valid partitioning strategy that stresses the DBMS

Slide 30

Slide 30 text

30 Ternary Logic Consider a predicate φ and a given row r. Exactly one of the following must hold: • φ • NOT φ • φ IS NULL

Slide 31

Slide 31 text

31 Ternary Logic Consider a predicate φ and a given row r. Exactly one of the following must hold: • φ • NOT φ • φ IS NULL

Slide 32

Slide 32 text

32 Ternary Logic Consider a predicate φ and a given row r. Exactly one of the following must hold: • φ • NOT φ • φ IS NULL φ NOT φ φ IS NULL

Slide 33

Slide 33 text

33 Ternary Logic Consider a predicate φ and a given row r. Exactly one of the following must hold: • φ • NOT φ • φ IS NULL ternary predicate variants

Slide 34

Slide 34 text

34 Motivating Example c0 0 c0 -0 t0 t1 SELECT * FROM t0, t1 WHERE t0.c0 = t1.c0; t0.c0 t0.c1  https://bugs.mysql.com/bug.php?id=99122 How did this insight allow us to detect this bug?

Slide 35

Slide 35 text

35 Example: MySQL SELECT * FROM t0, t1;

Slide 36

Slide 36 text

36 Example: MySQL t0.c0 t0.c1 0 -0 SELECT * FROM t0, t1;

Slide 37

Slide 37 text

37 Example: MySQL t0.c0 t0.c1 0 -0 SELECT * FROM t0, t1 WHERE t0.c0=t1.c0 UNION ALL SELECT * FROM t0, t1 WHERE NOT (t0.c0=t1.c0) UNION ALL SELECT * FROM t0, t1 WHERE (t0.c0=t1.c0) IS NULL; SELECT * FROM t0, t1;

Slide 38

Slide 38 text

38 Example: MySQL t0.c0 t0.c1 0 -0 SELECT * FROM t0, t1 WHERE t0.c0=t1.c0 UNION ALL SELECT * FROM t0, t1 WHERE NOT (t0.c0=t1.c0) UNION ALL SELECT * FROM t0, t1 WHERE (t0.c0=t1.c0) IS NULL; SELECT * FROM t0, t1; φ

Slide 39

Slide 39 text

39 Example: MySQL t0.c0 t0.c1 0 -0 SELECT * FROM t0, t1 WHERE t0.c0=t1.c0 UNION ALL SELECT * FROM t0, t1 WHERE NOT (t0.c0=t1.c0) UNION ALL SELECT * FROM t0, t1 WHERE (t0.c0=t1.c0) IS NULL; SELECT * FROM t0, t1; t0.c0 t0.c1 φ

Slide 40

Slide 40 text

40 Example: MySQL t0.c0 t0.c1 0 -0 SELECT * FROM t0, t1 WHERE t0.c0=t1.c0 UNION ALL SELECT * FROM t0, t1 WHERE NOT (t0.c0=t1.c0) UNION ALL SELECT * FROM t0, t1 WHERE (t0.c0=t1.c0) IS NULL; SELECT * FROM t0, t1; t0.c0 t0.c1 ≠ φ

Slide 41

Slide 41 text

41 Ternary Logic Partitioning (TLP) Query Generator Q RS(Q) RS(Q’) Q’p Q’p IS NULL ♢ Q’ Q ¬p RS(Q’p ) RS(Q ¬p ) RS(Q’p IS NULL ) SELECT * FROM t0, t1;

Slide 42

Slide 42 text

42 Ternary Logic Partitioning (TLP) Query Generator Q RS(Q) RS(Q’) Q’p Q’p IS NULL ♢ Q’ Q ¬p RS(Q’p ) RS(Q ¬p ) RS(Q’p IS NULL ) SELECT * FROM t0, t1 WHERE t0.c0=t1.c0

Slide 43

Slide 43 text

43 Ternary Logic Partitioning (TLP) Query Generator Q RS(Q) RS(Q’) Q’p Q’p IS NULL ♢ Q’ Q ¬p RS(Q’p ) RS(Q ¬p ) RS(Q’p IS NULL ) SELECT * FROM t0, t1 WHERE NOT (t0.c0=t1.c0)

Slide 44

Slide 44 text

44 Ternary Logic Partitioning (TLP) Query Generator Q RS(Q) RS(Q’) Q’p Q’p IS NULL ♢ Q’ Q ¬p RS(Q’p ) RS(Q ¬p ) RS(Q’p IS NULL ) SELECT * FROM t0, t1 WHERE (t0.c0=t1.c0) IS NULL;

Slide 45

Slide 45 text

45 Ternary Logic Partitioning (TLP) Query Generator Q RS(Q) RS(Q’) Q’p Q’p IS NULL ♢ Q’ Q ¬p RS(Q’p ) RS(Q ¬p ) RS(Q’p IS NULL ) UNION ALL

Slide 46

Slide 46 text

46 Scope • WHERE • GROUP BY • HAVING • DISTINCT queries • Aggregate functions

Slide 47

Slide 47 text

47 Testing WHERE Clauses Q Q’ptern ♢(Q’p , Q’¬p , Q’p IS NULL ) SELECT FROM [] SELECT FROM [] WHERE ptern Q′p ⊎ Q′¬p ⊎ Q′p IS NULL

Slide 48

Slide 48 text

48 Testing WHERE Clauses Q Q’ptern ♢(Q’p , Q’¬p , Q’p IS NULL ) SELECT FROM [] SELECT FROM [] WHERE ptern Q′p ⊎ Q′¬p ⊎ Q′p IS NULL The multiset addition can be implemented using UNION ALL

Slide 49

Slide 49 text

49 Scope • WHERE • GROUP BY • HAVING • DISTINCT queries • Aggregate functions

Slide 50

Slide 50 text

50 Testing Self-decomposable Aggregate Functions Q Q’ptern ♢(Q’p , Q’¬p , Q’p IS NULL ) SELECT MAX() FROM [] SELECT MAX() FROM [] WHERE ptern; MAX(Q′p ⊎ Q′¬p ⊎ Q′p IS NULL )

Slide 51

Slide 51 text

51 Testing Self-decomposable Aggregate Functions Q Q’ptern ♢(Q’p , Q’¬p , Q’p IS NULL ) SELECT MAX() FROM [] SELECT MAX() FROM [] WHERE ptern; MAX(Q′p ⊎ Q′¬p ⊎ Q′p IS NULL ) A partition is an intermediate result, rather than a subset of the result set

Slide 52

Slide 52 text

52 Bug Example: CockroachDB SET vectorize=experimental_on; CREATE TABLE t0(c0 INT); CREATE TABLE t1(c0 BOOL) INTERLEAVE IN PARENT t0(rowid); INSERT INTO t0(c0) VALUES (0); INSERT INTO t1(rowid, c0) VALUES(0, TRUE);

Slide 53

Slide 53 text

53 Bug Example: CockroachDB SET vectorize=experimental_on; CREATE TABLE t0(c0 INT); CREATE TABLE t1(c0 BOOL) INTERLEAVE IN PARENT t0(rowid); INSERT INTO t0(c0) VALUES (0); INSERT INTO t1(rowid, c0) VALUES(0, TRUE); NULL SELECT MAX(t1.rowid) FROM t1;

Slide 54

Slide 54 text

54 Bug Example: CockroachDB SET vectorize=experimental_on; CREATE TABLE t0(c0 INT); CREATE TABLE t1(c0 BOOL) INTERLEAVE IN PARENT t0(rowid); INSERT INTO t0(c0) VALUES (0); INSERT INTO t1(rowid, c0) VALUES(0, TRUE); NULL 0 SELECT MAX(aggr) FROM ( SELECT MAX(t1.rowid) as aggr FROM t1 WHERE '+' >= t1.c0 UNION ALL SELECT MAX(t1.rowid) as aggr FROM t1 WHERE NOT('+' >= t1.c0) UNION ALL SELECT MAX(t1.rowid) as aggr FROM t1 WHERE ('+' >= t1.c0) IS NULL ); SELECT MAX(t1.rowid) FROM t1; ≠

Slide 55

Slide 55 text

55 Testing Decomposable Aggregate Functions Q Q’ptern ♢(Q’p , Q’¬p , Q’p IS NULL ) SELECT AVG() FROM []; SELECT SUM() as s, COUNT() as s FROM []; SUM(s(Q′p ⊎ Q′¬p ⊎ Q′p IS NULL )) SUM(c(Q′p ⊎ Q′¬p ⊎ Q′p IS NULL ))

Slide 56

Slide 56 text

56 Implementation https://github.com/sqlancer

Slide 57

Slide 57 text

57 SQLancer Generate a Database Generate a Query Validate the Query’s Result

Slide 58

Slide 58 text

58 SQLancer Generate a Database Generate a Query Validate the Query’s Result Heuristic database and query generators for DuckDB, H2 and TiDB

Slide 59

Slide 59 text

59 SQLancer Generate a Database Generate a Query Validate the Query’s Result TLP test oracles

Slide 60

Slide 60 text

60 Evaluation: Found Bugs Closed DBMS Fixed Verified Intended Duplicate SQLite 4 0 0 0 MySQL 1 6 3 0 H2 16 2 0 1 CockroachDB 23 8 0 0 TiDB 26 35 0 1 DuckDB 72 0 0 2 We found 193 unique, previously unknown bugs, 142 of which have been fixed!

Slide 61

Slide 61 text

61 Evaluation: Found Bugs Query Partitioning Oracle DBMS WHERE Aggregate GROUP BY HAVING DISTINCT Error Crash SQLite 0 3 0 0 1 0 0 CockroachDB 3 3 0 1 0 22 2 TiDB 29 0 1 0 0 27 4 MySQL 7 0 0 0 0 0 0 DuckDB 21 4 1 2 1 13 19 H2 2 0 0 0 0 16 0

Slide 62

Slide 62 text

62 Query Partitioning Oracle DBMS WHERE Aggregate GROUP BY HAVING DISTINCT Error Crash SQLite 0 3 0 0 1 0 0 CockroachDB 3 3 0 1 0 22 2 TiDB 29 0 1 0 0 27 4 MySQL 7 0 0 0 0 0 0 DuckDB 21 4 1 2 1 13 19 H2 2 0 0 0 0 16 0 Evaluation: Found Bugs The WHERE oracle is the simplest, but most effective oracle

Slide 63

Slide 63 text

63 Query Partitioning Oracle DBMS WHERE Aggregate GROUP BY HAVING DISTINCT Error Crash SQLite 0 3 0 0 1 0 0 CockroachDB 3 3 0 1 0 22 2 TiDB 29 0 1 0 0 27 4 MySQL 7 0 0 0 0 0 0 DuckDB 21 4 1 2 1 13 19 H2 2 0 0 0 0 16 0 Evaluation: Found Bugs The other oracles found interesting, but fewer bugs

Slide 64

Slide 64 text

64 Query Partitioning Oracle DBMS WHERE Aggregate GROUP BY HAVING DISTINCT Error Crash SQLite 0 3 0 0 1 0 0 CockroachDB 3 3 0 1 0 22 2 TiDB 29 0 1 0 0 27 4 MySQL 7 0 0 0 0 0 0 DuckDB 21 4 1 2 1 13 19 H2 2 0 0 0 0 16 0 Evaluation: Found Bugs We tested them only for DBMSs for which the TLP WHERE oracle could not find additional bugs

Slide 65

Slide 65 text

65 Developer Reception: DuckDB This work is tremendously helpful for us, and I imagine anyone working on a DBMS. Usually these bugs would be slowly found by users over the years, not only negatively affecting the experience of those users but also requiring much more effort to debug and reproduce […].

Slide 66

Slide 66 text

66 Adoption

Slide 67

Slide 67 text

67 Adoption DuckDB runs SQLancer + TLP on every pull request

Slide 68

Slide 68 text

68 Adoption Ilya Yatsishin (Yandex) contributed a TLP implementation to SQLancer

Slide 69

Slide 69 text

69 Adoption https://www.monetdb.org/blog/faster-robuster-and-feature-richer-monetdb-in-2020-and-beyond “With the help of SQLancer, an automatic DBMS testing tool, we have been able to identify >100 potential problems in corner cases of the SQL processor.”

Slide 70

Slide 70 text

70 Adoption PingCAP implemented TLP as a tool go-sqlancer https://github.com/chaos-mesh/go-sqlancer

Slide 71

Slide 71 text

71 Adoption Mining for logic bugs in the Citus extension to Postgres with SQLancer by Nazli Ugur Koyluoglu https://www.citusdata.com/blog/2020/09/04/mining-for-logic-bugs-in-citus-with-sqlancer/

Slide 72

Slide 72 text

72 @RiggerManuel [email protected] Summary Goal: detect logic bugs Query Partitioning as a general concept TLP as a concrete partition strategy Evaluation: Close to 200 bugs in DBMSs