Slide 1

Slide 1 text

Inexpensive Datamasking for MySQL with ProxySQL René Cannaò

Slide 2

Slide 2 text

Who we are René Cannaò Founder of ProxySQL MySQL SRE at Dropbox thanks to: Frédéric Descamps MySQL Community Manager

Slide 3

Slide 3 text

Other Sessions 273. ProxySQL, MaxScale, MySQL Router and other database traffic managers / Petr Zaitsev (Percona) 155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)

Slide 4

Slide 4 text

Agenda ● Database overview ● What is ProxySQL ● Features overview ● Data masking ● Rules ● Masking rules ● Obfuscation with mysqldump ● Examples

Slide 5

Slide 5 text

Overview of ProxySQL

Slide 6

Slide 6 text

Application and Database layers APPLICATIONS DATABASES

Slide 7

Slide 7 text

Main motivations empower the DBAs Improves manageability understand and improve performance High performance and High Availability create a proxy layer to shield the database

Slide 8

Slide 8 text

Database as a Service (layered) APPLICATIONS DATABASES + MANAGER(s) DAAS – REVERSE PROXY

Slide 9

Slide 9 text

What is ProxySQL? The MySQL data stargate

Slide 10

Slide 10 text

How to deploy

Slide 11

Slide 11 text

How to deploy

Slide 12

Slide 12 text

ProxySQL Features (short list) High Availability and Scalability seamless failover firewall query throttling query timeout query mirroring runtime reconfiguration Scheduler Support for Galera/PXC and Group Replication on-the-fly rewrite of queries caching reads outside the database connection pooling and multiplexing complex query routing and r/w split load balancing real time statistics monitoring Data masking Multiple instances on same ports Native Clustering

Slide 13

Slide 13 text

Support for ClickHouse

Slide 14

Slide 14 text

Data Masking Data masking or data obfuscation is the process of hiding original data with random characters or data. The main reason for applying masking to a data field is to protect data that is classified as personal identifiable data, personal sensitive data or commercially sensitive data, however the data must remain usable for the purposes of undertaking valid test cycles

Slide 15

Slide 15 text

Why using ProxySQL as data masking solution? Open Source & Free like in beer Other solutions are expensive or not working Not worse than the other solutions as currently none is perfect The best solution would be to have this feature implemented in the server just after the handler API

Slide 16

Slide 16 text

Query Rules instructions to "program" ProxySQL behavior matching criteria actions flow control and chains

Slide 17

Slide 17 text

Query Rewrite Dynamically rewrite queries sent by the application/client without the client being aware on the fly using ProxySQL query rules rules defined using regular expressions, s/match/replace/

Slide 18

Slide 18 text

The concept We use Regular Expressions to modify the clients’ SQL statement and replace the column(s) we want to hide by some characters or generate fake data. We will split our solution in two different solutions: ● Provide access to the database to developers ● Generate dump to populate a database to share Only the defined users, in our example we use a developer, will have his statements modified.

Slide 19

Slide 19 text

The concept (2) We will also create two categories : •data masking •data obfuscating

Slide 20

Slide 20 text

Data Masking Here we will just mask with a generic character the full value of the column or part of it:

Slide 21

Slide 21 text

Data Obfuscation Here we will just replace the value of the column with random characters of the same type, we create fake data

Slide 22

Slide 22 text

Access INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('devel','devel',1,1); INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('backup','dumpme',1,1); Create a user for masking: Create a user for backups:

Slide 23

Slide 23 text

Rules Avoid SELECT * for the developer, we need to create some rules to block any SELECT * variant on the table if the column is part of many tables, we need to do so for each of them

Slide 24

Slide 24 text

Rules (2) Mask or obfuscate the field when the field is selected in the columns we need: ● to replace the column by showing the first 2 characters and a certain amount of X s or generate a random string ● keep the column name ● for mysqldump we need to allow SELECT * but mask and/or obfuscate sensible values

Slide 25

Slide 25 text

Rules overview rule_id: 1 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: `*first_name*` re_modifiers: caseless,global flagOUT: NULL replace_pattern: first_name apply: 0 Rule #1

Slide 26

Slide 26 text

rule_id: 2 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n]) re_modifiers: caseless,global flagOUT: NULL replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4 apply: 0 Rule #2

Slide 27

Slide 27 text

rule_id: 158 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n]) negate_match_pattern: 0 re_modifiers: CASELESS,GLOBAL flagOUT: NULL replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4 Rule #2 - obfuscating Let's imagine we want to provide fake number for `salaries`.`salary` column. We could instead of the previous rule use this one

Slide 28

Slide 28 text

rule_id: 3 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: \)(\)?) first_name\s+(\w), re_modifiers: caseless,global flagOUT: NULL replace_pattern: )\1 \2, apply: 1 Rule #3

Slide 29

Slide 29 text

rule_id: 4 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: \)(\)?) first_name\s+(.*)\s+from re_modifiers: caseless,global flagOUT: NULL replace_pattern: )\1 \2 from apply: 1 Rule #4

Slide 30

Slide 30 text

rule_id: 5 active: 1 username: devel schemaname: employees match_pattern: ^SELECT\s+\*.*FROM.*employees re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact dba@acme.com apply: 0 Rule #5

Slide 31

Slide 31 text

rule_id: 6 active: 1 username: devel schemaname: employees match_pattern: ^SELECT\s+employees\.\*.*FROM.*employees re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact dba@acme.com apply: 0 Rule #6

Slide 32

Slide 32 text

rule_id: 7 active: 1 username: devel schemaname: employees match_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1) re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact dba@acme.com apply: 0 Rule #6

Slide 33

Slide 33 text

Rules for mysqldump To provide a dump that might be used by developers, Q/A or support, we need to: ● generate valid data ● obfuscate sensitive information ● rewrite SQL statements issued by mysqldump ● only for tables and columns with sensitive data

Slide 34

Slide 34 text

mysqldump rules rule_id: 8 active: 1 user: backup schema: employees flagIN: 0 match: ^/\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries` replace: SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries flagOUT: NULL apply: 1 Rule #8

Slide 35

Slide 35 text

mysqldump rules rule_id: 9 active: 1 user: backup schema: employees flagIN: 0 match: \* FROM `employees` replace: emp_no, CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date, CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, gender, hire_date FROM employees flagOUT: NULL apply: 1 Rule #9

Slide 36

Slide 36 text

Limitions ● better support in proxySQL >= 1.4.x ○ RE2 an PCRE regexes ● all fields with the same name will be masked whatever the name of the table is in the same schema ● the regexps can always be not sufficient ● block any query not matching whitelisted SQL statements ● the dump via ProxySQL solution seems to be the best

Slide 37

Slide 37 text

Make it easy This is not really easy isn´t it ? You can use this small bash script (https://github.com/lefred/maskit) to generate them: # ./maskit.sh -c first_name -t employees -d employees column: first_name table: employees schema: employees let's add the rules...

Slide 38

Slide 38 text

Examples Easy ones: SELECT * FROM employees; SELECT emp_no, last_name, first_name FROM employees;

Slide 39

Slide 39 text

Examples (2) More difficult: select emp_no, concat(first_name), last_name from employees; select emp_no, first_name, first_name from employees.employees select emp_no, `first_name` from employees; select emp_no, first_name -> from employees; (*)

Slide 40

Slide 40 text

Examples (3) More difficult: select t1.first_name from employees.employees as t1; select emp_no, first_name as fred from employees; select emp_no, first_name rene from employees; select emp_no, first_name `as` from employees; select first_name as `as`, last_name from employees; select `t1`.`first_name` from employees.employees as t1;

Slide 41

Slide 41 text

Examples (4) More difficult: select first_name fred, last_name from employees; select emp_no, first_name /* first_name */ from employees.employees; /* */ select last_name, first_name from employees; select CUSTOMERS.* from myapp.CUSTOMERS; select a.* from employees.employees a;`

Slide 42

Slide 42 text

We need you!

Slide 43

Slide 43 text

Thank you! Questions? E: rene@proxysql.com