Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Inexpensive Datamasking for MySQL with ProxySQL...

ProxySQL LLC
November 07, 2017

Inexpensive Datamasking for MySQL with ProxySQL - Data Anonymization for Developers

This presentation was given by Rene Cannao, ProxySQL author and CEO at Moscow HighLoad++ 2017.

Talk abstract is as follows:

"During this session we will cover the last development in ProxySQL to support regular expressions (RE2 and PCRE) and how we can use this strong technique in correlation with ProxySQL's query rules to anonymize live data quickly and transparently. We will explain the mechanism and how to generate these rules quickly. We show live demo with all challenges we got from the Community and we finish the session by an interactive brainstorm testing queries from the audience."

ProxySQL LLC

November 07, 2017
Tweet

More Decks by ProxySQL LLC

Other Decks in Technology

Transcript

  1. Who we are René Cannaò Founder of ProxySQL MySQL SRE

    at Dropbox thanks to: Frédéric Descamps MySQL Community Manager
  2. Other Sessions 273. ProxySQL, MaxScale, MySQL Router and other database

    traffic managers / Petr Zaitsev (Percona) 155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)
  3. Agenda • Database overview • What is ProxySQL • Features

    overview • Data masking • Rules • Masking rules • Obfuscation with mysqldump • Examples
  4. Main motivations empower the DBAs Improves manageability understand and improve

    performance High performance and High Availability create a proxy layer to shield the database
  5. ProxySQL Features (short list) High Availability and Scalability seamless failover

    firewall query throttling query timeout query mirroring runtime reconfiguration Scheduler Support for Galera/PXC and Group Replication on-the-fly rewrite of queries caching reads outside the database connection pooling and multiplexing complex query routing and r/w split load balancing real time statistics monitoring Data masking Multiple instances on same ports Native Clustering
  6. Data Masking Data masking or data obfuscation is the process

    of hiding original data with random characters or data. The main reason for applying masking to a data field is to protect data that is classified as personal identifiable data, personal sensitive data or commercially sensitive data, however the data must remain usable for the purposes of undertaking valid test cycles
  7. Why using ProxySQL as data masking solution? Open Source &

    Free like in beer Other solutions are expensive or not working Not worse than the other solutions as currently none is perfect The best solution would be to have this feature implemented in the server just after the handler API
  8. Query Rewrite Dynamically rewrite queries sent by the application/client without

    the client being aware on the fly using ProxySQL query rules rules defined using regular expressions, s/match/replace/
  9. The concept We use Regular Expressions to modify the clients’

    SQL statement and replace the column(s) we want to hide by some characters or generate fake data. We will split our solution in two different solutions: • Provide access to the database to developers • Generate dump to populate a database to share Only the defined users, in our example we use a developer, will have his statements modified.
  10. The concept (2) We will also create two categories :

    •data masking •data obfuscating
  11. Data Masking Here we will just mask with a generic

    character the full value of the column or part of it:
  12. Data Obfuscation Here we will just replace the value of

    the column with random characters of the same type, we create fake data
  13. Access INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('devel','devel',1,1);

    INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('backup','dumpme',1,1); Create a user for masking: Create a user for backups:
  14. Rules Avoid SELECT * for the developer, we need to

    create some rules to block any SELECT * variant on the table if the column is part of many tables, we need to do so for each of them
  15. Rules (2) Mask or obfuscate the field when the field

    is selected in the columns we need: • to replace the column by showing the first 2 characters and a certain amount of X s or generate a random string • keep the column name • for mysqldump we need to allow SELECT * but mask and/or obfuscate sensible values
  16. Rules overview rule_id: 1 active: 1 username: devel schemaname: employees

    flagIN: 0 match_pattern: `*first_name*` re_modifiers: caseless,global flagOUT: NULL replace_pattern: first_name apply: 0 Rule #1
  17. rule_id: 2 active: 1 username: devel schemaname: employees flagIN: 0

    match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n]) re_modifiers: caseless,global flagOUT: NULL replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4 apply: 0 Rule #2
  18. rule_id: 158 active: 1 username: devel schemaname: employees flagIN: 0

    match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n]) negate_match_pattern: 0 re_modifiers: CASELESS,GLOBAL flagOUT: NULL replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4 Rule #2 - obfuscating Let's imagine we want to provide fake number for `salaries`.`salary` column. We could instead of the previous rule use this one
  19. rule_id: 3 active: 1 username: devel schemaname: employees flagIN: 0

    match_pattern: \)(\)?) first_name\s+(\w), re_modifiers: caseless,global flagOUT: NULL replace_pattern: )\1 \2, apply: 1 Rule #3
  20. rule_id: 4 active: 1 username: devel schemaname: employees flagIN: 0

    match_pattern: \)(\)?) first_name\s+(.*)\s+from re_modifiers: caseless,global flagOUT: NULL replace_pattern: )\1 \2 from apply: 1 Rule #4
  21. rule_id: 5 active: 1 username: devel schemaname: employees match_pattern: ^SELECT\s+\*.*FROM.*employees

    re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact [email protected] apply: 0 Rule #5
  22. rule_id: 6 active: 1 username: devel schemaname: employees match_pattern: ^SELECT\s+employees\.\*.*FROM.*employees

    re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact [email protected] apply: 0 Rule #6
  23. rule_id: 7 active: 1 username: devel schemaname: employees match_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)

    re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact [email protected] apply: 0 Rule #6
  24. Rules for mysqldump To provide a dump that might be

    used by developers, Q/A or support, we need to: • generate valid data • obfuscate sensitive information • rewrite SQL statements issued by mysqldump • only for tables and columns with sensitive data
  25. mysqldump rules rule_id: 8 active: 1 user: backup schema: employees

    flagIN: 0 match: ^/\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries` replace: SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries flagOUT: NULL apply: 1 Rule #8
  26. mysqldump rules rule_id: 9 active: 1 user: backup schema: employees

    flagIN: 0 match: \* FROM `employees` replace: emp_no, CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date, CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, gender, hire_date FROM employees flagOUT: NULL apply: 1 Rule #9
  27. Limitions • better support in proxySQL >= 1.4.x ◦ RE2

    an PCRE regexes • all fields with the same name will be masked whatever the name of the table is in the same schema • the regexps can always be not sufficient • block any query not matching whitelisted SQL statements • the dump via ProxySQL solution seems to be the best
  28. Make it easy This is not really easy isn´t it

    ? You can use this small bash script (https://github.com/lefred/maskit) to generate them: # ./maskit.sh -c first_name -t employees -d employees column: first_name table: employees schema: employees let's add the rules...
  29. Examples (2) More difficult: select emp_no, concat(first_name), last_name from employees;

    select emp_no, first_name, first_name from employees.employees select emp_no, `first_name` from employees; select emp_no, first_name -> from employees; (*)
  30. Examples (3) More difficult: select t1.first_name from employees.employees as t1;

    select emp_no, first_name as fred from employees; select emp_no, first_name rene from employees; select emp_no, first_name `as` from employees; select first_name as `as`, last_name from employees; select `t1`.`first_name` from employees.employees as t1;
  31. Examples (4) More difficult: select first_name fred, last_name from employees;

    select emp_no, first_name /* first_name */ from employees.employees; /* */ select last_name, first_name from employees; select CUSTOMERS.* from myapp.CUSTOMERS; select a.* from employees.employees a;`