Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PHPTek 2024 - Seven Database Query Sins You Are Going To Hell For And How To Earn Redemption

PHPTek 2024 - Seven Database Query Sins You Are Going To Hell For And How To Earn Redemption

Writing good Structured Query Language queries takes effort.

But you don't have the time, so you use 'SELECT * FROM foo', and as your data grows, so does your query time.
Or, you have a series of queries that could be run in a batch but run one at a time in a transaction and can not determine why your server is slow.
These common problems and others are easy to fix if you know what is wrong and how to fix it. In this session you will see the big seven query sins and understand how to fix them.

PHPTek 2024 - PHPUGLY Foundation Studio Wed. April 24th 10:00 am - 10:50 am

David Stokes

April 23, 2024
Tweet

More Decks by David Stokes

Other Decks in Technology

Transcript

  1. Seven Database Sins You Are Going To Hell For And

    H To Earn Redemption [email protected] @Stoker Second Edition
  2. ©2023 Percona | Confidential Seven Database Query Sins You Are

    Going To Hell For And How To Earn Redemption Writing good Structured Query Language queries takes effort. But you don't have the time so you use 'SELECT * FROM foo' and as your data grows so does your query time. Or, you have a series of queries that could be run in a batch but run one at a time in a transaction and can not determine why your server is slow. These common problems and others are easy to fix if you know what is wrong and how to fix it. In this session you will see the big seven query sins and understand how to fix them. PHPTek 2024 - PHPUGLY Foundation Studio Wed. April 24th 10:00 am - 10:50 am 3
  3. ©2023 Percona | Confidential Who Am I I am Dave

    Stokes Technology Evangelist at Percona Author of MySQL & JSON - A Practical Programming Guide Over a decade on the Oracle MySQL Community Team Started with MySQL 3.29 [email protected] @Stoker https://speakerdeck.com/stoker
  4. ©2023 Percona | Confidential Writing good Structured Query Language queries

    takes effort. Yeah, many sinners in the PHP Community take the path of sin. SQL is another language to learn If you can ‘also’ use two PHP frameworks, four JavaScript frameworks, use a IDE plus VS Code, maintain a CI/CD cycle, and make pull requests - you are also smart enough to write good SQL and build good relational tables. SQL Syntax is a declarative language Just like XML, CSS, & HTML Artificial Intelligence is not going to replace your need to know SQL Just like the COmmon Business Oriented Language - COBOL - was supposed to replace programmers or spreadsheets replaced accountants, AI is not a panacea Object Relational Mappers make decisions on your behalf, usually bad ones 6
  5. ©2023 Percona | Confidential Edgar Codd Edgar Frank "Ted" Codd

    (19 August 1923 – 18 April 2003) was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases and relational database management systems. He made other valuable contributions to computer science, but the relational model, a very influential general theory of data management, remains his most mentioned, analyzed and celebrated achievement.[ 8
  6. ©2023 Percona | Confidential Introduced in the 1970s, SQL offered

    two main advantages over older read–write APIs such as ISAM or VSAM. Firstly, it introduced the concept of accessing many records with one single command. Secondly, it eliminates the need to specify how to reach a record, i.e., with or without an index. 9
  7. ©2023 Percona | Confidential SQL drastically reduced the duplication of

    data on very expensive (and slow disk drives). 10
  8. ©2023 Percona | Confidential Pride - Bad design Greed -

    Long transactions Lust - N+1 Envy - ORMs Gluttony - Query plan evolution Wrath - Not using soft deletes Sloth - Backups 12
  9. ©2023 Percona | Confidential It takes effort and hard thought

    to design a useful relational database. Design for the final implementation not the ‘agile, I get to check off a check-box this week’ incremental goal. Database are expensive to redesign and sometimes too expensive to correct. You may end up support a kludge forever • US railroad tracks are 4’8.5” apart • 360 base time system 14
  10. ©2023 Percona | Confidential Some examples 1. US based company

    acquires UK based company, merge all databases a. Zip codes and State names missing in England, Wales, Scotland, and Northern Ireland b. Different holidays c. Much different data regulation 2. New CIO decides to implement replacement HR system with latest technology a. Where do we find this ‘Cuba Netties’ guy, anyway. b. New hot database does not scale, has no easy way to import data, and drops records c. Can not afford to retrain current staff as they have to keep old system working. New staff seen as getting preferential treatment. Current staff start to depart. 3. Marketing requests a datalake, datawarehouse, or whatever you call it a. Marketing was fast searching of everything - native language processing, no SQL b. Heavy cross correlation, heavier computer costs c. Access from iPad, iWatch, iPhone, or Android devices 15
  11. ©2023 Percona | Confidential Suggestions Scope properly Scale properly Build

    in redundancies Add auditing upfront Make sure interactions with other systems vetted properly 16
  12. ©2023 Percona | Confidential Transactions must be kept short Ideal

    purchase: 1. Get information from customer a 2. Get information from customer b 3. Start transaction LOCKING RECORDS a. Debit customer a b. Credit customer b c. commit; 4. Done 18 Actual: 1. Get information from customer a 2. Start transaction LOCKING RECORDS a. Ask customer a if they would not mind taking a survey at the end of the transaction to rate service. b. Check to see if customer a has signed up for newsletter and SMS notifications. c. Check inventory for other options to purchase based on recommendations of others. d. Get past credit card information e. Calculate shipping rates …
  13. ©2023 Percona | Confidential Transactions are heavy load Locking records

    should block others from accessing them, causing them to wait As data churns around the transaction, the transaction has to stay isolated. Nested transactions compound issue. Transactions do time out, those waiting may timeout or give up 19
  14. ©2023 Percona | Confidential Lust is the desire for something

    to which we are not entitled, but self-important people feel entitled I want a list of all available cars with a V8 engine, come in red, and cost under $100k 1. Get all cars under $100k 2. Search that list for those with V8s 3. Search that list for those available in red or SELECT * FROM cars where price < 100000 and engine_v8 = true and color = ‘red’; 21
  15. ©2023 Percona | Confidential Solutions The database server is designed

    to get the wheat from the chaff. A complex query that asks for everything you need at once is better than many little piggy queries at the troth. When in possible, do batches UPDATE foo SET z=99 WHERE x < 5 and customer_current = 1; 1. Get first current customer 2. Do update for that customer 3. Repeat until out of customers who are current 23
  16. ©2023 Percona | Confidential Problems with ORMs Impedance mismatch -

    objects versus rows N+1, no way to see ‘big picture’ of the data They make too many trade offs to make ‘database go away’ Added layer of complexity, added layer of cycle consumption 25
  17. ©2023 Percona | Confidential Example First wave - 100GB database,

    1K active users at any one time Second Wave - 1,000GB database, 5K active users Third Wave – 100,000GB database, 20K active users 27
  18. ©2023 Percona | Confidential QUERY PLAN - Best option chosen

    by optimizer To pick the best of many options, the optimizer needs current information on the data. • Indexes • histograms • Index Dives/ANALYSE • Partitions • Shards 28
  19. ©2023 Percona | Confidential Compounding Issues Over time, new columns/tables

    added, usually suboptimally Usage patterns change Growth of data stored Growth of working set If it ain’t broke mentality 29
  20. ©2023 Percona | Confidential Record pertinent info Set benchmark query

    Rerun monthly Compare 30 Covering indexes New Indexes Histograms Redesign Partition/shard EXPLAIN Investigate
  21. ©2023 Percona | Confidential Keep track of your data Run

    this query and keep copy of output!!
  22. ©2023 Percona | Confidential SELECT TABLE_SCHEMA, sys.format_bytes(sum(table_rows)) `ROWS`, sys.format_bytes(sum(data_length)) DATA,

    sys.format_bytes(sum(index_length)) IDX, sys.format_bytes(sum(data_length) + sum(index_length)) 'TOTAL SIZE', round(sum(index_length) / sum(data_length),2) IDXFRAC FROM information_schema.TABLES GROUP By table_schema ORDER BY sum(DATA_length) DESC; 32 MySQL Query Slides at speakerdeck.com/stoker
  23. ©2023 Percona | Confidential PG Query SELECT d.datname as Name,

    pg_catalog.pg_get_userbyid(d.datdba) as Owner, CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname)) ELSE 'No Access' END as Size FROM pg_catalog.pg_database d order by CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_database_size(d.datname) ELSE NULL END desc LIMIT 20; 33 Slides at speakerdeck.com/stoker
  24. ©2023 Percona | Confidential Are you … Having to restore

    a lot of deleted records? Find it expensive (time | cost | sanity) to restore those record? Too permissive with that DELETE PRIV? 35
  25. ©2023 Percona | Confidential Example mysql> ALTER TABLE foo ADD

    COLUMN soft_delete TINYINT DEFAULT 0; mysql> CREATE INDEX main_pk ON foo (soft_delete,<PRIMARY-KEY>); 36 You may want to consider locking down the Delete Privilege if runaway deletes are a problem.
  26. ©2023 Percona | Confidential 38 A Database is only as

    good as its last backup You need to be able to recover from: A. A total loss of server B. A loss of a database C. A loss of a table D. (maybe) a loss of a single row
  27. ©2023 Percona | Confidential How long can you be down?

    Existing data: A. Loss happens, unnoticed B. Loss noticed C. Request for restore started D. Restoration E. Data restored, notified Incoming data A. Loss happens, unnoticed B. Loss noticed C. Request for restoration started D. What, if any, data was captured 39
  28. ©2023 Percona | Confidential Work around loss of some tables

    Could you take orders with part of your sales system offline? Batch orders later? Is it possible to cache ‘high end customer’ information elsewhere? (some animals are more equal than others) The ability to engineer around outages can take some pressure off during major outages. 40
  29. ©2023 Percona | Confidential One backup style does not fit

    all LVM Snapshots Cloud vendor snapshots Full backups Incremental backups One a quarter grab everything Corporate retention rule backup 41 Do you have ‘the pieces’ together’? • Operating system • RPMs/DEBs needed • Configuration • User account information • Partitioning/sharding information • Hardware settings • Network settings If you have to go from scratch, how long would it take you? Someone else on your staff?
  30. ©2024 Percona Track the performance of your most frequently run

    queries over time Track the growth in size of the data Track the growth of users accessing database Adjust before you need to • Cache/partition/shard • Pay attention to the database • Do your maintenance ◦ ANALYZE ◦ Purge accounts ◦ Rotate password Love the sinner, hate the sin 43