Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Fix Your Strings in PostgreSQL

FTisiot
October 07, 2021

Fix Your Strings in PostgreSQL

Strings are one of the most used types in databases; they can store pretty much any data and don't enforce any rules on the inserted input. Yet too much freedom sometimes leads to inconsistencies: is it Aivan or Aiven? Øyvind or Oyvind? Wine or Whine?
These seemingly small differences can have bad side-effects, causing lookups to fail and incorrect aggregation results to be returned. Luckily all is not lost: PostgreSQL has some features that can help us make sense of the chaos.

In this talk you will learn what PostgreSQL has to offer: starting with pattern matching, passing by regular expressions, and ending with more advanced functionality exposed by the fuzzystrmatch and unaccent extensions. I'll demonstrate what tools can help you fixing string inconsistencies and how to avoid making the same mistakes again in the future. This session is recommended for anyone who deeply cares about their (string) data quality.

FTisiot

October 07, 2021
Tweet

More Decks by FTisiot

Other Decks in Technology

Transcript

  1. @ftisiot | @aiven_io Strings Francesco 1200$ • books • stars

    • computers I come from Verona and I like football and good food 25-Jan-2021 ^@^@^A^@^A^@^P^P^@^@^A^@ ^@h^D^@^@^V^@^@^@(^@^@^@^P^@^@^@ ^@^@^@^A^@ ^@^@^@^@^@^@^D^@^@^R^K^@^@^R^K^@^@^@^@^@^@^@^@^@^ @<FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FE><FE><FE><FF><FE><FE><FE><FF><FC><FC><FC><FF ><FC><FC><FC><FF><FC> <FC><FC><FF><FC><FC><FC><FF><FC><FC><FC><FF><FC><FC><FC><F F><FC><FC><FC><FF>< <D7><FF><D6><D4><D0><FF><CF><CD><C8><FF><CD><C9><C6><FF>< D4><D1><CE><FF><D9><D7><D3><FF><DC><DA><D5><FF><DB><D9><D 4><FF><ED><EB><E6><FF>~^?^? <FF><E1><E0><DC><FF><DC><DA><D6><FF><D9><D7><D4><FF><D4>< D3><CF><FF><CF><CD><C9><FF><C8><C6><C2><FF>^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  2. @ftisiot | @aiven_io CREATE TABLE MY_RESTAURANT_DATA CREATE TABLE MY_RESTAURANT_DATA (

    ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, ); CREATE TABLE MY_RESTAURANT_DATA ( ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, EMAIL VARCHAR, PHONE_NUMBER VARCHAR, ); CREATE TABLE MY_RESTAURANT_DATA ( ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, EMAIL VARCHAR, PHONE_NUMBER VARCHAR, BOOKING_DATE VARCHAR, ); CREATE TABLE MY_RESTAURANT_DATA ( ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, EMAIL VARCHAR, PHONE_NUMBER VARCHAR, BOOKING_DATE VARCHAR, INTERESTS VARCHAR, PEOPLE VARCHAR );
  3. @ftisiot | @aiven_io Fix Your Strings - Array Functions -

    Date Functions - Pattern Matching - Domain - Email Domain - Unaccent - FuzzyStrMatch - Aiven https:/ /www.postgresql.org/docs/current/functions-array.html https:/ /www.postgresql.org/docs/current/functions-datetime.html https:/ /www.postgresql.org/docs/current/functions-matching.html https:/ /www.postgresql.org/docs/current/sql-createdomain.html https:/ /bit.ly/3ol6yfr https:/ /www.postgresql.org/docs/current/unaccent.html https:/ /www.postgresql.org/docs/current/fuzzystrmatch.html https:/ /aiven.io/