Save 37% off PRO during our Black Friday Sale! »

Fix Your Strings in PostgreSQL

A23789f299ed06fe7d9f1c6940440bfa?s=47 FTisiot
October 07, 2021

Fix Your Strings in PostgreSQL

Strings are one of the most used types in databases; they can store pretty much any data and don't enforce any rules on the inserted input. Yet too much freedom sometimes leads to inconsistencies: is it Aivan or Aiven? Øyvind or Oyvind? Wine or Whine?
These seemingly small differences can have bad side-effects, causing lookups to fail and incorrect aggregation results to be returned. Luckily all is not lost: PostgreSQL has some features that can help us make sense of the chaos.

In this talk you will learn what PostgreSQL has to offer: starting with pattern matching, passing by regular expressions, and ending with more advanced functionality exposed by the fuzzystrmatch and unaccent extensions. I'll demonstrate what tools can help you fixing string inconsistencies and how to avoid making the same mistakes again in the future. This session is recommended for anyone who deeply cares about their (string) data quality.

A23789f299ed06fe7d9f1c6940440bfa?s=128

FTisiot

October 07, 2021
Tweet

Transcript

  1. Francesco Tisiot - Developer Advocate @ftisiot Fix Your Strings!

  2. @ftisiot | @aiven_io Francesco Tisiot Tisio Tisiotto Tissiott Tizio Francisco

    Frank Fransiscos
  3. @ftisiot | @aiven_io Strings Francesco 1200$ • books • stars

    • computers I come from Verona and I like football and good food 25-Jan-2021 ^@^@^A^@^A^@^P^P^@^@^A^@ ^@h^D^@^@^V^@^@^@(^@^@^@^P^@^@^@ ^@^@^@^A^@ ^@^@^@^@^@^@^D^@^@^R^K^@^@^R^K^@^@^@^@^@^@^@^@^@^ @<FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF> <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF> <FF><FF><FF><FE><FE><FE><FF><FE><FE><FE><FF><FC><FC><FC><FF ><FC><FC><FC><FF><FC> <FC><FC><FF><FC><FC><FC><FF><FC><FC><FC><FF><FC><FC><FC><F F><FC><FC><FC><FF>< <D7><FF><D6><D4><D0><FF><CF><CD><C8><FF><CD><C9><C6><FF>< D4><D1><CE><FF><D9><D7><D3><FF><DC><DA><D5><FF><DB><D9><D 4><FF><ED><EB><E6><FF>~^?^? <FF><E1><E0><DC><FF><DC><DA><D6><FF><D9><D7><D4><FF><D4>< D3><CF><FF><CF><CD><C9><FF><C8><C6><C2><FF>^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  4. @ftisiot | @aiven_io I’m Francesco, a table for 2 Please

    Ok Fransisco
  5. @ftisiot | @aiven_io CREATE TABLE MY_RESTAURANT_DATA CREATE TABLE MY_RESTAURANT_DATA (

    ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, ); CREATE TABLE MY_RESTAURANT_DATA ( ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, EMAIL VARCHAR, PHONE_NUMBER VARCHAR, ); CREATE TABLE MY_RESTAURANT_DATA ( ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, EMAIL VARCHAR, PHONE_NUMBER VARCHAR, BOOKING_DATE VARCHAR, ); CREATE TABLE MY_RESTAURANT_DATA ( ID SERIAL, CALLER_NAME VARCHAR, CALLER_SURNAME VARCHAR, EMAIL VARCHAR, PHONE_NUMBER VARCHAR, BOOKING_DATE VARCHAR, INTERESTS VARCHAR, PEOPLE VARCHAR );
  6. @ftisiot | @aiven_io

  7. @ftisiot | @aiven_io How can we fix?

  8. @ftisiot | @aiven_io Unique Identifier e-mail ascii parsable different accounts

    symbols reassigned group accounts
  9. @ftisiot | @aiven_io Entropy Never Decreases

  10. @ftisiot | @aiven_io Avoid Future Chaos

  11. @ftisiot | @aiven_io People

  12. @ftisiot | @aiven_io Interests ARRAY

  13. @ftisiot | @aiven_io booking_date DATE

  14. @ftisiot | @aiven_io phone_number - email DOMAIN https:/ /dba.stackexchange.com/questions/68266/what-is-the-best-way-to-store-an-email-address-in-postgresql/165923#165923

  15. @ftisiot | @aiven_io Name and Surname UNACCENT FUZZYSTRMATCH PGSIMILARITY

  16. @ftisiot | @aiven_io From Chaos To Order

  17. @ftisiot | @aiven_io Fix Your Strings - Array Functions -

    Date Functions - Pattern Matching - Domain - Email Domain - Unaccent - FuzzyStrMatch - Aiven https:/ /www.postgresql.org/docs/current/functions-array.html https:/ /www.postgresql.org/docs/current/functions-datetime.html https:/ /www.postgresql.org/docs/current/functions-matching.html https:/ /www.postgresql.org/docs/current/sql-createdomain.html https:/ /bit.ly/3ol6yfr https:/ /www.postgresql.org/docs/current/unaccent.html https:/ /www.postgresql.org/docs/current/fuzzystrmatch.html https:/ /aiven.io/