Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Migrating Rant & Rave to PostgreSQL

Migrating Rant & Rave to PostgreSQL

Recently we have started to find that MySQL is holding us back when it comes to our vision of growing a sustainable and scalable database. This talk will discuss the process Rant & Rave followed in order to migrate our core database. By discussing some of the challenges we overcame from mapping datatypes to differences in syntax it is hoped that other MySQL users will be better equipped to make the move to PostgreSQL.

Mike Fowler

July 09, 2014
Tweet

More Decks by Mike Fowler

Other Decks in Technology

Transcript

  1. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Migrating Rant & Rave to PostgreSQL Mike Fowler PGDay UK 2014
  2. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Overview • Who and what is Rant & Rave? • The journey to PostgreSQL – Why migrate? – Migration requirements – Moving the data – Adjusting the queries
  3. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 About Me • Technical Lead/ScrumMaster at Rant & Rave • Been using PostgreSQL for over 10 years • Contributed some XML support – XMLEXISTS/xpath_exists() – xml_is_well_formed() • Buildfarm member piapiac – Amazon EC2 based build for JDBC driver – Has lead to a number of bugfix patches for JDBC http://www.pgbuildfarm.org/cgi-bin/show_status.pl?member=piapiac
  4. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 We are a technology company providing Customer engagement solutions with a difference to more than 250 global corporations Customer engagement is “the ongoing interactions between customer and the company, offered by the company, chosen by the customer.” Paul Greenberg, ZDNet and credited with inventing the term ‘CRM’ What do we do?
  5. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 What makes us different? 1. Everyone else is just listening. We help businesses act before and after every interaction. 2. Everything we do is in real-time for the frontline. We call them ‘Moments of Truth’® those emotional opportunities to create Ravers from customers and your staff. 3. Rant & Rave is for all your customers, wherever they are, however they touch your business, whenever they touch your business.
  6. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 The Journey Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected] PGDayUK 2014
  7. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 The Platform Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected] PGDayUK 2014
  8. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Some of our customers... Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected] PGDayUK 2014
  9. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Some of our customers' feedback... Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected] PGDayUK 2014
  10. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 The Rant & Rave Database • 8189 tables with approximately 114 million rows – 242 'system' tables – Remaining tables are customer specific though many similarities exist • 2930 views – Views are used to customise data end users wish to see • e.g. A team leader might be restricted to seeing only the feedback relating to them and their team • No stored routines, functions, procedures or triggers
  11. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Why migrate? • MySQL has oddities that routinely catch us out – UNIONs are faster than ORs with JOINs – Transactions need retrying when under load – GROUP BYs allowing non-aggregated columns – “” = 0 – Nightly database restores sometimes fail for no reason • MySQL replication leaves a lot to be desired – Statement replication and timestamps – Manual does not inspire confidence
  12. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Migration Process Requirements • Minimise downtime • Ensure database equality – Schema equality – Data equality • Minimise software changes – Client query compatibility
  13. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Minimising Downtime 1) Have a complete clone ready + Fastest option - Must manage data changes from when clone was built - Clients must be build to support both database servers 2) Stop all writers, dump & restore - Slowest option + Easiest option - Downtime becomes a function of database size and server I/O speed
  14. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Minimising Dump & Restore • Smaller the database the faster the process – Remove unused tables & data – Create the schema in PostgreSQL – Move unchanging tables ahead of time • Archived data • In-active customer tables • Stream the dump & restore – Intermediary files are expensive
  15. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Introducing convert.pl • Perl script that manipulates mysqldump format into a format psql can execute • Can be executed as part of a stream mysqldump > convert.pl > psql • All conversion rules are in one place allowing for easy refinements as the differences between MySQL and PostgreSQL are identified and resolved • Aim to make it publicly available
  16. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 The mysqldump Format • SQL Script • Lines beginning - - are comments • Lines beginning /* and ending */ are MySQL version sepcific commands (e.g. /*!40101 SET NAMES utf8 */; ) • Every section is started with a comment header -- -- Table structure for table `system_users` --
  17. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 The mysqldump Format • Four different operations are found in the dump – Table schema -- Table structure for table `table_name` – Temporary table schema for views -- Temporary table structure for view `view_name` – View creation -- Final view structure for view `view_name` – Data -- Dumping data for table `table_name`
  18. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Dealing with differences • MySQL uses ` to quote identifiers, PostgreSQL “ – `table_name` => “table_name” • CREATE TABLE & VIEW is surprisingly similar • Most data types need only minor adjustment – Numeric types SMALLINT, INT, BIGINT, DOUBLE – Character types TINYTEXT, VARCHAR, LONGTEXT – Time types DATE, DATETIME, TIMESTAMP • Some datatypes need careful adjustment – TINYINT, BLOB
  19. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Numeric data types MySQL allows for a variable display size when declaring a numeric column - this is not storage size `field_id` int(11) MySQL Type PostgreSQL Type SMALLINT SMALLINT INT INTEGER INT UNSIGNED BIGINT BIGINT BIGINT DOUBLE DOUBLE PRECISION
  20. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Character data types As there is no performance difference between PostgreSQL's character data types we mapped all MySQL types to TEXT MySQL Type PostgreSQL Type TINYTEXT TEXT VARCHAR TEXT LONGTEXT TEXT
  21. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Time data types • mysqldump writes times as UTC without timezone information, specify --skip-tz-utc to keep timezone • MySQL CREATE TABLE allows for one timestamp column to have an update trigger updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP MySQL Type PostgreSQL Type DATE DATE DATETIME TIMESTAMP WITH TIME ZONE TIMESTAMP TIMESTAMP WITH TIME ZONE
  22. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 TINYINT data type • MySQL does not have a BOOLEAN type but uses a special case of TINYINT(1) • TINYINT(1) defaults can be adjusted for better readability MySQL Type PostgreSQL Type TINYINT(1) BOOLEAN TINYINT(1) DEFAULT '0' BOOLEAN DEFAULT FALSE TINYINT(1) DEFAULT '1' BOOLEAN DEFAULT TRUE TINYINT(4) SMALLINT
  23. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 BLOB data type • MySQL's various BLOB types can all be mapped to BYTEA • The default binary output is the source of many invalid multi-byte escape sequences. Specify --hex-blob to output the blobs as hexadecimal MySQL Type PostgreSQL Type TINYBLOB BYTEA BLOB BYTEA MEDIUMBLOB BYTEA LONGBLOB BYTEA
  24. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 AUTO_INCREMENT MySQL uses the AUTO_INCREMENT keyword to allow a column to take the next value from a sequence. This keyword is independent from the numeric data type used. PostgreSQL embeds this behaviour in the SERIAL data type. MySQL Type PostgreSQL Type TINYINT … AUTO_INCREMENT SERIAL SMALLINT … AUTO_INCREMENT SERIAL INT … AUTO_INCREMENT SERIAL INT UNSIGNED … AUTO_INCREMENT BIGSERIAL BIGINT … AUTO_INCREMENT BIGSERIAL
  25. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Column Annotations MySQL allows you to annotate columns in CREATE TABLE CREATE TABLE t (col INT COMMENT 'My commented column' …); PostgreSQL does not support this however you can achieve this after creating the table CREATE TABLE t (col INTEGER …); COMMENT ON COLUMN t.col IS 'My commented column';
  26. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Stuff to ignore mysqldump outputs commands that are not relevant to PostgreSQL as well as some extensions that are not needed. • CREATE TABLE (…) ENGINE=InnoDB AUTO_INCREMENT=156583 DEFAULT CHARSET=latin1; • LOCK TABLES; • UNLOCK TABLES;
  27. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 INSERTs mysqldump generates a multi-row version of INSERT. PostgreSQL supports this syntax but COPY is better suited to large inserts INSERT INTO `field_mappings` VALUES (1,892,'YMC','A9'), (2,892,'WIG','A81') becomes COPY field_mappings FROM STDIN WITH NULL AS 'NULL' CSV QUOTE AS ''''; 1,892,'YMC','A9' 2,892,'WIG','A81' \.
  28. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Escaped data • MySQL escapes quotes and backslashes: \', \\ • Escaped carriage returns (\r) and newlines (\n) need to be replaced with their real values • MySQL escapes hexadecimals with 0x MySQL Value PostgreSQL Value \' '' \\ \ \r \n 0xFFFF E'\\FFFF'
  29. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Bizarre data • The zero timestamp '0000-00-00 00:00:00' – Can be converted to NULL • > 0000-00-00 00:00:00 becomes IS NOT NULL • DFEAULT 0000-00-00 00:00:00 NOT NULL is removed – Use an alternate odd value (e.g. '1970-01-01 00:00:00') • Character encoding is painful – Specifying --default-character-set=utf8 helps – Dependant on shell environment – May need to map some sequences ( => \x{FFFD}) 
  30. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 VIEWs Syntax is subject to same caveats SELECT queries Extracting the VIEW from mysqldump is a little more involved than CREATE TABLE /*!50001 CREATE ALGORITHM=UNDEFINED */ /*!50013 DEFINER=`root`@`%` SQL SECURITY DEFINER */ /*!50001 VIEW `view_name` AS select `col1`.`col2` FROM `table_name` WHERE (`col3` = '9636') */;
  31. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 mysqldump For those taking notes the accumulated mysqldump command looks like this: mysqldump --single-transaction --skip-tz-utc --hex- blob --default-character-set=utf8 -uuser -ppassowrd db_name
  32. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Dealing with queries • Identify queries that use functions (e.g. isnull()) – Recreate functions in pl/pgsql – Rewrite query to produce same results • MySQL is usually case insensitive • PostgreSQL >= 8.3 does not support implicit casting – WHERE clauses are prime source – JOINs need verifying • MySQL GROUP BY behaviour is non-standard
  33. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Case Insensitivity: LIKE LIKE has a case insensitive alternate ILIKE. ILIKE does not perform too well however combining lower()/upper() with LIKE is better. DB WHERE clause Time MySQL customername LIKE '%Mike%' 20ms PSQL customername ILIKE '%mike%' 33ms PSQL lower(customername) LIKE '%mike%' 25ms
  34. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Case Insensitivity: Equality Equality can achieved by using lower() or upper(). Obviously slower than equality tests however in our set-up PostgreSQL initially out performs MySQL. DB WHERE clause 1st Time 2nd Time MySQL customername = 'Mike Fowler' 31ms 0ms PSQL lower(customername) = 'mike fowler' 23ms 19ms
  35. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Functional Indexes PostgreSQL allows the creation of indexes based on the computed results of functions. This improves query times at the expense of INSERT time. CREATE INDEX idxname ON table (lower(customername)); DB WHERE clause Time MySQL customername = 'Mike Fowler' 31ms PSQL (no index) lower(customername) = 'mike fowler' 23ms PSQL (index) lower(customername) = 'mike fowler' 0.15ms
  36. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 GROUP BY • MySQL allows nonaggreated columns not named in the GROUP BY clause – When a nonaggreated column has differing values MySQL will not choose in a determinate way – Appending nonaggreated columns to GROUP BY clause could generate extra rows compared to MySQL • MySQL group_concat(col) function – Aggregates all rows into a comma separated list – PostgreSQL equivalent string_agg(col,',')
  37. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Implicit Casting • Look for queries with numeric operands – WHERE col > 56 becomes WHERE col::INTEGER > 56 – WHERE col != 95 becomes WHERE col::INTEGER != 95 – WHERE col IN (345,347) becomes WHERE col::INTEGER IN (345,347) • You could cast everything as TEXT WHERE col1 = col2 becomes WHERE col1::TEXT = COL2::TEXT
  38. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Implicit Casting • JOINs with USING clause will need rewriting SELECT * FROM table1 JOIN table2 USING (id) becomes SELECT * FROM table1 t1 JOIN table2 t2 ON t1.id::TEXT = t2.id::TEXT • You could permanently ALTER the column data type ALTER TABLE table ALTER COLUMN col type INTEGER USING col::INTEGER – Saves modifying lots of WHERE clauses but INSERTS and UPDATES will need checking
  39. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Introducing MyPGJDBC • Simple wrapper over existing PG-JDBC • Idea is to rewrite & log MySQL style queries to work in PostgreSQL – Gives us a safety net for queries we miss – Identifies the queries so we can repair them • Aim to make it publicly available
  40. Migrating Rant & Rave to PostgreSQL Mike Fowler, [email protected] [email protected]

    PGDayUK 2014 Lessons from our experience • Allow more time than you expect • Moving the schema and data is the easiest part • Identifying, verifying and reworking queries is what takes the time