Converting Legacy Applications To UTF-8

What are all the angles to consider when you decide to standardise your text on UTF-8?

Tim Swann

June 04, 2015

Transcript

  1. THE PROBLEM
    • Legacy applications are a mish-mash of encodings
    • Ever dealt with Irish names?
    • French / German / Eastern European?
    • What about a €uro symbol?
  2. HEADERS
    <?php header('Content-Type: text/html; charset=UTF-8'); ?>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta charset="UTF-8" />
    All done… just like Austria in 1982 - pack up and go home
  3. CHARACTER IN ≠ CHARACTER OUT
    Set character encodings everywhere:
    Front End Pages ✔ Database ✔ Tables ✔ Columns ✔ Forms ✔ Server(s) ✔
  4. DATABASE
    Change existing tables to UTF-8
    ALTER TABLE table_name DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  5. DATABASE
    Columns need to be UTF-8
    ALTER TABLE table_name CHANGE title title VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  6. PHP TEXT CONVERSION
    PHP Functions
    <?php
    iconv('ISO-8859-1', 'UTF-8//TRANSLIT', $string); // TRANSLIT = find a similar char, e.g. ä => a
    mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1'); // requires the php-mbstring extension for multi-byte support
  7. PHP TEXT CONVERSION
    PHP Functions: specify the encoding when escaping output
    <?php htmlentities($string, ENT_QUOTES, 'UTF-8');
  8. PHP TEXT CONVERSION
    Since PHP 5.5 - UConverter
    <?php
    new UConverter([ string $destination_encoding [, string $source_encoding ]] )
    $uconverter->convert( $string );
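
The conversion and escaping calls from the PHP slides can be combined roughly as follows. This is a minimal sketch, not code from the deck: the helper name `to_utf8` is hypothetical, the legacy data is assumed to be ISO-8859-1, and the mbstring extension is assumed to be installed. The validity check is a heuristic to avoid double-encoding text that is already UTF-8; a rare ISO-8859-1 string can also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the deck): convert a legacy string to UTF-8
// only when it is not already valid UTF-8, so converted data is not encoded twice.
function to_utf8(string $text, string $legacyEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (plain ASCII also ends up here)
    }
    return mb_convert_encoding($text, 'UTF-8', $legacyEncoding);
}

// "Seán" as ISO-8859-1 bytes: the á is the single byte 0xE1.
$legacy = "Se\xE1n";
$utf8   = to_utf8($legacy);

echo bin2hex($utf8), "\n";                           // 5365c3a16e
echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n"; // Se&aacute;n
```

Running the helper a second time on its own output returns the string unchanged, which matters when a migration script may be re-run over rows that were already converted.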