Data Breaking Bad

Open Stage talk at Berlin Buzzwords 2013

Michael Hausenblas

June 03, 2013

Transcript

  1. things you can influence
    things that affect you
    try and focus on this stuff
    Friday, 7 June 13
  2. The awkward moment when I open the data I got from a customer
  3. • Encöding hell • Schema? Sure, I fax you a

    screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  4. Encöding hell application-specific encodings • URL encoding • HTML encoding

    • Database escaping non-ASCII? a%20percent-encoded%20string%20as%20of%20RFC%203986 a <strong>HTML</strong> encoded string Friday, 7 June 13
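The two application-specific encodings named on this slide can be round-tripped with Python's standard library. This is a minimal sketch, not something shown in the talk. The percent-encoded string is the one from the slide, and the variable names are illustrative.

```python
from html import escape, unescape
from urllib.parse import quote, unquote

# URL (percent) encoding as defined by RFC 3986: the string from the slide.
percent_encoded = "a%20percent-encoded%20string%20as%20of%20RFC%203986"
decoded = unquote(percent_encoded)
reencoded = quote(decoded)  # spaces become %20, the hyphen is left alone

# HTML encoding: markup characters are replaced by character references.
markup = "a <strong>HTML</strong> encoded string"
escaped = escape(markup)
restored = unescape(escaped)

print(decoded)
print(escaped)
```

Decoding exactly once at the point where data enters the system, and encoding exactly once where it leaves, avoids the double-encoded strings that make customer data hard to read.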
  5. Encöding hell
    • Use Unicode
    • Use Unicode
    • Use Unicode
    http://www.swedishfika.com/2010/01/19/escaping-from-encoding-hell/
  6. • Encöding hell • Schema? Sure, I fax you a

    screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  7. Schema? Sure, I fax you a screenshot
    • There is a need for proper, formal documentation
    • For humans and machines
    • Basis for validation: automate!
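One way to read "basis for validation" is a schema that is written down once, readable by humans, and checked by a machine. The Python sketch below assumes a hypothetical three-field schema. The field names and the `validate` helper are illustrative and do not come from the talk.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
SCHEMA: dict[str, type] = {"id": int, "name": str, "signup_date": str}


def validate(record: dict[str, Any], schema: dict[str, type] = SCHEMA) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Fields that the schema does not mention are reported as well.
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


print(validate({"id": "1", "name": "a"}))
```

In practice a standard schema language would likely replace the hand-rolled dictionary, but the principle is the same: the documentation and the validator share one source.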
  8. • Encöding hell • Schema? Sure, I fax you a

    screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  9. Dupes and other fakes
    • Use plots to get an overview
    • Watch out for outliers
    • Try to establish the source of errors and fix it
    • Document (in any case)
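Alongside plots, duplicates and outliers can be flagged programmatically. The Python sketch below counts exact duplicate rows and applies the common 1.5 × IQR rule to a numeric column. The rule, the sample rows, and the values are assumptions for illustration. The talk itself only recommends plotting.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical rows; exact duplicates are counted by treating each row as a tuple.
rows = [
    ("alice", "a@example.com"),
    ("bob", "b@example.com"),
    ("alice", "a@example.com"),
]
duplicates = {row: n for row, n in Counter(rows).items() if n > 1}

# Hypothetical measurements with one suspicious value.
values = [10, 12, 11, 13, 12, 11, 10, 250]
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(duplicates)
print(outliers)
```

A flagged value is a prompt to investigate where it came from, not a licence to delete it. Whatever is decided should be documented.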
  10. • Encöding hell • Schema? Sure, I fax you a

    screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  11. Sampling
    • My data is too big. I can’t check it all.
    • Why don’t you sample, then?
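When the data is too big to hold in memory, a uniform random sample can still be drawn in a single pass. The sketch below uses reservoir sampling (Algorithm R), which is one possible technique and not one named in the talk. The function name and parameters are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw up to k items uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                sample[j] = item
    return sample


picked = reservoir_sample(range(100_000), 100, seed=42)
print(len(picked))
```

The same function works on a file object or a database cursor, since it only needs to iterate once and never needs the total count.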