Data Breaking Bad
Michael Hausenblas
June 03, 2013
Open Stage talk at Berlin Buzzwords 2013
Transcript
Data Breaking Bad
Michael Hausenblas, MapR Technologies
Berlin Buzzwords 2013, Open Stage Talk
Friday, 7 June 13
Nope. Not this one.
things you can influence
things that affect you
try and focus on this stuff
The awkward moment when I open the data I got from a customer
http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-%E2%89%A0-information-%E2%89%A0-insights/
aka crap in, crap out
Some examples …
• Encöding hell
• Schema? Sure, I fax you a screenshot
• Dupes and other fakes
• Sampling
Encöding hell
application-specific encodings
• URL encoding
• HTML encoding
• Database escaping
non-ASCII?
a%20percent-encoded%20string%20as%20of%20RFC%203986
a <strong>HTML</strong> encoded string
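The two encodings named on this slide can be reproduced with the Python standard library. This is only a sketch: the helper name `decode_layers` and the assumption that a value was HTML-escaped first and percent-encoded second are illustrative, not something the talk specifies.

```python
import html
from urllib.parse import quote, unquote

# Percent-encoding as of RFC 3986: spaces become %20.
raw = "a percent-encoded string as of RFC 3986"
url_encoded = quote(raw)

# HTML encoding: markup characters become entities.
markup = "a <strong>HTML</strong> encoded string"
html_encoded = html.escape(markup)

# Non-ASCII text is percent-encoded byte by byte, so the result
# depends on the character encoding used (UTF-8 by default).
utf8_encoded = quote("Encöding")


def decode_layers(value: str) -> str:
    """Hypothetical helper: undo percent-encoding, then HTML encoding.

    Assumes the value was HTML-escaped first and percent-encoded second;
    real data may stack the layers in a different order.
    """
    return html.unescape(unquote(value))
```

Decoding with the wrong character encoding does not fail loudly by default: `unquote("Enc%F6ding")` yields a replacement character, while `unquote("Enc%F6ding", encoding="latin-1")` recovers the original text.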
Encöding hell
• Use Unicode
• Use Unicode
• Use Unicode
http://www.swedishfika.com/2010/01/19/escaping-from-encoding-hell/
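One way to read "Use Unicode" in practice is to decode bytes explicitly where data enters the system and to normalize the resulting text. The sketch below assumes UTF-8 input and NFC normalization; the function name `to_text` is hypothetical.

```python
import unicodedata


def to_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes explicitly and normalize to NFC.

    Raises UnicodeDecodeError on invalid input instead of guessing,
    so bad data surfaces at the boundary.
    """
    text = data.decode(encoding)
    return unicodedata.normalize("NFC", text)


# The same visible word in two different code point sequences.
composed = "Enc\u00f6ding"      # single code point for ö
decomposed = "Enco\u0308ding"   # o followed by a combining diaeresis

# Reading UTF-8 bytes as Latin-1 produces the classic mojibake.
mojibake = composed.encode("utf-8").decode("latin-1")
```

After normalization, `to_text(decomposed.encode("utf-8"))` compares equal to `composed`, which matters for joins and duplicate detection.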
Schema? Sure, I fax you a screenshot
• There is a need for proper, formal documentation
• For humans and machines
• Basis for validation: automate!
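A schema that is readable by both humans and machines can be as small as a mapping from field name to type, which then drives automated validation. The field names and the `validate` helper below are made up for illustration; a real project would more likely use an established schema language.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
SCHEMA: dict[str, type] = {
    "customer_id": int,
    "email": str,
    "signup_date": str,
}


def validate(record: dict[str, Any], schema: dict[str, type] = SCHEMA) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Fields the schema does not know about are reported as well.
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors
```

Running `validate` over every incoming record, and failing the import when the list is non-empty, is one way to automate the check.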
Dupes and other fakes
• Use plots to get an overview
• Watch out for outliers
• Try to establish source for errors and fix
• Document (in any case)
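Alongside plots, a quick numeric pass can flag exact duplicates and candidate outliers. The sketch below uses the common 1.5 × IQR rule, which is an assumption here and not a method named in the talk; both function names are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def find_duplicates(rows):
    """Return each row that occurs more than once, with its count.

    Rows must be hashable, for example tuples.
    """
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

Flagged values are candidates for a closer look, not automatic deletions: the slide's advice to find the source of an error and to document the decision still applies.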
Sampling
• My data is too big. I can’t check it all.
• Why don’t you sample, then?
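When the data does not fit in memory, a uniform random sample can still be drawn in a single pass. The sketch below uses reservoir sampling (Algorithm R); the talk does not name a specific sampling technique, so this choice and the function name are assumptions.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Draw up to k items uniformly at random from a stream of unknown length.

    Keeps only k items in memory at any time.
    """
    rng = random.Random(seed)
    sample: list[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample
```

For example, `reservoir_sample(open("big.csv"), 1000)` would pick 1000 lines for manual inspection without loading the file; the file name is a placeholder.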
http://mortardata.com/
Go and buy this book. Now.