In reality
• Data volume keeps growing
• Real-life environments are always complicated
• Privacy, compliance, etc.
• ETL is a pain and not always feasible
• Data is always messy, inconsistent, incomplete
• E.g., dates:
“Sat Mar 1 10:12:53 PST”
“2014-03-01 18:12:53 +00:00”
“1393697578”
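To make the date example concrete, here is a minimal Python sketch that normalizes the three formats above to UTC. It is an illustration under stated assumptions, not a general-purpose parser: the function name `normalize_timestamp` and the lookup tables are hypothetical, "PST" is assumed to mean a fixed UTC-8 offset, and the first format carries no year, so one has to be supplied (2014 is assumed here).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets for the zone abbreviations we expect to see.
TZ_OFFSETS = {"PST": timezone(timedelta(hours=-8)), "UTC": timezone.utc}
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def normalize_timestamp(raw, default_year=2014):
    """Return a timezone-aware UTC datetime for one of three known formats."""
    # Drop stray whitespace and a trailing comma left over from the source.
    text = raw.strip().rstrip(",").strip()

    # Format 3: Unix epoch seconds, e.g. "1393697578".
    if text.isdigit():
        return datetime.fromtimestamp(int(text), tz=timezone.utc)

    # Format 2: ISO-like with a numeric offset, e.g. "2014-03-01 18:12:53 +00:00".
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z")
        return parsed.astimezone(timezone.utc)
    except ValueError:
        pass

    # Format 1: "Sat Mar 1 10:12:53 PST" (no year, zone given as an abbreviation).
    _weekday, month, day, clock, zone = text.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    local = datetime(default_year, MONTHS[month], int(day),
                     hour, minute, second, tzinfo=TZ_OFFSETS[zone])
    return local.astimezone(timezone.utc)


for sample in ["Sat Mar 1 10:12:53 PST,",
               " 2014-03-01 18:12:53 +00:00",
               "1393697578"]:
    print(normalize_timestamp(sample).isoformat())
```

Even this small sketch has to guess a year and hard-code a zone abbreviation, which is the point of the slide: every source encodes the same kind of value differently, and each guess is a place where cleaning can go wrong.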