Slide 17
Ian.Ozsvald@ModelInsight.io @IanOzsvald
PyConIreland October 2014
The cost of poor quality data
● Current project: 9 months invested in cleaning company names
● Chief Data Scientists cite it as a significant expense
● Ongoing 'below the surface' costs from adding dirty data, maintaining data integrity, keeping the pipeline consistent
● Do a Data Audit to understand what you have
● We need more data cleaning tools and better integration with non-Python systems
● We can only do clever things if we have clean data
● Garbage in, garbage out...
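
As a rough illustration of the "Data Audit" bullet, here is a minimal sketch using only the Python standard library. It is not the tooling used in the project described on the slide. The function names (audit_column, audit_rows) and the sample records are hypothetical, invented for this example. It counts rows, missing values and distinct values per column, which is often enough to show where the dirt is.

```python
from collections import Counter


def audit_column(values):
    """Summarise one column: row count, missing count, distinct count, top values."""
    # Strip surrounding whitespace from strings; leave other values (e.g. None) alone.
    cleaned = [v.strip() if isinstance(v, str) else v for v in values]
    # Treat None and the empty string as missing (an assumption for this sketch).
    present = [v for v in cleaned if v is not None and v != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


def audit_rows(rows):
    """Audit every column in a list of dict rows (e.g. from csv.DictReader)."""
    columns = sorted({key for row in rows for key in row})
    return {col: audit_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample records, made up for illustration only.
sample_rows = [
    {"company": "Acme Ltd", "country": "IE"},
    {"company": "ACME LTD.", "country": "IE"},
    {"company": "acme ltd", "country": ""},
    {"company": "", "country": "UK"},
    {"company": "Widget Co", "country": None},
]

report = audit_rows(sample_rows)
for column, summary in report.items():
    print(column, summary)
```

In this made-up sample the audit reports four distinct company values for what is probably two real companies, which is the kind of signal that tells you name cleaning is needed before any clever analysis.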