data
• Current project: 9 months invested in cleaning company names
• Chief Data Scientists cite it as a significant expense
• Ongoing 'below the surface' costs: adding new dirty data, maintaining data integrity, keeping the pipeline consistent
• Do a Data Audit to understand what you have
• We need more data cleaning tools and better integration with non-Python systems
• We can only do clever things if we have clean data
• Garbage in, garbage out...
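To make the "Data Audit" and company-name points concrete, here is a minimal standard-library sketch of a first-pass audit. It counts missing values and shows how many raw spellings collapse onto the same crudely normalized company name. The names `normalize_company_name`, `audit_company_names` and `LEGAL_SUFFIXES` are hypothetical, and the normalization rules (accent folding, lowercasing, stripping punctuation and a short list of legal-form suffixes) are illustrative assumptions, not the method used in the project described above.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Optional, Sequence

# Hypothetical, deliberately short list of legal-form suffixes to strip;
# real-world lists are much longer and jurisdiction-specific.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "plc",
                  "corp", "corporation", "co", "gmbh"}


def normalize_company_name(name: str) -> str:
    """Crude normalization: fold accents, lowercase, drop punctuation and legal suffixes."""
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    folded = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    # Lowercase, then replace anything that is not a-z, 0-9 or a space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", folded.lower())
    tokens = cleaned.split()
    # Strip trailing legal-form tokens such as "ltd" or "inc".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def audit_company_names(names: Sequence[Optional[str]]) -> dict:
    """Summarize missing values and which raw spellings share a normalized name."""
    groups = defaultdict(set)
    missing = 0
    for raw in names:
        # Treat None and blank strings as missing values.
        if raw is None or not raw.strip():
            missing += 1
            continue
        groups[normalize_company_name(raw)].add(raw)
    return {
        "records": len(names),
        "missing": missing,
        "distinct_raw": len({n for n in names if n and n.strip()}),
        "distinct_normalized": len(groups),
        # Only report normalized names that more than one raw spelling maps to.
        "variants": {k: sorted(v) for k, v in groups.items() if len(v) > 1},
    }
```

For example, `audit_company_names(["Acme Ltd.", "ACME Limited", "Acme, Inc", "Société Générale", "Societe Generale", "", None, "Globex Corp"])` reports 8 records, 2 missing, 6 distinct raw spellings and 3 distinct normalized names. Rules this simple cannot link names such as an acronym and its expanded form, which is one reason name cleaning tends to need sustained manual effort.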