Data Cleaning on text to prepare for analysis and machine learning @ EuroSciPy 2015

ianozsvald
August 28, 2015

Dirty data makes analysis and machine learning harder (or impossible!) and more prone to failure. I'll talk about the techniques we use at ModelInsight to fix badly encoded, inconsistent and hard-to-parse text data, which enable us to prepare real-world industrial data for research.

Topics will include text cleaning through normalisation and similarity measures, date parsing, data joining and visualisation. This talk is aimed at helping you make rapid progress on new projects.
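
As a rough illustration of the first topic, normalisation followed by a similarity measure might look like the sketch below. It uses only the Python standard library (`unicodedata` and `difflib`); the function names `normalise` and `similarity` are hypothetical and chosen for this example, and this is not necessarily the approach shown in the talk.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(text):
    """Return a case-folded, accent-stripped, whitespace-collapsed copy of text."""
    # NFKD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that "é" becomes "e".
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() is a more aggressive lower(); split/join collapses whitespace runs.
    return " ".join(without_accents.casefold().split())


def similarity(a, b):
    """Score two strings between 0.0 and 1.0 after normalising both."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


if __name__ == "__main__":
    # Hypothetical company names, as might appear in two inconsistent sources.
    print(normalise("  Café   LTD "))
    print(similarity("Café Ltd", "cafe  ltd"))
    print(similarity("Cafe Ltd", "Cafe Limited"))
```

Normalising before comparing means that differences in case, accents and spacing do not lower the score, so the similarity measure reflects only the remaining differences in spelling.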

Conference link:
https://www.euroscipy.org/2015/schedule/presentation/4/
Write-up:
http://ianozsvald.com/2015/08/28/euroscipy-2015-and-data-cleaning-on-text-for-ml-talk/
