The wrong way to start your machine learning project is to “chuck everything into a model and see what happens”. A better way is to visualise your data first: expose the relationships within it, confirm that it looks sound and identify the problems that are likely to make your life difficult. You’ll save time, you’ll understand “why” your data works and you’ll uncover problems sooner.
We’ll review ways to quickly and visually diagnose your data, check that it meets your assumptions and prepare it for discussion with your colleagues. We’ll look at tools including Pandas, Seaborn and Pandas Profiling. By the end you’ll have new tools to help you investigate unfamiliar data confidently with your team.
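As a rough illustration of the kind of first-pass check meant here, the sketch below uses plain Pandas to summarise each column's dtype, missing-value count and number of distinct values, then prints a correlation matrix for the numeric columns. The `quick_diagnosis` helper and the toy DataFrame are hypothetical, written for this example only; they are not part of the talk or of the discover_feature_relationships tool.

```python
import numpy as np
import pandas as pd


def quick_diagnosis(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, missing and distinct counts."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # the type Pandas inferred for each column
        "n_missing": df.isna().sum(),     # NaN/None values per column
        "n_unique": df.nunique(),         # distinct non-missing values per column
    })


# Toy data invented for illustration, with one gap in each column
df = pd.DataFrame({
    "age": [23, 35, np.nan, 41, 35],
    "income": [21000, 48000, 52000, np.nan, 48000],
    "city": ["Budapest", "London", "Budapest", None, "London"],
})

summary = quick_diagnosis(df)
print(summary)

# Pairwise Pearson correlation between the numeric columns
print(df[["age", "income"]].corr())
```

For a visual counterpart, Seaborn's `pairplot` draws every pairwise relationship between numeric columns in a single figure, and Pandas Profiling produces a fuller report of the same checks.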
This talk introduces Ian’s new discover_feature_relationships tool, which will save you time during the Exploratory Data Analysis phase.
http://budapestbiforum.hu/2018/hu/eloadasok/on-the-diagramatic-diagnosis-of-data-ian-ozsvald-mor-consulting-ltd/