to grow by more than X% next quarter?
• Is this customer going to leave in the next 5 months?
• Classification
• Is this a server or a workstation?
• What type of server is this? (web server, email, file, proxy, …)
• Anomaly detection
• Is this spike in uploads normal?
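A question such as "Is this spike in uploads normal?" can be approximated with a simple z-score check. The sketch below is only one possible approach, not a method taken from the slides. The function name `is_anomalous`, the threshold of 3 standard deviations, and the sample upload volumes are all assumptions made for illustration.

```python
import statistics


def is_anomalous(history, new_value, threshold=3.0):
    """Return True if new_value is more than `threshold` sample standard
    deviations away from the mean of `history`."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # A perfectly flat history: any different value stands out.
        return new_value != mean
    return abs(new_value - mean) / stdev > threshold


# Hypothetical daily upload volumes in MB (made-up illustration data).
uploads_mb = [100, 110, 95, 105, 90, 100, 102, 98]

print(is_anomalous(uploads_mb, 400))  # a large spike
print(is_anomalous(uploads_mb, 108))  # within the usual variation
```

A z-score assumes the history is roughly stable and free of strong seasonality; real traffic often needs a model that accounts for time of day or day of week.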
• Static dumps (CSV, TSV, Parquet, …)
• Live databases (SQL, NoSQL, …)
• Data scraping (Web)
• Database (SQL)
• Exposing it to the data science project
data from Jira → CSV
• Use Python and pandas to read the CSV and clean it up
• Use NLP to extract the main info from the text
• Remove stopwords, punctuation, etc.
• Use stemming to get to the root of words
• Use XGBClassifier from xgboost (it exposes a scikit-learn-compatible API)
• Hook Python script to ML algorithm
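The text clean-up steps above (read the CSV, remove punctuation and stopwords, stem) can be sketched as follows. This is a standard-library stand-in, not the pipeline from the slides: a real project would more likely use pandas for loading and an NLP library's stemmer. The column names `key` and `summary`, the sample tickets, the short stopword list, and the `crude_stem` helper are all hypothetical.

```python
import csv
import io
import string

# Hypothetical sample export; column names and rows are made up.
SAMPLE_CSV = """key,summary
T-1,"Server is crashing when uploading files!"
T-2,"Requesting new mailing lists."
"""

# Tiny illustrative stopword list; real projects use a fuller one.
STOPWORDS = {"is", "the", "to", "when", "a", "an", "of", "and"}

# Suffixes tried in order by the crude stemmer below.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Strip at most one common suffix; a rough stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    """Lowercase, drop punctuation and stopwords, then stem each token."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [crude_stem(tok) for tok in no_punct.split() if tok not in STOPWORDS]


def load_and_clean(csv_text):
    """Map each ticket key to its list of cleaned tokens."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["key"]: clean_text(row["summary"]) for row in reader}


tickets = load_and_clean(SAMPLE_CSV)
for key, tokens in tickets.items():
    print(key, tokens)
```

The resulting token lists would still need to be turned into numeric features (for example word counts) before a classifier such as XGBClassifier can be trained on them.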
the few
• Save data now for what you might want to ask in the future
• Workflows are useful tools in projects
• Data science can be hard, and so can some business requirements