Slide 6
Data Preparation
• Often performed by a data engineer but can also be done by the data scientist.
• Cleanse data of null and bad values.
• Data may be aggregated (counts, sums, averages, etc.) and reformatted, and
calculated columns may be added.
• Data can come from many sources and in many formats, such as SQL Server,
Cosmos DB, and flat files.
• Data must be merged from the sources into a consistent and useful dataset.
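The preparation steps listed on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed pipeline: the in-memory rows stand in for data that would really be read from sources such as SQL Server or a flat file, and the field names (customer_id, amount, name), the rule that a negative amount counts as a "bad" value, and the helper functions cleanse, aggregate, and merge are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical order rows, as they might arrive from a database query.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": None},   # null value
    {"customer_id": 2, "amount": -5.0},   # assumed "bad" value: negative amount
    {"customer_id": 2, "amount": 10.0},
]

# Hypothetical customer rows from a second source, e.g. a flat file.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]


def cleanse(rows):
    """Drop rows whose amount is null or negative."""
    return [r for r in rows if r["amount"] is not None and r["amount"] >= 0]


def aggregate(rows):
    """Compute count, sum, and average of amount per customer."""
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for r in rows:
        t = totals[r["customer_id"]]
        t["order_count"] += 1
        t["total_amount"] += r["amount"]
    for t in totals.values():
        # Calculated column derived from the two aggregates above.
        t["avg_amount"] = t["total_amount"] / t["order_count"]
    return dict(totals)


def merge(customer_rows, stats):
    """Inner-join customer rows with their aggregated order statistics."""
    return [
        {**c, **stats[c["customer_id"]]}
        for c in customer_rows
        if c["customer_id"] in stats
    ]


dataset = merge(customers, aggregate(cleanse(orders)))
for row in dataset:
    print(row)
```

In practice a data engineer would more likely use a DataFrame library or SQL for these steps; the plain-Python version is used here only to keep each step (cleanse, aggregate, calculate, merge) visible.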