Slide 7
● A data lake is a storage repository that holds a vast amount of raw data in its
native format until it is needed. - http://searchaws.techtarget.com
● A data lake is a storage repository that holds a vast amount of raw data in its
native format, including structured, semi-structured, and unstructured data.
The data structure and requirements are not defined until the data is needed.
- Tamara Dull, (SAS), https://www.kdnuggets.com
● It stores data in its native (raw) format.
● The schema is applied at query time (schema-on-read).
● Sometimes it is also just a “marketing label”: a simplified way of referring to
technology that is compatible with Hadoop, much as the term “big data” is used
as a label for distributed storage and query engines.
Data Lake: Definitions
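
The idea of applying the schema at query time can be illustrated with a small sketch. This is not taken from any particular data lake product: the sample records, the `read_with_schema` helper, and the `SALES_SCHEMA` mapping are all hypothetical names chosen for illustration, and an in-memory list stands in for files kept in raw storage.

```python
import json

# Raw records kept exactly as they arrived (hypothetical sample data).
# Nothing is validated or reshaped at write time.
RAW_EVENTS = [
    '{"user": "ana", "amount": "12.50", "ts": "2021-03-01"}',
    '{"user": "budi", "amount": "7", "extra": {"channel": "web"}}',
    'not json at all',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> cast function) while reading.

    Lines that are not JSON objects are skipped; fields that are missing
    or cannot be cast become None.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in storage, it is just not readable here
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        yield row


# The schema is chosen by the query, not by the storage layer.
SALES_SCHEMA = {"user": str, "amount": float}

rows = list(read_with_schema(RAW_EVENTS, SALES_SCHEMA))
total = sum(r["amount"] for r in rows if r["amount"] is not None)
```

A different question can read the same raw lines with a different schema, for example `{"user": str, "ts": str}`, without rewriting anything that was stored.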