PyData London 2017 – Efficient and portable DataFrame storage with Apache Parquet

Apache Parquet is the most widely used columnar data format in the big data processing space and recently gained pandas support. It uses various techniques to store data in a CPU- and I/O-efficient way and can push queries down to the I/O layer. This talk shows how to use it in Python, details its structure, and presents its portable use with other tools.

Uwe L. Korn

May 07, 2017