Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy

Apache Arrow's promise was to reduce the serialization and copy overhead of exchanging columnar data between different systems. Using the latest Pandas release and Arrow's ability to share memory between the JVM and Python, we demonstrate that Arrow can fulfill this promise. We show the performance benefits with a typical data engineering use case that produces data in the JVM and then passes it to a Python-based machine learning model.

Uwe L. Korn

October 25, 2018