Slide 14
Apache Arrow
• Specification for in-memory columnar data layout
• No serialization or copy overhead for cross-system communication
• Designed for efficiency (exploits SIMD, cache locality, ...)
• Exchange data without conversion between Python, C++, C (GLib),
Ruby, Lua, R, JavaScript, Go, Rust, MATLAB and the JVM
• Brought Parquet support to pandas and made PySpark fast (@pandas_udf)
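
The columnar, copy-free idea behind these bullets can be sketched with the Python standard library alone. This is a toy illustration and does not use Arrow's actual API: the names `rows`, `ids`, `prices`, `view`, `total`, and `shared` are hypothetical, and `array` plus `memoryview` stand in for Arrow's typed buffers.

```python
from array import array

# Row-oriented layout: one Python tuple per record (illustrative data).
rows = [(1, 10.5), (2, 20.0), (3, 30.25)]

# Column-oriented layout: one contiguous, typed buffer per field.
ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
prices = array("d", (r[1] for r in rows))  # 64-bit floats

# A consumer can look at the column's memory without copying it.
view = memoryview(prices)
total = sum(view)  # scans one contiguous buffer of doubles

# A write through the original buffer is visible in the view,
# which shows that both names refer to the same memory.
prices[0] = 11.5
shared = view[0] == 11.5
```

Because each column is a single contiguous buffer of one type, a scan touches only the bytes it needs, which is the property that makes cache-friendly and SIMD-friendly processing possible. Arrow standardizes this kind of layout so that different runtimes can read the same buffers directly.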