Big data processing with Apache Beam

In this talk, we present the new Python SDK for Apache Beam, a parallel programming model for implementing batch and streaming data processing jobs that can run on a variety of execution engines, such as Apache Spark and Google Cloud Dataflow. Using examples, we discuss some of the interesting challenges in providing a Pythonic API and execution environment for distributed processing.



July 06, 2017