| beam.io.ReadTextFile('/path/to/files')
words = lines | beam.FlatMap(lambda line: re.findall(r'\w+', line))

The "pipe" operator applies a transformation (on the right) to a PCollection (on the left), reminiscent of a bash pipeline. Here the lambda passed to FlatMap is applied to each line, and the lists of matches it returns are flattened into a single PCollection of words.
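
To make the mechanics of the pipe concrete, here is a minimal pure-Python sketch of how `collection | transform` can be wired up through operator overloading, and of what "flat map" means. `ToyCollection` and `ToyFlatMap` are hypothetical in-memory stand-ins invented for this illustration; they are not Beam classes and make no claim about how Beam is implemented internally.

```python
import re


class ToyCollection:
    """Hypothetical in-memory stand-in for a PCollection (not a Beam class)."""

    def __init__(self, elements):
        self.elements = list(elements)

    def __or__(self, transform):
        # `collection | transform` hands the left operand to the transform.
        return transform.apply(self)


class ToyFlatMap:
    """Hypothetical stand-in for a flat-map transform (not a Beam class)."""

    def __init__(self, fn):
        # fn maps one element to an iterable of zero or more outputs.
        self.fn = fn

    def apply(self, collection):
        # Call fn on every element and flatten the results into one collection.
        return ToyCollection(
            item for element in collection.elements for item in self.fn(element)
        )


lines = ToyCollection(["the quick brown fox", "jumps over", ""])
words = lines | ToyFlatMap(lambda line: re.findall(r"\w+", line))
print(words.elements)  # ['the', 'quick', 'brown', 'fox', 'jumps', 'over']
```

Note that the empty line contributes no words at all: a flat map may emit zero, one, or many outputs per input, whereas a plain map would emit exactly one output (here, a list) per line.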