as pipeline:
    with tft_beam.Context(temp_dir="gs://xxxx/xxxx"):
        raw_data = (
            pipeline
            | "Read" >> beam.io.ReadFromText(
                "gs://orfeon/inputs/input.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(
                tft.coders.CsvCoder(
                    ["name", "num1", "num2"], METADATA.schema).decode))

        # AnalyzeAndTransformDataset: runs the analysis and the transformation together.
        # preprocessing_fn: describes the processing you want TF-Transform to perform;
        #   its inputs and outputs are Tensors (the function is shown later).
        # transform_fn: the TF-Transform function that gets reused.
        (transformed_data, transformed_metadata), transform_fn = (
            (raw_data, METADATA)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

        transformed_data_coder = tft.coders.ExampleProtoCoder(
            transformed_metadata.schema)

        (transformed_data
         | "EncodeTrainData" >> beam.Map(transformed_data_coder.encode)
         | "WriteTrainData" >> beam.io.WriteToTFRecord("gs://xxxx/output.tfrecord"))

        # WriteTransformFn: saves the Beam transform function as a TensorFlow graph.
        (transform_fn
         | "WriteTransformFn" >> tft_beam.WriteTransformFn("gs://xxxx/yyyy"))
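The idea behind the combined analyze-and-transform step and the reusable transform function can be illustrated without Beam or TensorFlow. The sketch below is plain Python and is not the TF-Transform API: the helper names (`analyze`, `make_transform_fn`, `analyze_and_transform`) and the sample rows are hypothetical, and min-max scaling is only an assumed stand-in for whatever `preprocessing_fn` actually does. It shows a full pass that computes statistics, followed by a per-row function that carries those statistics and can be applied again later.

```python
# Conceptual sketch only: plain Python, not the TF-Transform API.
# All names here (analyze, make_transform_fn, train_rows, ...) are hypothetical.

def analyze(rows, column):
    """Full pass over the dataset: compute the min and max of one column."""
    values = [row[column] for row in rows]
    return {"min": min(values), "max": max(values)}


def make_transform_fn(stats, column):
    """Return a per-row function with the analyzed statistics baked in."""
    span = stats["max"] - stats["min"]

    def transform(row):
        out = dict(row)
        # Scale to [0, 1]; fall back to 0.0 when every value is identical.
        out[column] = (row[column] - stats["min"]) / span if span else 0.0
        return out

    return transform


def analyze_and_transform(rows, column):
    """Analyze the dataset, then transform it with the resulting function."""
    transform_fn = make_transform_fn(analyze(rows, column), column)
    return [transform_fn(row) for row in rows], transform_fn


train_rows = [
    {"name": "a", "num1": 0.0},
    {"name": "b", "num1": 5.0},
    {"name": "c", "num1": 10.0},
]
transformed_rows, transform_fn = analyze_and_transform(train_rows, "num1")

# The returned function can be reused on unseen rows with the same statistics.
new_row = transform_fn({"name": "d", "num1": 2.5})
```

In the real pipeline the reusable function is written out as a TensorFlow graph by `tft_beam.WriteTransformFn` rather than kept as a Python closure, so that the same preprocessing can be applied outside the Beam job.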