Slide 1

Slide 1 text

Jonathan Stott [email protected]

Slide 2

Slide 2 text

PRODUCTION?
https://commons.wikimedia.org/wiki/File:Jetstar%27s_first_787_on_the_production_line_(9132370198).jpg

Slide 3

Slide 3 text

PRE-FLIGHT CHECKS ?

Slide 4

Slide 4 text

PREPARING FOR TAKE OFF: COMMAND LINE ARGUMENTS

    $ Rscript my_script.R input.csv output.rds
    [1] "input.csv"
    [1] "INFO"

    # cmd_example.R
    # shows how to use command line arguments
    # (positional names assumed: the slide only shows args$input_file)
    doc <- "Usage: cmd_example.R [options] [--] <input_file> <output_file>

    Options:
      -h, --help               this help
      -l LEVEL, --log LEVEL    logging level [default: INFO]
    "
    args <- docopt::docopt(doc)
    print(args$input_file)
    print(args$log)
    # ...

Slide 5

Slide 5 text

PREPARE FOR DISASTER
… or errors at least
Gary Larson, The Far Side

Slide 6

Slide 6 text

LOGS: YOUR FLIGHT RECORDER

Slide 7

Slide 7 text

MAPS: WHERE DOES YOUR DATA LIVE? Amazon S3

Slide 8

Slide 8 text

SHIPPING CONTAINERS

    FROM rocker/r-ver:latest

    # install a package
    RUN install2.r tidyverse

    # copy code in
    COPY . /app

    # set working directory (important for R scripts!)
    WORKDIR /app

Slide 9

Slide 9 text

SCHEDULED FLIGHTS
Photo by Amin Salehi on Unsplash

Slide 10

Slide 10 text

AIR TRAFFIC CONTROL WITH AIRFLOW

    $ pip install apache-airflow

Slide 11

Slide 11 text

ROUTES…
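Airflow describes a route like this as a DAG (directed acyclic graph) of tasks. As a rough illustration of the idea only, and not of Airflow's own API, the sketch below uses the standard library's graphlib (Python 3.9+) to work out a valid running order. The task names are hypothetical and do not come from the talk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names; each key maps to the set of tasks it depends on.
routes = {
    "fetch_data": set(),
    "clean_data": {"fetch_data"},
    "fit_model": {"clean_data"},
    "score_new_data": {"clean_data", "fit_model"},
    "publish_report": {"score_new_data"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(routes).static_order())
print(order)
```

A scheduler such as Airflow adds retries, timing, and monitoring on top of this ordering.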

Slide 12

Slide 12 text

LAYOUT OF THE CONTROL TOWER

Slide 13

Slide 13 text

DIFFERENT PLANES (OR OPERATORS) FOR DIFFERENT JOBS Amazon S3

Slide 14

Slide 14 text

WATCHING OUT FOR INCOMING DATA
Photo by Nicolas Prieto on Unsplash
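Airflow calls a task that waits for something to appear a sensor. The sketch below shows only the underlying polling idea with the standard library; it is not Airflow's implementation. The function name wait_for_file and the file name new_batch.csv are made up for illustration, while poke_interval and timeout mirror the names Airflow uses for the same concepts.

```python
import os
import tempfile
import time


def wait_for_file(path, poke_interval=0.1, timeout=2.0):
    """Poll until `path` exists; return True if found, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as landing_zone:
        incoming = os.path.join(landing_zone, "new_batch.csv")  # hypothetical file
        print(wait_for_file(incoming, timeout=0.3))  # nothing has arrived yet
        open(incoming, "w").close()                  # the data lands
        print(wait_for_file(incoming, timeout=0.3))  # found on the first check
```

Downstream tasks would run only once the check returns True.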

Slide 15

Slide 15 text

BUILDING OUR OWN

    class ROperator(BaseOperator):
        template_fields = ('script', 'arguments', 'cwd')

        @apply_defaults
        def __init__(self, script, arguments=[], cwd=None, **kwargs):
            self.script = script
            self.arguments = arguments
            self.cwd = cwd
            super(ROperator, self).__init__(**kwargs)

        def execute(self, context):
            r_proc = subprocess.Popen(
                ['/usr/bin/Rscript', self.script] + self.arguments,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                close_fds=True,
                cwd=self.cwd)
            r_stdoutdata, _ = r_proc.communicate()
            self.log.info("Rscript output %s", escape_string(r_stdoutdata))
            if r_proc.returncode != 0:
                raise AirflowException(
                    "Rscript {} failed with return code {}".format(
                        self.script, r_proc.returncode))
            return escape_string(r_stdoutdata)
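The subprocess pattern inside execute can be tried without Airflow or R installed. This is a minimal sketch under those assumptions: the helper name run_script is hypothetical, RuntimeError stands in for AirflowException, and the Python interpreter stands in for /usr/bin/Rscript.

```python
import subprocess
import sys


def run_script(command, cwd=None):
    """Run a command, capture stdout and stderr together, fail on non-zero exit."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as on the slide
        close_fds=True,
        cwd=cwd,
    )
    output, _ = proc.communicate()  # blocks until the process has finished
    text = output.decode("utf-8", errors="replace")
    if proc.returncode != 0:
        # Stand-in for AirflowException, so the sketch needs no Airflow install
        raise RuntimeError(
            "{} failed with return code {}".format(command[0], proc.returncode)
        )
    return text


if __name__ == "__main__":
    # Stand-in for ['/usr/bin/Rscript', script] + arguments
    print(run_script([sys.executable, "-c", "print('ready for take off')"]))
```

Raising on a non-zero return code is what lets the scheduler mark the task as failed instead of silently carrying on.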

Slide 19

Slide 19 text

BUILDING OUR OWN

    import subprocess
    import re

    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults
    from airflow.exceptions import AirflowException


    def escape_string(string):
        """escape the weird characters R sometimes outputs"""
        string = repr(string)  # easy escape
        string = re.sub(r"\\n", "\n", string)  # unescape new lines
        string = re.sub(r"\\t", "\t", string)
        return string


    class ROperator(BaseOperator):
        # class definition ...
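To see what escape_string does to captured output, here is a small standalone check. The function is copied from the slide; the sample bytes are made up, and the check assumes Python 3, where Popen.communicate() returns bytes.

```python
import re


def escape_string(string):
    """Copy of the slide's helper: repr() the value, then restore newlines and tabs."""
    string = repr(string)
    string = re.sub(r"\\n", "\n", string)
    string = re.sub(r"\\t", "\t", string)
    return string


# Bytes in the shape Popen.communicate() returns; the content is invented.
raw = b"Loading data\n\tdone \x1b[0m"
cleaned = escape_string(raw)
print(cleaned)
```

Newlines and tabs come back as real whitespace, while other control characters such as the terminal escape stay visible as \x1b. Under Python 3 the b'...' wrapper from repr() also stays in the logged text.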

Slide 20

Slide 20 text

OTHER FEATURES IN THE CONTROL TOWER

Slide 21

Slide 21 text

COMPLICATED ROUTES …

Slide 22

Slide 22 text

EXPLORING FURTHER Jenkins AWS / Azure Batch

Slide 23

Slide 23 text

LANDING
Taking a well-architected R project into production is not that hard
Apache Airflow can help you manage complex workflows
There’s a lot more out there too

Slide 24

Slide 24 text

THANKS!
[email protected]
namelessjon
https://github.com/namelessjon/Preparing-your-model-for-takeoff
https://speakerdeck.com/namelessjon/preparing-your-model-for-takeoff
Image: https://pixabay.com/en/night-flight-plane-airport-2307018/