Building Docker + Gradle + Apache Airflow

Andrey Makeev. 6th meetup of the Anonymous Testers Society

Zoya Chizhkova

March 28, 2019

Transcript

  1. Directed Acyclic Graph
    Types of actions:
      - Action
      - Check status
      - External Trigger
      - Manual
    Types of transition:
      - Straight
      - Conditional
      - Skip
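A minimal Python sketch of the idea behind this slide, assuming a workflow is stored as a mapping from each task to the tasks it depends on. The task names (`build`, `check_status`, `deploy`, `notify`) are hypothetical, and `graphlib` is in the standard library from Python 3.9. It shows one way to get a valid run order and to reject a graph that contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "build": set(),
    "check_status": {"build"},
    "deploy": {"check_status"},
    "notify": {"build", "deploy"},
}


def execution_order(graph):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"not a DAG: {err.args[1]}") from err


order = execution_order(workflow)
print(order)
```

A scheduler such as Airflow does far more than this (conditions, triggers, retries); the sketch only illustrates why the graph must be acyclic.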
  2. Bloody enterprise:
      - No root
      - Network segregation
      - Proxy servers
      - SSL
      - Only stable software
  3. Docker daemon + Artifactory:
      - Artifactory Docker repo (self-hosted), Docker API v2.0
      - Docker daemon: point at the registry (root, sudo vi) + listen on tcp://0.0.0.0:2375
      - Also: docker pull artifactory.fqdn/proxy-docker.io/hello-world
      - Dockerfile ARG
  4. Airflow Docker image: https://github.com/puckel/docker-airflow
      - ENV no_proxy=artifactory.fqdn,localhost
      - ENV PIP_CONFIG_FILE=/etc/pip.conf
      - COPY config/sources.list /etc/apt/sources.list
        (and configure Artifactory to proxy the Debian repo)
      - COPY config/pip.conf /etc/pip.conf
            index-url = ${same_as_index_but_with}/simple
            index = https://artifactory.fqdn/artifactory/api/…
            trusted-host = artifactory.fqdn
      - COPY config/pydistutils.cfg /root/.pydistutils.cfg
  5. Scaling out with Celery
    Celery executor:
      - worker
      - metadb
      - rabbitmq
    Say NO to:
      - static IP
      - docker legacy links
      - sleep(10)
    Say YES to:
      - builtin DNS
      - HEALTHCHECK
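The "no sleep(10)" advice can be illustrated with a small polling helper. This is only a sketch under assumptions: `wait_until_healthy` and `fake_probe` are hypothetical names, and the probe stands in for whatever readiness check a service offers (for example, the result of a container health check).

```python
import time


def wait_until_healthy(probe, timeout=60.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or `timeout` seconds pass.

    Returns True if the service became healthy, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a stand-in probe that reports healthy on the third call.
attempts = {"count": 0}


def fake_probe():
    attempts["count"] += 1
    return attempts["count"] >= 3


result = wait_until_healthy(fake_probe, timeout=5.0, interval=0.01)
print(result, attempts["count"])
```

Unlike a fixed sleep, the loop returns as soon as the dependency is ready and reports a failure if it never becomes ready.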
  6. Network and persistency
    airflow-net:
        docker network create \
          -o "com.docker.network.bridge.enable_icc"="true" \
          -o "com.docker.network.bridge.enable_ip_masquerade"="true" \
          --attachable airflow-net
    airflow-vol:
        docker volume create airflow-vol
    DNS:
        $ cat /etc/resolv.conf
        search big.host.search.domains
        nameserver 127.0.0.11
        options …
  7. Automation with gradle: https://github.com/bmuschko/gradle-docker-plugin
    Task types:
      - Build
      - Create
      - List
      - Remove
      - Inspect
      - Start/stop
    Controls:
      - If
      - onError
  8. Automation with gradle: https://github.com/bmuschko/gradle-docker-plugin
    Multiproject build
      - settings.gradle
      - Artifactory to proxy gradle plugins
      - gradle-wrapper.properties: distributionUrl
      - gradle.properties: java.net.useSystemProxies=true
  9. Automation with gradle
    Error handling:
        if (exception.class.simpleName.matches('^NotModifiedException')) {
            logger.error "Container is already running\n${exception.message}"
        } else {
            throw exception
        }
    Task definition:
        task createContainer(type: DockerCreateContainer) {
            description 'Create container'
            dependsOn buildImage, listimages
            containerName = docCont
            portBindings = ['80:8080']
        }
    Version control:
        > git add …
        > git commit -m " …"
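The error-handling pattern on this slide (treat "not modified" as "already running", re-raise everything else) carries over to other languages. Below is a rough Python analogue, not the plugin's API: `NotModifiedException` here is a stand-in class defined locally, and `start_idempotently` is a hypothetical helper.

```python
import logging

logger = logging.getLogger("pipeline")


class NotModifiedException(Exception):
    """Stand-in for the error a Docker client raises when nothing changed."""


def start_idempotently(start):
    """Call start(); treat 'not modified' as benign, let other errors propagate.

    Returns True if the container was started now, False if it already ran.
    """
    try:
        start()
        return True
    except NotModifiedException as exc:
        logger.error("Container is already running\n%s", exc)
        return False


def already_running():
    raise NotModifiedException("container already started")


print(start_idempotently(lambda: None))     # started now
print(start_idempotently(already_running))  # was already running
```

Handling only the one expected exception keeps reruns of the pipeline safe without hiding real failures.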
  10. Automation with gradle
        ~/gradlew -p ~/pipeline/ :airflow:startContainer
        ~/gradlew -p ~/pipeline/ :airflow:removeImage
      - stop container
      - remove container
      - remove image
      - build tagged image
      - create container with name
      - start container
  11. Database container
    database/Dockerfile:
        FROM mysql:5.7
        COPY initdb.sql /docker-entrypoint-initdb.d/initdb.sql
        RUN echo "explicit_defaults_for_timestamp = 1" >> \
            /etc/mysql/mysql.conf.d/mysqld.cnf
        HEALTHCHECK --interval=10s --timeout=5s \
            CMD mysql -e 'select 1 from dual' > /dev/null || exit 1
    database/build.gradle:
        task createContainer(type: DockerCreateContainer) {
            …
            hostname = 'airflowdb'
            envVars.set(['MYSQL_ALLOW_EMPTY_PASSWORD' : 'yes'])
            network = 'airflow-net'
            dns = ['1.2.3.4', '5.6.7.8']
            binds = ["airflow-vol" : "/var/lib/mysql"]
            …
        }
  12. rabbitmq container
    Health check taken from … docker-library/healthcheck:
        FROM artifactory.fqdn/proxy-ext-docker.io/rabbitmq:3.8-rc-management-alpine
        HEALTHCHECK --interval=10s --timeout=5s CMD \
            rabbitmqctl eval '
            { true, rabbit_app_booted_and_running } = { rabbit:is_booted(node()), rabbit_app_booted_and_running },
            { [], no_alarms } = { rabbit:alarms(), no_alarms },
            [] /= rabbit_networking:active_listeners(),
            rabbitmq_node_is_healthy.
            ' || exit 1
    rabbitmq/build.gradle is similar to the database one:
        task createContainer(type: DockerCreateContainer) {
            …
            envVars.set(['RABBITMQ_DEFAULT_USER': 'airflow',
                         'RABBITMQ_DEFAULT_PASS': 'airpass',
                         'RABBITMQ_DEFAULT_VHOST': 'airflowvh'])
            …
        }
  13. airflow configuration
    airflow.cfg:
        [core]
        executor = CeleryExecutor
        sql_alchemy_conn = mysql://[email protected]/airflow
        fernet_key = aSDFGHJmnbvcxJKjhgfdxsrtHMNBVCDRTYJnbvcxDFG=
        [celery]
        worker_concurrency = 4 # I have only one CPU
        broker_url = pyamqp://airflow:[email protected]/airflowvh
        result_backend = db+mysql://[email protected]/airflow
        flower_host = 0.0.0.0
    entrypoint.sh:
        case "$1" in
          webserver)
            airflow initdb
            airflow scheduler &
            airflow flower &
            airflow worker &
            exec airflow webserver
            ;;
          …
        AIRFLOW__CELERY__BROKER_URL="redis://…"
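Airflow reads overrides for `airflow.cfg` options from environment variables named `AIRFLOW__{SECTION}__{KEY}` (double underscores, upper case), and these take precedence over the file. That is why a broker URL exported by an image's entrypoint can silently win over the `[celery]` section. A tiny sketch of the naming rule follows; `airflow_env_name` is a hypothetical helper, not part of Airflow.

```python
def airflow_env_name(section, key):
    """Build the environment variable name that overrides [section] key."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


# The options shown on the slide and the variables that would override them.
for section, key in [("core", "executor"), ("celery", "broker_url")]:
    print("[{}] {} -> {}".format(section, key, airflow_env_name(section, key)))
```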
  14. wait downstream services
        > Task :database:createContainer
        Created container with ID 'airflowdb'.
        > Task :database:startContainer
        Starting container with ID 'ba661c39ae020f4cb1a1cf9ea2f2d340c65c4e7009d19242f32e3026d4cf7818'.
        > Task :database:waitUntilHealthy
        Waiting for container with ID 'airflowdb' to be healthy.
        Step 1/2 : FROM artifactory.fqdn/proxy-ext-docker.io/rabbitmq:3.8-rc-management-alpine
        …
        > Task :rabbitmq:startContainer
        Starting container with ID '392f657465fe1047448ed8279db22716eebdf4be5790561d54a51edc14968f0f'.
        > Task :rabbitmq:waitUntilHealthy
        Waiting for container with ID 'airrabb' to be healthy.
        > Task :airflow:startContainer
        Starting container with ID 'c53fa012eddd195dbf8a01019f04522cb22ace7b73b1a005c9e76d6c94b0c17b'.
        BUILD SUCCESSFUL in 5m 46s

        $ docker ps --format "{{.ID}}\t{{.Image}}\t\"{{.Status}}\"\t{{.Names}}" | column -t -c 20
        392f657465fe  airflow-rabbitmq:latest  "Up 2 hours (healthy)"  airrabb
        ba661c39ae02  airflowdb:latest         "Up 2 hours (healthy)"  airflowdb
        c53fa012eddd  airflow:latest           "Up 2 hours"            airflow-web
  15. embedding gradle
    airflow worker Dockerfile:
      - download ${GRADLE_VERSION}
      - symlink bin/gradle ~/bin/gradle (PATH will catch up)
      - set up a keystore with trusted certs for the JVM and update gradle.properties to reflect it
      - init gradle wrapper, add to PATH, update gradle-wrapper.properties
      - checkout plugin with GradleOperator
    Constraint:
      - single gradle for all projects
  16. GradlePlugin: https://airflow.apache.org/plugins.html#example
    airflow/airflow/operators/bash_operator.py:
        sub_process = Popen(
            ['bash', tmp_file.name],
            stdout=PIPE, stderr=STDOUT,
            cwd=tmp_dir, env=self.env,
            preexec_fn=pre_exec)
    GradlePlugin:
        g_props = list(map(lambda p: "-P" + p, self.gradle_properties))
        self.log.info('running gradle_project {} task {}'.format(
            self.gradle_project, self.gradle_task))
        gradle_command = '{0}/gradlew --console=plain --rerun-tasks -p {0}/projects/{1} {2} {3}'.format(
            os.path.expanduser('~'), self.gradle_project,
            self.gradle_task, ' '.join(g_props))
        sp = Popen(
            ['/bin/sh', '-c', gradle_command],
            stdout=PIPE, stderr=STDOUT,
            cwd=os.path.expanduser('~'), env=self.env,
            preexec_fn=pre_exec)
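The command assembly from this slide can be tried outside Airflow. The sketch below is a variation, not the plugin's code: `build_gradle_command` is a hypothetical helper, and it returns an argument list (suitable for `Popen` without a shell) instead of the single string passed to `/bin/sh -c` on the slide. The sample values are taken from the log output later in the deck.

```python
import shlex


def build_gradle_command(home, project, task, properties=()):
    """Assemble the gradlew argument list for one task of one project."""
    g_props = ["-P" + p for p in properties]
    return [
        "{}/gradlew".format(home),
        "--console=plain",
        "--rerun-tasks",
        "-p", "{}/projects/{}".format(home, project),
        task,
    ] + g_props


argv = build_gradle_command(
    "/usr/local/airflow", "sg_agent", "buildImage",
    ["ENV_TYPE=dev", "CONTAINERNAME=agent01"],
)
# shlex.join (Python 3.8+) renders the list as a shell-safe string for logging.
print(shlex.join(argv))
```

Passing a list avoids shell quoting problems when a property value contains spaces or quotes; whether that trade-off suits the plugin is a design choice left open here.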
  17. Simple DAG
        from airflow import DAG
        from airflow.operators.bash_operator import BashOperator
        from airflow.operators.gradle_plugin import GradleOperator
        from datetime import datetime, timedelta

        dag = DAG('t0', schedule_interval=None, catchup=False,
                  start_date=datetime(2019, 3, 27),
                  description='gradle task test')

        t0 = BashOperator(
            task_id='check_macro',
            bash_command='echo {{ ds }}',
            dag=dag)

        t1 = GradleOperator(
            task_id='run_gradle',
            # args are still hardcoded in the plugin
            dag=dag)

        t1.doc = """
        first gradle task to run
        """
  18. Result
        [2019-03-27 18:08:38,914] {{gradle_plugin.py:76}} INFO - Running command: /usr/local/airflow/gradlew --console=plain --rerun-tasks -p /usr/local/airflow/projects/sg_agent buildImage -PSG_VERSION=15.5.2-141 -PENV_TYPE=dev -PINSTANCE_NAME=infra -PACCEPT_ADDRESSES=172.17.0.1 [email protected] -PCONTAINERNAME=agent01 -PdocImg=sg_agent -Pdocker.url='tcp://172.17.0.1:2375'
        [2019-03-27 18:08:38,932] {{gradle_plugin.py:83}} INFO - Output:
        [2019-03-27 18:08:41,680] {{gradle_plugin.py:87}} INFO - > Task :readConfiguration UP-TO-DATE
        [2019-03-27 18:08:42,341] {{logging_mixin.py:95}} INFO - [2019-03-27 18:08:42,340] {{jobs.py:189}} DEBUG - [heartbeat]
        [2019-03-27 18:08:42,378] {{gradle_plugin.py:87}} INFO - Step 1/22 : FROM artifactory.fqdn/proxy-docker.io/python:2.7-stretch
        [2019-03-27 18:08:43,677] {{gradle_plugin.py:87}} INFO -
        [2019-03-27 18:08:43,677] {{gradle_plugin.py:87}} INFO - > Task :buildImage
        [2019-03-27 18:08:43,677] {{gradle_plugin.py:87}} INFO - Building image using context '/usr/local/airflow/projects/sg_agent'.