
Hacking together Docker + Gradle + Apache Airflow

Andrey Makeev. 6th meetup of the Anonymous Testers Society

Zoya Chizhkova

March 28, 2019

Transcript

  1. Directed Acyclic Graph
     Types of actions: - Action - Check status - External Trigger - Manual
     Types of transition: - Straight - Conditional - Skip
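
     These transition types map onto standard Airflow constructs; a minimal sketch (Airflow 1.10-style imports, task names are illustrative) with a straight transition followed by a conditional one via BranchPythonOperator:

         from datetime import datetime
         from airflow import DAG
         from airflow.operators.dummy_operator import DummyOperator
         from airflow.operators.python_operator import BranchPythonOperator

         dag = DAG('transitions_demo', schedule_interval=None,
                   start_date=datetime(2019, 3, 27), catchup=False)

         def pick_branch(**context):
             # Conditional transition: return the task_id that should run next.
             return 'fast_path' if context['execution_date'].day % 2 == 0 else 'slow_path'

         start = DummyOperator(task_id='start', dag=dag)              # plain action
         branch = BranchPythonOperator(task_id='branch', python_callable=pick_branch,
                                       provide_context=True, dag=dag)  # conditional
         fast = DummyOperator(task_id='fast_path', dag=dag)
         slow = DummyOperator(task_id='slow_path', dag=dag)            # the other branch is skipped

         start >> branch >> [fast, slow]   # straight transition, then conditional fan-out
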
  2. Bloody enterprise: - no root - Network segregation - Proxy servers - SSL - Only stable software
  3. Docker daemon + Artifactory:
     Artifactory docker repo (self hosted), Docker API v2.0
     Docker daemon: point to the registry (root) + listen tcp://0.0.0.0:2375
     also docker pull artifactory.fqdn/proxy-docker.io/hello-world
     Dockerfile ARG, sudo vi
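
     With the daemon listening on tcp://0.0.0.0:2375 and Artifactory acting as the registry proxy, the same pull can be scripted; a small sketch with the docker-py SDK (the daemon address is a placeholder, the repo path is the one from the slide):

         import docker

         # Talk to the remote daemon over TCP instead of the local socket.
         client = docker.DockerClient(base_url='tcp://127.0.0.1:2375')

         # Pull through the Artifactory remote-repo proxy instead of docker.io directly.
         image = client.images.pull('artifactory.fqdn/proxy-docker.io/hello-world', tag='latest')
         print(image.tags)
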
  4. Airflow Docker image: https://github.com/puckel/docker-airflow
     - ENV no_proxy=artifactory.fqdn,localhost
     - ENV PIP_CONFIG_FILE=/etc/pip.conf
     - COPY config/sources.list /etc/apt/sources.list
       (and configure Artifactory to proxy the Debian repo)
     - COPY config/pip.conf /etc/pip.conf
       index = https://artifactory.fqdn/artifactory/api/…
       index-url = ${same_as_index_but_with}/simple
       trusted-host = artifactory.fqdn
     - COPY config/pydistutils.cfg /root/.pydistutils.cfg
  5. Celery executor: - worker - metadb - rabbitmq (scaling out with Celery)
     Say NO to: - static IP - docker legacy links - sleep(10)
     Say YES to: - builtin DNS - HEALTHCHECK
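
     Instead of sleep(10), startup ordering can key off the container HEALTHCHECK; a hedged sketch with docker-py that polls the health status reported by docker inspect (the container name 'airflowdb' is taken from the later slides):

         import time
         import docker

         client = docker.from_env()

         def wait_until_healthy(name, timeout=120):
             """Poll the HEALTHCHECK status instead of a blind sleep."""
             deadline = time.time() + timeout
             while time.time() < deadline:
                 container = client.containers.get(name)
                 health = container.attrs['State'].get('Health', {}).get('Status')
                 if health == 'healthy':
                     return
                 time.sleep(2)
             raise TimeoutError('%s did not become healthy within %ss' % (name, timeout))

         wait_until_healthy('airflowdb')   # metadb must be up before the workers start
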
  6. Network and persistency: airflow-net, airflow-vol
     docker network create \
       -o "com.docker.network.bridge.enable_icc"="true" \
       -o "com.docker.network.bridge.enable_ip_masquerade"="true" \
       --attachable airflow-net
     docker volume create airflow-vol
     $ cat /etc/resolv.conf
     search big.host.search.domains
     nameserver 127.0.0.11
     options …
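
     The same network and volume can be created from Python; a sketch with docker-py that mirrors the CLI flags above:

         import docker

         client = docker.from_env()

         # Equivalent of `docker network create -o ... --attachable airflow-net`
         network = client.networks.create(
             'airflow-net',
             driver='bridge',
             attachable=True,
             options={
                 'com.docker.network.bridge.enable_icc': 'true',
                 'com.docker.network.bridge.enable_ip_masquerade': 'true',
             },
         )

         # Equivalent of `docker volume create airflow-vol`
         volume = client.volumes.create(name='airflow-vol')
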
  7. Automation with gradle: https://github.com/bmuschko/gradle-docker-plugin
     Task types: - Build - Create - List - Remove - Inspect - Start/stop
     Controls: - If - onError
  8. Automation with gradle: https://github.com/bmuschko/gradle-docker-plugin
     Multiproject build:
     - settings.gradle
     - Artifactory to proxy gradle plugins
     - gradle-wrapper.properties: distributionUrl
     - gradle.properties: java.net.useSystemProxies=true
  9. Automation with gradle
     if (exception.class.simpleName.matches('^NotModifiedException')) {
         logger.error "Container is already running\n${exception.message}"
     } else {
         throw exception
     }
     task createContainer(type: DockerCreateContainer) {
         description 'Create container'
         dependsOn buildImage, listImages
         containerName = docCont
         portBindings = ['80:8080']
     }
     > git add …
     > git commit -m "…"
 10. Automation with gradle
     ~/gradlew -p ~/pipeline/ :airflow:startContainer
     ~/gradlew -p ~/pipeline/ :airflow:removeImage
     - stop container
     - remove container
     - remove image
     - build tagged image
     - create container with name
     - start container
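
     That lifecycle (stop, remove, rebuild, recreate, start) can also be driven from a script instead of being typed by hand; a minimal sketch that shells out to the wrapper the same way the slides do (the stopContainer/removeContainer task names are assumptions, the others appear on the slides):

         import os
         import subprocess

         PIPELINE = os.path.expanduser('~/pipeline')

         def gradle(task):
             """Run one task of the multiproject build via the wrapper."""
             subprocess.run([os.path.expanduser('~/gradlew'), '-p', PIPELINE, task], check=True)

         # Rebuild and restart the airflow container from scratch.
         for task in (':airflow:stopContainer', ':airflow:removeContainer',
                      ':airflow:removeImage', ':airflow:buildImage',
                      ':airflow:createContainer', ':airflow:startContainer'):
             gradle(task)
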
 11. Database container
     database/Dockerfile:
     FROM mysql:5.7
     COPY initdb.sql /docker-entrypoint-initdb.d/initdb.sql
     RUN echo "explicit_defaults_for_timestamp = 1" >> \
         /etc/mysql/mysql.conf.d/mysqld.cnf
     HEALTHCHECK --interval=10s --timeout=5s \
         CMD mysql -e 'select 1 from dual' > /dev/null || exit 1
     database/build.gradle:
     task createContainer(type: DockerCreateContainer) {
         …
         hostname = 'airflowdb'
         envVars.set(['MYSQL_ALLOW_EMPTY_PASSWORD' : 'yes'])
         network = 'airflow-net'
         dns = ['1.2.3.4', '5.6.7.8']
         binds = ["airflow-vol" : "/var/lib/mysql"]
         …
     }
 12. rabbitmq container
     FROM artifactory.fqdn/proxy-ext-docker.io/rabbitmq:3.8-rc-management-alpine
     HEALTHCHECK --interval=10s --timeout=5s CMD \ …  (docker-library/healthcheck)
     rabbitmqctl eval '
         { true, rabbit_app_booted_and_running } = { rabbit:is_booted(node()), rabbit_app_booted_and_running },
         { [], no_alarms } = { rabbit:alarms(), no_alarms },
         [] /= rabbit_networking:active_listeners(),
         rabbitmq_node_is_healthy.
     ' || exit 1
     rabbitmq/build.gradle is similar to database:
     task createContainer(type: DockerCreateContainer) {
         …
         envVars.set(['RABBITMQ_DEFAULT_USER': 'airflow',
                      'RABBITMQ_DEFAULT_PASS': 'airpass',
                      'RABBITMQ_DEFAULT_VHOST': 'airflowvh'])
         …
     }
 13. airflow configuration
     airflow.cfg:
     [core]
     executor = CeleryExecutor
     sql_alchemy_conn = mysql://airflow@airflowdb/airflow
     fernet_key = aSDFGHJmnbvcxJKjhgfdxsrtHMNBVCDRTYJnbvcxDFG=
     [celery]
     worker_concurrency = 4  # I have only one CPU
     broker_url = pyamqp://airflow:airpass@airrabb/airflowvh
     result_backend = db+mysql://airflow@airflowdb/airflow
     flower_host = 0.0.0.0
     entrypoint.sh:
     case "$1" in
       webserver)
         airflow initdb
         airflow scheduler &
         airflow flower &
         airflow worker &
         exec airflow webserver
         ;;
       …
     Watch out for the image default AIRFLOW__CELERY__BROKER_URL="redis://…"
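
     The fernet_key above is a standard Fernet key; one way to generate a fresh one (the cryptography package is the one Airflow itself uses for this):

         from cryptography.fernet import Fernet

         # Paste the output into airflow.cfg as fernet_key,
         # or export it as AIRFLOW__CORE__FERNET_KEY.
         print(Fernet.generate_key().decode())
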
 14. wait downstream services
     > Task :database:createContainer
     Created container with ID 'airflowdb'.
     > Task :database:startContainer
     Starting container with ID 'ba661c39ae020f4cb1a1cf9ea2f2d340c65c4e7009d19242f32e3026d4cf7818'.
     > Task :database:waitUntilHealthy
     Waiting for container with ID 'airflowdb' to be healthy.
     Step 1/2 : FROM artifactory.fqdn/proxy-ext-docker.io/rabbitmq:3.8-rc-management-alpine
     …
     > Task :rabbitmq:startContainer
     Starting container with ID '392f657465fe1047448ed8279db22716eebdf4be5790561d54a51edc14968f0f'.
     > Task :rabbitmq:waitUntilHealthy
     Waiting for container with ID 'airrabb' to be healthy.
     > Task :airflow:startContainer
     Starting container with ID 'c53fa012eddd195dbf8a01019f04522cb22ace7b73b1a005c9e76d6c94b0c17b'.
     BUILD SUCCESSFUL in 5m 46s
     $ docker ps --format "{{.ID}}\t{{.Image}}\t\"{{.Status}}\"\t{{.Names}}" | column -t -c 20
     392f657465fe  airflow-rabbitmq:latest  "Up 2 hours (healthy)"  airrabb
     ba661c39ae02  airflowdb:latest         "Up 2 hours (healthy)"  airflowdb
     c53fa012eddd  airflow:latest           "Up 2 hours"            airflow-web
 15. embedding gradle into the airflow worker
     Dockerfile:
     - download ${GRADLE_VERSION}
     - symlink bin/gradle to ~/bin/gradle (PATH will catch up)
     - set up a keystore with trusted certs for the JVM and update gradle.properties to reflect it
     - init the gradle wrapper, add it to PATH, update gradle-wrapper.properties
     - checkout the plugin with GradleOperator
     Constraint: - single gradle for all projects
 16. GradlePlugin, cf. airflow/airflow/operators/bash_operator.py
     https://airflow.apache.org/plugins.html#example
     # BashOperator does:
     sub_process = Popen(
         ['bash', tmp_file.name],
         stdout=PIPE, stderr=STDOUT,
         cwd=tmp_dir, env=self.env,
         preexec_fn=pre_exec)
     # GradleOperator does:
     g_props = list(map(lambda p: "-P" + p, self.gradle_properties))
     self.log.info('running gradle_project {} task {}'.format(self.gradle_project, self.gradle_task))
     gradle_command = '{0}/gradlew --console=plain --rerun-tasks -p {0}/projects/{1} {2} {3}'.format(
         os.path.expanduser('~'), self.gradle_project, self.gradle_task, ' '.join(g_props))
     sp = Popen(
         ['/bin/sh', '-c', gradle_command],
         stdout=PIPE, stderr=STDOUT,
         cwd=os.path.expanduser('~'), env=self.env,
         preexec_fn=pre_exec)
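
     Assembled from the fragments above, such an operator might look roughly like this; a hedged sketch in the Airflow 1.10 style (class and parameter names are illustrative, only the command building and Popen call come from the slide):

         import os
         from subprocess import Popen, PIPE, STDOUT

         from airflow.exceptions import AirflowException
         from airflow.models import BaseOperator
         from airflow.utils.decorators import apply_defaults


         class GradleOperator(BaseOperator):
             """Run one task of a Gradle multiproject build from inside the worker."""

             @apply_defaults
             def __init__(self, gradle_project, gradle_task, gradle_properties=None, *args, **kwargs):
                 super(GradleOperator, self).__init__(*args, **kwargs)
                 self.gradle_project = gradle_project
                 self.gradle_task = gradle_task
                 self.gradle_properties = gradle_properties or []

             def execute(self, context):
                 g_props = ['-P' + p for p in self.gradle_properties]
                 home = os.path.expanduser('~')
                 gradle_command = '{0}/gradlew --console=plain --rerun-tasks -p {0}/projects/{1} {2} {3}'.format(
                     home, self.gradle_project, self.gradle_task, ' '.join(g_props))
                 self.log.info('Running command: %s', gradle_command)

                 # Stream the gradle output into the task log line by line.
                 sp = Popen(['/bin/sh', '-c', gradle_command], stdout=PIPE, stderr=STDOUT, cwd=home)
                 for line in iter(sp.stdout.readline, b''):
                     self.log.info(line.decode('utf-8', errors='replace').rstrip())
                 sp.wait()
                 if sp.returncode != 0:
                     raise AirflowException('gradle exited with code %s' % sp.returncode)
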
 17. Simple DAG
     from airflow import DAG
     from airflow.operators.bash_operator import BashOperator
     from airflow.operators.gradle_plugin import GradleOperator
     from datetime import datetime, timedelta

     dag = DAG('t0', schedule_interval=None, catchup=False,
               start_date=datetime(2019, 3, 27), description='gradle task test')

     t0 = BashOperator(
         task_id='check_macro',
         bash_command='echo {{ ds }}',
         dag=dag)

     t1 = GradleOperator(
         task_id='run_gradle',
         # args are hardcoded in plugin yet
         dag=dag)
     t1.doc = """ first gradle task to run """
 18. Result
     [2019-03-27 18:08:38,914] {{gradle_plugin.py:76}} INFO - Running command: /usr/local/airflow/gradlew --console=plain --rerun-tasks -p /usr/local/airflow/projects/sg_agent buildImage -PSG_VERSION=15.5.2-141 -PENV_TYPE=dev -PINSTANCE_NAME=infra -PACCEPT_ADDRESSES=172.17.0.1 [email protected] -PCONTAINERNAME=agent01 -PdocImg=sg_agent -Pdocker.url='tcp://172.17.0.1:2375'
     [2019-03-27 18:08:38,932] {{gradle_plugin.py:83}} INFO - Output:
     [2019-03-27 18:08:41,680] {{gradle_plugin.py:87}} INFO - > Task :readConfiguration UP-TO-DATE
     [2019-03-27 18:08:42,341] {{logging_mixin.py:95}} INFO - [2019-03-27 18:08:42,340] {{jobs.py:189}} DEBUG - [heartbeat]
     [2019-03-27 18:08:42,378] {{gradle_plugin.py:87}} INFO - Step 1/22 : FROM artifactory.fqdn/proxy-docker.io/python:2.7-stretch
     [2019-03-27 18:08:43,677] {{gradle_plugin.py:87}} INFO -
     [2019-03-27 18:08:43,677] {{gradle_plugin.py:87}} INFO - > Task :buildImage
     [2019-03-27 18:08:43,677] {{gradle_plugin.py:87}} INFO - Building image using context '/usr/local/airflow/projects/sg_agent'.