OpenStack, Kubernetes and Nextflow

Technical details of our local Kubernetes setup and how we run Nextflow there

Vladimir Kiselev

May 17, 2018

Transcript

  1. OpenStack, Kubernetes and Nextflow
    18/05/18
    Vladimir Kiselev, Head of Cellular Genetics Informatics
  2. Kubernetes
    Kubernetes is Greek for pilot or helmsman (the person holding the ship's steering wheel).
    Pronunciation: Koo-ber-nay-tace or Koo-ber-netties.
    Kubernetes abstracts away the hardware infrastructure and exposes your whole datacenter as a single enormous computational resource.
  3. Kubernetes cluster OpenStack cloud

  4. Kubernetes cluster 1 Pod 5 Pods 2 Pods

  5. Local Kubernetes cluster
    https://gitlab.internal.sanger.ac.uk/hc7/kubespray
    Helper Instance
    ./pre.sh
    ./terr.sh
    ./ansible.sh
    ./kube.sh
    …
    ./deleteme.sh
  6. Kubernetes cluster
    ID    Flavour    VCPUs  RAM (Gb)  Nodes
    8002  o1.medium  4      4.3       Master
    8002  o1.medium  4      4.3       Bastion
    8002  o1.medium  4      4.3       glusterFS 500Gb
    8002  o1.medium  4      4.3       glusterFS 500Gb
    2003  m1.large   8      71.1      Compute node
    2003  m1.large   8      71.1      Compute node
    2003  m1.large   8      71.1      Compute node
    Total            40     230.5

    Terraform configuration:
    cluster_name="vlad-k8s-test"
    …
    flavor_k8s_master="8002"
    flavor_k8s_node="2003"
    flavor_etcd="8002"
    flavor_bastion="8002"
    …
    number_of_bastions=1
    number_of_k8s_masters_no_floating_ip=1
    number_of_k8s_nodes_no_floating_ip=3
    # GlusterFS variables
    flavor_gfs_node = "8002"
    number_of_gfs_nodes_no_floating_ip = "2"
    gfs_volume_size_in_gb = "500"
    …
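The totals in the table above can be checked with a few lines of Python. This is only a sketch: the flavour figures and node counts are copied from the slide, while the names `FLAVOURS`, `NODES` and `cluster_totals` are illustrative and not part of the kubespray or Terraform setup.

```python
# Flavour ID -> (VCPUs, RAM in Gb), as listed on the slide.
FLAVOURS = {"8002": (4, 4.3), "2003": (8, 71.1)}

# Node role -> (flavour ID, number of instances), matching the
# number_of_* values in the Terraform configuration.
NODES = {
    "master": ("8002", 1),
    "bastion": ("8002", 1),
    "glusterfs": ("8002", 2),
    "compute": ("2003", 3),
}


def cluster_totals(nodes, flavours):
    """Sum VCPUs and RAM over every instance in the cluster."""
    vcpus = sum(flavours[flavour][0] * count for flavour, count in nodes.values())
    ram = sum(flavours[flavour][1] * count for flavour, count in nodes.values())
    # Round to one decimal place to hide floating-point noise.
    return vcpus, round(ram, 1)


total_vcpus, total_ram = cluster_totals(NODES, FLAVOURS)
print(total_vcpus, total_ram)
```

Changing the compute count in `NODES` shows how much capacity each extra m1.large node would add.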
  7. Network topology (diagram; labels: Kubernetes master, Helper Instance)

  8. Nextflow on Kubernetes

  9. Nextflow RNAseq pipeline
    1. Pull cram files from iRods
    2. Merge cram files per sample
    3. Convert cram to fastq
    4. STAR alignment of fastq
    5. Count reads
    Tools: samtools, STAR index, samtools, featureCounts
  10. Nextflow RNAseq pipeline
    1. Pull cram files from iRods
    2. Merge cram files per sample
    3. Convert cram to fastq
    4. STAR alignment of fastq
    5. Count reads
    Requirements: iRods docker image + Authentication, samtools + cram reference file, STAR index (S3), samtools, featureCounts
  11. iRods docker image + Authentication
    Kubernetes secret
    > kubectl create -f secret.yml
    secret.yml:
    apiVersion: v1
    kind: Secret
    metadata:
      name: irods-secret
    type: Opaque
    data:
      IRODS_PASSWORD: PASSWORD
      IRODS_USER_NAME: USERNAME
    (PASSWORD and USERNAME are placeholders; values under data must be base64-encoded.)
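Kubernetes expects the values under `data` in a Secret to be base64-encoded, so the PASSWORD and USERNAME placeholders need encoding before `kubectl create` is run. The following is a minimal sketch of producing such a manifest; the helper names `b64` and `render_secret` and the example credentials are hypothetical, not taken from the talk.

```python
import base64


def b64(value: str) -> str:
    """Base64-encode a string, as required for Secret 'data' values."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def render_secret(name: str, values: dict) -> str:
    """Render an Opaque Secret manifest with base64-encoded values."""
    lines = [
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        f"  name: {name}",
        "type: Opaque",
        "data:",
    ]
    for key, value in values.items():
        lines.append(f"  {key}: {b64(value)}")
    return "\n".join(lines) + "\n"


# Example credentials are made up for illustration only.
manifest = render_secret(
    "irods-secret",
    {"IRODS_PASSWORD": "example-password", "IRODS_USER_NAME": "example-user"},
)
print(manifest)
```

The printed text could be saved as secret.yml. Kubernetes also accepts a `stringData` field with plain-text values, which avoids the manual encoding step.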
  12. iRods docker image + Authentication
    https://github.com/cellgeni/irods
    docker pull quay.io/cellgeni/irods
    Put iinit.sh in docker image:
      COPY ./iinit.sh /iinit.sh
    Run iinit.sh in every iRods pod:
      echo $(cat /secret/IRODS_PASSWORD) | iinit
  13. iRods docker image + Authentication
    Nextflow config:
    process {
      container = 'quay.io/cellgeni/rnaseq'
      $irods {
        container = 'quay.io/cellgeni/irods'
        pod = [secret: 'irods-secret', mountPath: '/secret']
        beforeScript = "/iinit.sh"
      }
    }
  14. Software
    docker pull quay.io/cellgeni/rnaseq

    environment.yml:
    name: rnaseq1.5
    channels:
      - conda-forge
      - bioconda
    dependencies:
      - fastqc=0.11.7
      - bedops=2.4.30
      - cutadapt=1.15
      - trim-galore=0.4.5
      - star=2.5.4a
      - hisat2=2.1.0
      - salmon=0.9.1
      - rseqc=2.6.4
      - picard=2.17.6
      - samtools=1.7
      - preseq=2.0.2
      - subread=1.6.0
      - stringtie=1.3.3
      - multiqc=1.4

    Docker image:
    FROM continuumio/miniconda
    # Install procps so that Nextflow can poll CPU usage
    RUN apt-get update && apt-get install -y procps && apt-get clean -y
    # Update the base version of conda
    RUN conda update -n base conda
    COPY environment.yml /
    RUN conda env create -f /environment.yml && conda clean -a
    ENV PATH /opt/conda/envs/rnaseq1.5/bin:$PATH
  15. STAR index + cram reference files
    • S3 storage
    • Ideally maintained by NPG
    > vk6@farm3-head2:~$ printenv | grep REF_PATH
    REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://sf2-farm-srv1.internal.sanger.ac.uk::8000/%s
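A minimal sketch of how a REF_PATH value like the one above is interpreted, assuming the htslib conventions: entries are separated by single colons, `::` stands for a literal colon, `%Ns` consumes the next N characters of the sequence MD5, and `%s` consumes whatever remains. The function names and the example MD5 are illustrative, and this is not htslib's own code.

```python
import re


def split_ref_path(ref_path: str) -> list:
    """Split on single colons; '::' is treated as an escaped literal colon."""
    placeholder = "\0"
    parts = ref_path.replace("::", placeholder).split(":")
    return [part.replace(placeholder, ":") for part in parts if part]


def expand(template: str, md5: str) -> str:
    """Fill %Ns (next N characters) and %s (the rest) from the MD5 string."""
    remaining = md5

    def substitute(match):
        nonlocal remaining
        width = match.group(1)
        if width:
            count = int(width)
            piece, remaining = remaining[:count], remaining[count:]
        else:
            piece, remaining = remaining, ""
        return piece

    return re.sub(r"%(\d*)s", substitute, template)


REF_PATH = (
    "/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s"
    ":/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s"
    ":URL=http:://sf2-farm-srv1.internal.sanger.ac.uk::8000/%s"
)

example_md5 = "0123456789abcdef0123456789abcdef"  # made-up checksum
for entry in split_ref_path(REF_PATH):
    print(expand(entry, example_md5))
```

Under these assumptions the doubled colons in the URL entry are deliberate escapes rather than typos, which matters when the same references have to be made reachable from pods running outside the farm.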
  16. Nextflow k8s issues

  17. Nextflow k8s issues

  18. Next steps
    • Benchmark Nextflow performance on K8s versus the farm (small publication), including wr
    • Test run of K8s with real users (submission system similar to LSF)
    • Starting from June: serious focus on CWL, including overview of all current options of workflow engines, finding out the best/light/promising
  19. Acknowledgements
    Helen Cousins, Theo Barber-Bany, Paolo Di Tommaso, Peter Clapham, Christopher Harrison, Matthew Vernon