
OpenStack, Kubernetes and Nextflow

Technical details of our local Kubernetes setup and how we run Nextflow there

Vladimir Kiselev

May 17, 2018

Transcript

  1. Kubernetes

    Kubernetes is Greek for pilot or helmsman (the person holding the ship’s steering wheel). It is pronounced Koo-ber-nay-tace or Koo-ber-netties. Kubernetes abstracts away the hardware infrastructure and exposes your whole datacenter as a single enormous computational resource.
  2. Kubernetes cluster

    ID    Flavour    VCPUs  RAM (GB)  Nodes
    8002  o1.medium  4      4.3       Master
    8002  o1.medium  4      4.3       Bastion
    8002  o1.medium  4      4.3       glusterFS 500Gb
    8002  o1.medium  4      4.3       glusterFS 500Gb
    2003  m1.large   8      71.1      Compute node
    2003  m1.large   8      71.1      Compute node
    2003  m1.large   8      71.1      Compute node
    Total            40     230.5

    Terraform configuration:

        cluster_name="vlad-k8s-test"
        …
        flavor_k8s_master="8002"
        flavor_k8s_node="2003"
        flavor_etcd="8002"
        flavor_bastion="8002"
        …
        number_of_bastions=1
        number_of_k8s_masters_no_floating_ip=1
        number_of_k8s_nodes_no_floating_ip=3
        # GlusterFS variables
        flavor_gfs_node = "8002"
        number_of_gfs_nodes_no_floating_ip = "2"
        gfs_volume_size_in_gb = "500"
        …
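As a quick sanity check on the totals in the slide 2 table, the per-node figures can be summed in a few lines. This is only a sketch: the `NODES` list transcribes the table by hand, and the variable names are illustrative and not part of the Terraform setup.

```python
# Per-node resources transcribed from the slide 2 table:
# (role, flavour ID, vCPUs, RAM in GB).
NODES = [
    ("Master", "8002", 4, 4.3),
    ("Bastion", "8002", 4, 4.3),
    ("glusterFS 500Gb", "8002", 4, 4.3),
    ("glusterFS 500Gb", "8002", 4, 4.3),
    ("Compute node", "2003", 8, 71.1),
    ("Compute node", "2003", 8, 71.1),
    ("Compute node", "2003", 8, 71.1),
]

# Sum the vCPU and RAM columns; RAM is rounded to one decimal place
# to hide floating-point noise.
total_vcpus = sum(vcpus for _, _, vcpus, _ in NODES)
total_ram_gb = round(sum(ram for _, _, _, ram in NODES), 1)

print(total_vcpus, total_ram_gb)
```

The result agrees with the last row of the table (40 VCPUs, 230.5 GB RAM).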
  3. Nextflow RNAseq pipeline

    1. Pull cram files from iRods
    2. Merge cram files per sample
    3. Convert cram to fastq
    4. STAR alignment of fastq
    5. Count reads

    Tools shown alongside the steps: samtools, STAR index, samtools, featureCounts
  4. Nextflow RNAseq pipeline

    1. Pull cram files from iRods
    2. Merge cram files per sample
    3. Convert cram to fastq
    4. STAR alignment of fastq
    5. Count reads

    Tools and inputs shown alongside the steps: iRods docker image + Authentication, samtools + cram reference file, STAR index (S3), samtools, featureCounts
  5. iRods docker image + Authentication

    Kubernetes secret

    > kubectl create -f secret.yml

    secret.yml:

        apiVersion: v1
        kind: Secret
        metadata:
          name: irods-secret
        type: Opaque
        data:
          IRODS_PASSWORD: PASSWORD
          IRODS_USER_NAME: USERNAME
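Kubernetes expects the values under `data` in a Secret to be base64-encoded, so the `PASSWORD` and `USERNAME` placeholders on slide 5 stand for encoded strings. The sketch below builds an equivalent manifest and prints it as JSON, which `kubectl create -f` accepts as well as YAML. The helper name `build_irods_secret` and the sample credentials are hypothetical; only the secret name and keys come from the slide.

```python
import base64
import json


def build_irods_secret(user_name: str, password: str) -> dict:
    """Return a Secret manifest shaped like secret.yml, with base64-encoded values."""

    def b64(value: str) -> str:
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "irods-secret"},
        "type": "Opaque",
        "data": {
            "IRODS_PASSWORD": b64(password),
            "IRODS_USER_NAME": b64(user_name),
        },
    }


# Placeholder credentials for illustration only.
manifest = build_irods_secret("USERNAME", "PASSWORD")
print(json.dumps(manifest, indent=2))
```

Writing the printed JSON to a file and passing that file to `kubectl create -f` would be one way to use it; real credentials should not be committed alongside the pipeline code.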
  6. iRods docker image + Authentication

    https://github.com/cellgeni/irods

    docker pull quay.io/cellgeni/irods

    Put iinit.sh in docker image:

        COPY ./iinit.sh /iinit.sh

    Run iinit.sh in every iRods pod:

        echo $(cat /secret/IRODS_PASSWORD) | iinit
  7. iRods docker image + Authentication

    Nextflow config:

        process {
          container = 'quay.io/cellgeni/rnaseq'
          $irods {
            container = 'quay.io/cellgeni/irods'
            pod = [secret: 'irods-secret', mountPath: '/secret']
            beforeScript = "/iinit.sh"
          }
        }
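The `pod = [secret: 'irods-secret', mountPath: '/secret']` setting on slide 7 asks for the secret to be mounted into the task's pod. In plain Kubernetes terms that corresponds to a secret volume plus a volume mount, with each key of the secret appearing as a file such as `/secret/IRODS_PASSWORD`. The sketch below builds such a pod manifest by hand. It is not the manifest Nextflow actually generates, and the helper name, pod name, container name and volume name are all hypothetical.

```python
import json


def pod_with_secret(image: str, secret_name: str, mount_path: str) -> dict:
    """Build a minimal Pod manifest that mounts a Secret as files under mount_path."""
    volume_name = "irods-secret-vol"  # hypothetical name, chosen for this example
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "irods-example"},  # hypothetical pod name
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "irods",
                    "image": image,
                    # Each key of the secret becomes a file under mount_path.
                    "volumeMounts": [
                        {"name": volume_name, "mountPath": mount_path, "readOnly": True}
                    ],
                }
            ],
            "volumes": [
                {"name": volume_name, "secret": {"secretName": secret_name}}
            ],
        },
    }


pod = pod_with_secret("quay.io/cellgeni/irods", "irods-secret", "/secret")
print(json.dumps(pod, indent=2))
```

With the secret mounted this way, a script like iinit.sh can read the password from `/secret/IRODS_PASSWORD` without it being baked into the image.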
  8. Software Docker image

    docker pull quay.io/cellgeni/rnaseq

    environment.yml:

        name: rnaseq1.5
        channels:
          - conda-forge
          - bioconda
        dependencies:
          - fastqc=0.11.7
          - bedops=2.4.30
          - cutadapt=1.15
          - trim-galore=0.4.5
          - star=2.5.4a
          - hisat2=2.1.0
          - salmon=0.9.1
          - rseqc=2.6.4
          - picard=2.17.6
          - samtools=1.7
          - preseq=2.0.2
          - subread=1.6.0
          - stringtie=1.3.3
          - multiqc=1.4

    Dockerfile:

        FROM continuumio/miniconda
        # Install procps so that Nextflow can poll CPU usage
        RUN apt-get update && apt-get install -y procps && apt-get clean -y
        # Update the base version of conda
        RUN conda update -n base conda
        COPY environment.yml /
        RUN conda env create -f /environment.yml && conda clean -a
        ENV PATH /opt/conda/envs/rnaseq1.5/bin:$PATH
  9. STAR index + cram reference files

    • S3 storage
    • Ideally maintained by NPG

    > vk6@farm3-head2:~$ printenv | grep REF_PATH
    REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://sf2-farm-srv1.internal.sanger.ac.uk::8000/%s
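The `REF_PATH` value on slide 9 is easier to read once split into its entries. The sketch below assumes the htslib convention as I understand it: entries are separated by single colons, a literal colon is written as `::`, `%2s` consumes two characters of the reference sequence MD5, and `%s` consumes the rest. The function names, the shortened example path and the MD5 string are hypothetical and only illustrate the expansion.

```python
import re
from typing import List


def split_ref_path(ref_path: str) -> List[str]:
    """Split REF_PATH on single colons, then unescape '::' to ':'."""
    parts = re.split(r"(?<!:):(?!:)", ref_path)
    return [part.replace("::", ":") for part in parts if part]


def expand_entry(entry: str, md5: str) -> str:
    """Replace %Ns / %s in one entry with successive pieces of the MD5."""
    remaining = md5

    def take(match):
        nonlocal remaining
        width = match.group(1)
        # %Ns takes N characters; a bare %s takes whatever is left.
        n = int(width) if width else len(remaining)
        piece, remaining = remaining[:n], remaining[n:]
        return piece

    return re.sub(r"%(\d*)s", take, entry)


# Hypothetical, shortened REF_PATH and MD5 for illustration.
example_ref_path = "/cache/%2s/%2s/%s:URL=http:://example.org::8000/%s"
example_md5 = "0123456789abcdef0123456789abcdef"

for entry in split_ref_path(example_ref_path):
    print(expand_entry(entry, example_md5))
```

Under these assumptions, the doubled colons in `http:://` and `::8000` on the slide are escapes rather than typos, and the `%2s/%2s/%s` pattern spreads cached references over two levels of subdirectories.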
  10. Next steps

    • Benchmark Nextflow performance on K8s versus the farm (small publication), including wr
    • Test run of K8s with real users (submission system similar to LSF)
    • Starting from June: serious focus on CWL, including overview of all current options of workflow engines, finding out the best/light/promising