Slide 1

OpenStack, Kubernetes and Nextflow
18/05/18
Vladimir Kiselev, Head of Cellular Genetics Informatics

Slide 2

Kubernetes
Kubernetes is Greek for "pilot" or "helmsman" (the person holding the ship’s steering wheel).
Pronounced: Koo-ber-nay-tace or Koo-ber-netties
Kubernetes abstracts away the hardware infrastructure and exposes your whole datacenter as a single enormous computational resource.

Slide 3

Kubernetes cluster
[Diagram: Kubernetes cluster running in the OpenStack cloud]

Slide 4

Kubernetes cluster
[Diagram: pods spread across the cluster nodes: 1 Pod, 5 Pods, 2 Pods]
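
To make "the datacenter as one resource" concrete, here is a toy placement loop: pods ask for vCPUs and land on whichever node has the most room left. This is a deliberately simplified sketch, not how kube-scheduler works (it filters and scores nodes on many more criteria); the function name `schedule` and the node names are hypothetical, and the 8-vCPU capacity is borrowed from the m1.large compute nodes used later.

```python
from typing import Dict, List, Tuple


def schedule(pods: List[Tuple[str, int]], nodes: Dict[str, int]) -> Dict[str, str]:
    """Toy scheduler: place each pod on the node with the most free vCPUs.

    Illustrates treating many machines as one pool; the real kube-scheduler
    filters and scores nodes on far more than CPU requests.
    """
    free = dict(nodes)  # remaining vCPUs per node
    placement: Dict[str, str] = {}
    for pod, cpu in pods:
        # max() returns the first node with the largest free capacity
        node = max(free, key=free.get)
        if free[node] < cpu:
            raise ValueError(f"pod {pod!r} requests {cpu} vCPUs; no node has room")
        free[node] -= cpu
        placement[pod] = node
    return placement


# Three hypothetical 8-vCPU compute nodes
nodes = {"node-1": 8, "node-2": 8, "node-3": 8}
print(schedule([("a", 4), ("b", 4), ("c", 4), ("d", 2)], nodes))
```

The user only states how much CPU each pod needs; which machine it ends up on is the scheduler's decision, which is the abstraction the slide describes.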

Slide 5

Local Kubernetes cluster
https://gitlab.internal.sanger.ac.uk/hc7/kubespray
./pre.sh
./terr.sh
./ansible.sh
./kube.sh
…
./deleteme.sh
[Diagram: Helper Instance]
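
The helper scripts are meant to run one after another. A hypothetical Python driver that enforces that order and stops at the first failing step might look like this; the slide does not say what each script does, so nothing below depends on it, and `DEFAULT_STEPS` lists only the scripts shown before the elided "…".

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical: the scripts shown on the slide before the elided "…"
DEFAULT_STEPS = [["./pre.sh"], ["./terr.sh"], ["./ansible.sh"], ["./kube.sh"]]


def run_steps(steps: Sequence[Sequence[str]]) -> List[str]:
    """Run each command in order; stop (and re-raise) at the first failure."""
    completed: List[str] = []
    for cmd in steps:
        try:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps never run against a half-built cluster
            subprocess.run(list(cmd), check=True)
        except subprocess.CalledProcessError as err:
            print(f"step {' '.join(cmd)!r} failed with exit code {err.returncode}",
                  file=sys.stderr)
            raise
        completed.append(" ".join(cmd))
    return completed
```

Calling `run_steps(DEFAULT_STEPS)` from the repository checkout would run the four scripts in sequence; failing fast matters here because each step presumably builds on the previous one.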

Slide 6

Kubernetes cluster

ID    Flavour    vCPUs  RAM (GB)  Node
8002  o1.medium  4      4.3       Master
8002  o1.medium  4      4.3       Bastion
8002  o1.medium  4      4.3       glusterFS (500 GB volume)
8002  o1.medium  4      4.3       glusterFS (500 GB volume)
2003  m1.large   8      71.1      Compute node
2003  m1.large   8      71.1      Compute node
2003  m1.large   8      71.1      Compute node
Total            40     230.5

Terraform configuration:
cluster_name="vlad-k8s-test"
…
flavor_k8s_master="8002"
flavor_k8s_node="2003"
flavor_etcd="8002"
flavor_bastion="8002"
…
number_of_bastions=1
number_of_k8s_masters_no_floating_ip=1
number_of_k8s_nodes_no_floating_ip=3
# GlusterFS variables
flavor_gfs_node = "8002"
number_of_gfs_nodes_no_floating_ip = "2"
gfs_volume_size_in_gb = "500"
…
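
The totals row can be checked from the per-instance figures. This sketch simply re-sums the table (the `Instance` class and `totals` function are illustrative names, not part of the deployment); RAM is rounded to one decimal because summing floats such as 4.3 leaves tiny rounding noise.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    role: str
    flavour: str
    vcpus: int
    ram_gb: float


# Instances from the table (flavour IDs: 8002 = o1.medium, 2003 = m1.large)
CLUSTER: List[Instance] = [
    Instance("Master", "o1.medium", 4, 4.3),
    Instance("Bastion", "o1.medium", 4, 4.3),
    Instance("glusterFS", "o1.medium", 4, 4.3),
    Instance("glusterFS", "o1.medium", 4, 4.3),
] + [Instance("Compute node", "m1.large", 8, 71.1) for _ in range(3)]


def totals(instances: List[Instance]) -> Tuple[int, float]:
    """Sum vCPUs and RAM; RAM is rounded to one decimal to hide float noise."""
    vcpus = sum(i.vcpus for i in instances)
    ram = round(sum(i.ram_gb for i in instances), 1)
    return vcpus, ram


print(totals(CLUSTER))  # (40, 230.5)
```

The split is 4 × 4 = 16 vCPUs on the o1.medium support instances plus 3 × 8 = 24 vCPUs on the m1.large compute nodes.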

Slide 7

Network topology
[Diagram labels: Kubernetes master, Helper Instance, Helper Instance]

Slide 8

Nextflow on Kubernetes

Slide 9

Nextflow RNAseq pipeline
1. Pull CRAM files from iRODS
2. Merge CRAM files per sample
3. Convert CRAM to FASTQ
4. STAR alignment of FASTQ
5. Count reads
[Diagram labels: samtools, STAR index, samtools, featureCounts]
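
Step 2 needs all CRAM files of a sample gathered together before merging. In Nextflow this is the job of the `groupTuple` operator; the Python sketch below shows the same grouping logic on (sample, path) pairs. The sample IDs and file names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_sample(crams: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect CRAM paths per sample, keeping input order within each sample.

    Plays the same role as Nextflow's groupTuple operator before the merge step.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample, path in crams:
        groups[sample].append(path)
    return dict(groups)


# Hypothetical sample IDs and file names
crams = [("sampleA", "1234_1#1.cram"), ("sampleB", "1234_1#2.cram"), ("sampleA", "1234_2#1.cram")]
print(group_by_sample(crams))
```

Each resulting group becomes one merge job, so a sample sequenced across several lanes ends up as a single CRAM before conversion and alignment.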

Slide 10

Nextflow RNAseq pipeline
1. Pull CRAM files from iRODS
2. Merge CRAM files per sample
3. Convert CRAM to FASTQ
4. STAR alignment of FASTQ
5. Count reads
[Diagram labels: iRODS Docker image + authentication; samtools + CRAM reference file; STAR index (S3); samtools; featureCounts]
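
Steps 2 to 5 map onto ordinary command lines. The builders below are an illustrative sketch, not the pipeline's actual process scripts: they assume paired-end reads, and the file-naming scheme, the `star_index` directory and `genes.gtf` are hypothetical. The flags used are standard ones for samtools, STAR and featureCounts, but check them against the versions pinned in the image (samtools 1.7, STAR 2.5.4a, subread 1.6.0).

```python
from typing import List


def merge_cmd(sample: str, crams: List[str]) -> List[str]:
    # samtools merge <out> <in1> <in2> ...
    return ["samtools", "merge", f"{sample}.cram", *crams]


def fastq_cmd(sample: str) -> List[str]:
    # Decoding CRAM needs the reference, which samtools finds via REF_PATH.
    # samtools fastq expects mates next to each other; coordinate-sorted
    # input should be name-collated first (samtools collate).
    return ["samtools", "fastq", "-1", f"{sample}_1.fastq", "-2", f"{sample}_2.fastq",
            f"{sample}.cram"]


def star_cmd(sample: str, index_dir: str, threads: int = 4) -> List[str]:
    # index_dir: a local copy of the STAR index (kept on S3 in this setup)
    return ["STAR", "--runThreadN", str(threads), "--genomeDir", index_dir,
            "--readFilesIn", f"{sample}_1.fastq", f"{sample}_2.fastq",
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", f"{sample}."]


def count_cmd(samples: List[str], gtf: str) -> List[str]:
    # -p: count fragments (read pairs) rather than individual reads
    bams = [f"{s}.Aligned.sortedByCoord.out.bam" for s in samples]
    return ["featureCounts", "-p", "-a", gtf, "-o", "counts.txt", *bams]


for cmd in (merge_cmd("sampleA", ["a.cram", "b.cram"]), fastq_cmd("sampleA"),
            star_cmd("sampleA", "star_index"), count_cmd(["sampleA"], "genes.gtf")):
    print(" ".join(cmd))
```

The BAM names in `count_cmd` follow STAR's convention of appending `Aligned.sortedByCoord.out.bam` to `--outFileNamePrefix`.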

Slide 11

iRODS Docker image + authentication
Kubernetes secret, secret.yml:
apiVersion: v1
kind: Secret
metadata:
  name: irods-secret
type: Opaque
data:
  IRODS_PASSWORD: PASSWORD
  IRODS_USER_NAME: USERNAME
(PASSWORD and USERNAME stand for the base64-encoded credentials; values under data must be base64-encoded.)
> kubectl create -f secret.yml
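
Values under `data` in a Secret must be base64-encoded; Kubernetes decodes them again when the secret is mounted. This sketch renders such a manifest from plain-text credentials (the helper name and the example credentials are placeholders). Alternatives are the `stringData` field, which accepts plain text, or `kubectl create secret generic irods-secret --from-literal=IRODS_PASSWORD=...`, which does the encoding for you.

```python
import base64
from typing import Dict


def secret_manifest(name: str, values: Dict[str, str]) -> str:
    """Render a Kubernetes Secret manifest with base64-encoded data values."""
    lines = [
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        f"  name: {name}",
        "type: Opaque",
        "data:",
    ]
    for key, value in values.items():
        # Values under "data" must be base64-encoded; Kubernetes decodes them
        # when the secret is mounted as files or exposed as env variables.
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        lines.append(f"  {key}: {encoded}")
    return "\n".join(lines) + "\n"


# Placeholder credentials, not real ones
print(secret_manifest("irods-secret", {"IRODS_PASSWORD": "secret", "IRODS_USER_NAME": "user"}))
```

Base64 is an encoding, not encryption, so the rendered file should be treated as sensitive as the plain password.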

Slide 12

iRODS Docker image + authentication
https://github.com/cellgeni/irods
docker pull quay.io/cellgeni/irods
Put iinit.sh in the Docker image:
COPY ./iinit.sh /iinit.sh
Run iinit.sh in every iRODS pod:
echo "$(cat /secret/IRODS_PASSWORD)" | iinit
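
Because Kubernetes mounts each key of the secret as a file, the password ends up in `/secret/IRODS_PASSWORD`, and iinit.sh just pipes it to `iinit`. Here is a Python sketch of that same step; the function name `irods_login` is hypothetical, and the command is a parameter only so the logic can be tried without an iRODS client installed.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def irods_login(secret_dir: str = "/secret", cmd: Sequence[str] = ("iinit",)) -> None:
    """Feed the mounted password file to iinit on stdin, like iinit.sh does.

    Each key of the Kubernetes secret is mounted as a file under secret_dir.
    """
    # Like shell command substitution, drop trailing newlines from the file
    password = Path(secret_dir, "IRODS_PASSWORD").read_text().rstrip("\n")
    # Passing the password on stdin keeps it out of the process arguments
    subprocess.run(list(cmd), input=password + "\n", text=True, check=True)
```

Reading from stdin mirrors the `echo ... | iinit` pipe on the slide, so the password never appears on a command line where `ps` could show it.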

Slide 13

iRODS Docker image + authentication
Nextflow config:
process {
  container = 'quay.io/cellgeni/rnaseq'
  $irods {
    container = 'quay.io/cellgeni/irods'
    pod = [secret: 'irods-secret', mountPath: '/secret']
    beforeScript = "/iinit.sh"
  }
}
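
The `pod = [secret: ..., mountPath: ...]` directive asks for the secret to be mounted into the task's pod. In plain Kubernetes terms that needs a pod-level volume backed by the secret plus a container-level volume mount. The sketch below builds those two fragments; it is an approximation of what such a mount looks like, not the exact spec Nextflow generates (the volume name and `readOnly` are our own choices).

```python
import json
from typing import Any, Dict


def secret_mount(secret_name: str, mount_path: str,
                 volume_name: str = "irods-secret-volume") -> Dict[str, Any]:
    """Pod-spec fragments that mount a Kubernetes secret as read-only files.

    volume_name is a hypothetical placeholder; Nextflow chooses its own.
    """
    return {
        # Pod level: a volume backed by the secret
        "volumes": [{"name": volume_name, "secret": {"secretName": secret_name}}],
        # Container level: where that volume appears in the filesystem
        "volumeMounts": [{"name": volume_name, "mountPath": mount_path, "readOnly": True}],
    }


# Same values as the pod directive in the Nextflow config
print(json.dumps(secret_mount("irods-secret", "/secret"), indent=2))
```

With this mount in place, `/secret/IRODS_PASSWORD` exists inside the container before `beforeScript` runs `/iinit.sh`.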

Slide 14

Software
Docker image: docker pull quay.io/cellgeni/rnaseq

environment.yml:
name: rnaseq1.5
channels:
  - conda-forge
  - bioconda
dependencies:
  - fastqc=0.11.7
  - bedops=2.4.30
  - cutadapt=1.15
  - trim-galore=0.4.5
  - star=2.5.4a
  - hisat2=2.1.0
  - salmon=0.9.1
  - rseqc=2.6.4
  - picard=2.17.6
  - samtools=1.7
  - preseq=2.0.2
  - subread=1.6.0
  - stringtie=1.3.3
  - multiqc=1.4

Dockerfile:
FROM continuumio/miniconda
# Install procps so that Nextflow can poll CPU usage
RUN apt-get update && apt-get install -y procps && apt-get clean -y
# Update the base version of conda (-y answers conda's confirmation prompt)
RUN conda update -n base -y conda
COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a -y
ENV PATH /opt/conda/envs/rnaseq1.5/bin:$PATH
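
The Dockerfile's `ENV PATH` only works if it names the same environment as `name:` in environment.yml (here rnaseq1.5). This sketch pulls the name and the version pins out of the file and derives the expected PATH entry. The parser is a deliberately minimal stand-in that handles only the flat layout shown on the slide; a real YAML parser is the better choice for anything else.

```python
from typing import Dict, Tuple

# The environment.yml from the slide
ENVIRONMENT_YML = """\
name: rnaseq1.5
channels:
  - conda-forge
  - bioconda
dependencies:
  - fastqc=0.11.7
  - bedops=2.4.30
  - cutadapt=1.15
  - trim-galore=0.4.5
  - star=2.5.4a
  - hisat2=2.1.0
  - salmon=0.9.1
  - rseqc=2.6.4
  - picard=2.17.6
  - samtools=1.7
  - preseq=2.0.2
  - subread=1.6.0
  - stringtie=1.3.3
  - multiqc=1.4
"""


def parse_environment(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (env name, {package: version}) for the flat layout above.

    Only understands 'name:', section headers and '- package=version' items.
    """
    name = ""
    pins: Dict[str, str] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.endswith(":"):
            section = line[:-1]  # e.g. "channels" or "dependencies"
        elif section == "dependencies" and line.startswith("- "):
            package, _, version = line[2:].partition("=")
            pins[package.strip()] = version.strip()
    return name, pins


def conda_env_path(name: str) -> str:
    # Must match the ENV PATH line in the Dockerfile
    return f"/opt/conda/envs/{name}/bin"


name, pins = parse_environment(ENVIRONMENT_YML)
print(name, len(pins), conda_env_path(name))
```

Pinning every tool to an exact version, as the file does, keeps the image reproducible when it is rebuilt later.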

Slide 15

STAR index + CRAM reference files
• S3 storage
• Ideally maintained by NPG
> vk6@farm3-head2:~$ printenv | grep REF_PATH
REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://sf2-farm-srv1.internal.sanger.ac.uk::8000/%s
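
REF_PATH tells samtools/htslib where to find a CRAM reference sequence by its MD5 checksum. Entries are separated by `:`. Our reading of htslib's convention, which the `http::` and `::8000` above rely on, is that `::` stands for a literal colon. Within an entry, `%2s` takes the next two characters of the MD5 and `%s` the rest. This sketch reproduces that lookup; it is a simplified model for understanding the paths, not htslib's code, and the MD5 used in the example is just the MD5 of empty input.

```python
from typing import List

REF_PATH = ("/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:"
            "/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:"
            "URL=http:://sf2-farm-srv1.internal.sanger.ac.uk::8000/%s")


def split_ref_path(ref_path: str) -> List[str]:
    """Split on ':' while treating '::' as a literal colon inside an entry."""
    entries: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(ref_path):
        ch = ref_path[i]
        if ch == ":" and ref_path[i + 1:i + 2] == ":":
            current.append(":")  # escaped colon, e.g. in a URL
            i += 2
        elif ch == ":":
            entries.append("".join(current))  # entry separator
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    entries.append("".join(current))
    return [e for e in entries if e]


def expand_entry(template: str, md5: str) -> str:
    """Replace %Ns with the next N characters of the MD5 and %s with the rest."""
    out: List[str] = []
    pos = 0
    i = 0
    while i < len(template):
        if template[i] == "%":
            j = i + 1
            while j < len(template) and template[j].isdigit():
                j += 1
            if j < len(template) and template[j] == "s":
                if j > i + 1:
                    width = int(template[i + 1:j])
                    out.append(md5[pos:pos + width])
                    pos += width
                else:
                    out.append(md5[pos:])
                    pos = len(md5)
                i = j + 1
                continue
        out.append(template[i])
        i += 1
    return "".join(out)


md5 = "d41d8cd98f00b204e9800998ecf8427e"  # example MD5 only
for entry in split_ref_path(REF_PATH):
    print(expand_entry(entry, md5))
```

The first two entries resolve to a two-level directory fan-out on Lustre (`.../cram_cache/d4/1d/8cd98f...`), and the last to an HTTP fallback on port 8000. An S3-hosted copy, as proposed on the slide, would need an equivalent lookup scheme.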

Slide 16

Nextflow k8s issues

Slide 17

Nextflow k8s issues

Slide 18

Next steps
• Benchmark Nextflow performance on K8s versus the farm (small publication), including wr
• Test run of K8s with real users (submission system similar to LSF)
• From June: a serious focus on CWL, including an overview of all current workflow engines to find the best, lightest and most promising option

Slide 19

Acknowledgements
Helen Cousins, Theo Barber-Bany, Paolo Di Tommaso, Peter Clapham, Christopher Harrison, Matthew Vernon