Utilize Kubernetes on OpenStack

Vladimir Kiselev

November 02, 2018

Transcript

  1. Utilize Kubernetes on OpenStack

     Cellular Genetics Informatics team (Sanger), Gene Expression team (EBI)
  2. Deployment

     • Production: 4 m1.3xlarge nodes (54 vcpus, 464 GB RAM)
     • Staging: 4 m1.medium nodes (4 vcpus, 34 GB RAM)
     • Development: 4 o1.2xlarge nodes (26 vcpus, 28 GB RAM)
     • utils: https://github.com/cellgeni/kubespray
  3. Applications

     Development:
     • Pipelines (Nextflow)
     • Web applications (Docker)
     One-liners (using Helm):
     • Jupyterhub (notebooks)
     • Jenkins (CI)
     • Galaxy (EBI pipelines)
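     As an example of what such a Helm one-liner looks like, a minimal sketch for Jenkins, assuming the Helm 2 CLI and the then-default stable chart repository (release and namespace names are placeholders):

         # install the community Jenkins chart as a single command (Helm 2 syntax)
         helm install --name jenkins stable/jenkins --namespace jenkins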
  4. Nextflow

     Workflow language/engine for data-driven computational pipelines. Java/Groovy.
     Supported executors:
     • Sun Grid Engine (SGE)
     • AWS Batch
     • Load Sharing Facility (LSF)
     • SLURM (a soda in Futurama and an open source Linux job scheduler)
     • k8s (Kubernetes)
     • Local (ideal for testing)
     • PBS/Torque, NQSII, Ignite, GA4GH TES, HTCondor

     process {
         input:
         output:
         script:
     }
  5. nextflow kuberun
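     A minimal sketch of how kuberun is invoked; it runs the Nextflow driver itself as a pod inside the cluster. The claim name and mount path below are assumptions borrowed from the nf.config slide later on:

         # -v mounts an existing PersistentVolumeClaim into the driver and task pods
         nextflow kuberun cellgeni/rnaseq -v nf-pvc:/mnt/gluster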

  6. Nextflow on Kubernetes (vs LSF)

     # on the farm (LSF)
     nextflow run cellgeni/rnaseq --samplefile $samplefile --studyid $sid --genome GRCh38 \
         -profile farm3

     # on Kubernetes
     nextflow run cellgeni/rnaseq --samplefile $samplefile --studyid $sid --genome GRCh38 \
         -profile docker -c nf.config
  7. -profile farm3

     executor {
         name = 'lsf'
         queueSize = 400
         perJobMemLimit = true
     }

     process {
         queue = 'normal'
     }
  8. Shared config on all executors

     process {
         errorStrategy = 'ignore'
         maxRetries = 2
         withName: irods {
             memory = 2.GB
             maxForks = 30
         }
         withName: crams_to_fastq {
             errorStrategy = { task.exitStatus == 130 ? 'retry' : 'ignore' }
             cpus = 4
             memory = { 4.GB + 4.GB * (task.attempt - 1) }
         }
         withName: star {
             errorStrategy = { task.exitStatus == 130 ? 'retry' : 'ignore' }
             cpus = { 8 * task.attempt, 'cpus' }
             memory = { 40.GB * task.attempt * 1.6 ** (task.attempt - 1), 'memory' }
         }
     }
  9. -profile docker

     process {
         withName: irods {
             container = 'quay.io/cellgeni/irods'
             pod = [secret: 'irods-secret', mountPath: '/secret']
             beforeScript = "/iinit.sh"
             cpus = 4
         }
         withName: crams_to_fastq {
             container = 'quay.io/biocontainers/samtools:1.8--4'
             beforeScript = "export REF_PATH='http://www.ebi.ac.uk/ena/cram/md5/%s'"
             cpus = 4
         }
         withName: star {
             container = 'quay.io/biocontainers/star:2.5.4a--0'
             cpus = 4
         }
     }
  10. -c nf.config

     process {
         maxForks = 40            // adapt to our cluster size, fake batch system
         afterScript = 'sleep 1'  // deal with timestamps and caching
         cache = 'lenient'        // idem
     }

     executor { queueSize = 40 }  // adapt to cluster size

     k8s {
         storageClaimName = 'nf-pvc'        // gluster settings
         storageMountPath = '/mnt/gluster'
     }
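     The storageClaimName above points at a pre-existing PersistentVolumeClaim. A minimal sketch of creating it, assuming a GlusterFS-backed StorageClass called glusterfs-storage (the class name and size are placeholders, not our actual settings):

         # create the shared work-directory claim referenced by k8s.storageClaimName
         kubectl apply -f - <<'EOF'
         apiVersion: v1
         kind: PersistentVolumeClaim
         metadata:
           name: nf-pvc
         spec:
           accessModes: ["ReadWriteMany"]       # shared by all pipeline pods
           storageClassName: glusterfs-storage  # placeholder GlusterFS class
           resources:
             requests:
               storage: 500Gi                   # placeholder size
         EOF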
  11. Issues and notes

     • Samtools sometimes fails to populate the reference. Still looking for a solution; suggestions?
     • Batch processing: k8s and Nextflow don't play nicely. The hot potato of 'pending'?
     • So far we use GlusterFS rather than S3/NFS/Lustre.
     • Benchmarking has just started. Anecdotally: big scope for improvement.

     [E::cram_get_ref] Failed to populate reference for id 3
     [E::cram_decode_slice] Unable to fetch reference #3 53251..1709029
     [E::cram_next_slice] Failure to decode slice
     samtools merge: "23809_5#1.cram" is truncated
  12. Kubernetes vs Farm (RNAseq pipeline)

     Two times slower on Kubernetes, even with fewer steps!
     Farm - run on /lustre
     Kubernetes - run on GlusterFS
  13. Kubernetes vs Farm (Memory Usage) [charts: Kubernetes, Farm]

  14. Kubernetes vs Farm (Task execution in mins) [charts: Kubernetes, Farm]

  15. Kubernetes vs Farm (Disk I/O - read) [charts: Kubernetes, Farm]

  16. Kubernetes vs Farm (Disk I/O - write) [charts: Kubernetes, Farm]

  17. Kubernetes vs Farm (CPU Usage)

     Farm: STAR is given 8 cpus and run with 8 threads
     Kubernetes: STAR is given 4 cpus and run with 4 threads
  18. Kubernetes for web applications

  19. K8s concepts

     • Pods
     • Replica sets
     • Deployments
     • Services
     • Persistent Volume Claim
     • Persistent Volume
     • Storage Class
     • Ingress controller
     • Helm
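     Most of these objects can be listed directly with kubectl (Helm is a separate CLI rather than a kubectl resource); a quick sketch:

         kubectl get pods,replicasets,deployments,services   # workloads and routing
         kubectl get pvc,pv,storageclasses                   # storage objects
         kubectl get ingresses --all-namespaces              # ingress rules
         helm list                                           # releases managed by Helm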
  20. Request routing schema

  21. Deployment flow

     1. Dockerize application, upload image
     2. Create manifests:
        ▪ Deployment
        ▪ Service
        ▪ Persistent Volume Claims
        ▪ Persistent Volumes
     3. Create Ingress record
     4. Create Nginx record
     5. Create Infoblox DNS record
     6. [for public access] Web team approval
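     A condensed sketch of steps 1-3 on the command line; the image name, manifest filenames and hostname are placeholders, and the Nginx, Infoblox and approval steps remain manual and site-specific:

         # 1. dockerize the application and upload the image
         docker build -t quay.io/cellgeni/myapp:0.1 .
         docker push quay.io/cellgeni/myapp:0.1

         # 2. apply the manifests written beforehand (Deployment, Service, PVC/PV)
         kubectl apply -f deployment.yaml -f service.yaml -f pvc.yaml

         # 3. create the Ingress record for the chosen hostname
         kubectl apply -f ingress.yaml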
  22. Example: partslab.sanger.ac.uk

     • Two independent services:
       ◦ FORECasT
       ◦ JACKS
     • FORECasT
       ◦ Front end
       ◦ Method back end
     • JACKS
       ◦ Front end
       ◦ Redis
       ◦ Celery
  23. None
  24. None
  25. Third-party apps: JupyterHub

     • Interactive analysis with Python/R/Julia
     • RStudio Server on board
     • Ready-made installation: zero-to-jupyterhub (Helm chart)
       ◦ helm upgrade --install jpt jupyterhub/jupyterhub --namespace jpt \
             --version 0.7.0-beta.2 --values jupyter-config.yaml
     • Convenient storage management
     • Caveats
       ◦ TLS support didn't work
       ◦ No fine-grained resource control
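     For completeness, the zero-to-jupyterhub chart lives in its own Helm repository, so the upgrade command above needs the repo added first; a sketch, with the secret token being something you generate yourself:

         helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
         helm repo update
         # jupyter-config.yaml needs at least a proxy token, e.g.:
         #   proxy:
         #     secretToken: "<output of: openssl rand -hex 32>"
         helm upgrade --install jpt jupyterhub/jupyterhub --namespace jpt \
             --version 0.7.0-beta.2 --values jupyter-config.yaml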
  26. Jupyterhub

  27. LIVE DEMO!!!1111

  28. Jupyterhub

  29. Jupyterhub

  30. Jupyterhub

  31. Jupyterhub

  32. Jupyterhub

  33. Jupyterhub

  34. Jupyterhub

  35. Jupyterhub

  36. None
  37. Pros and cons

     Pros:
     • Self-healing, rolling upgrades, health checks, auto scaling
     • Cloud provider independent
     • Resources efficiently utilized
     • Complex projects are easy to share and get running (via Helm charts)
     • Vast community and knowledge base

     Cons:
     • Fresh cluster setup is long and demands expertise // we've been there for you, now you can run it on FCE in 20 min: https://github.com/cellgeni/kubespray
     • Significant learning curve
     • Limited SSL and Ingress management
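     For reference, a rough sketch of the generic Kubespray flow; the cellgeni fork's README is the authority for the exact FCE/OpenStack steps, and the inventory paths below are the upstream defaults:

         git clone https://github.com/cellgeni/kubespray && cd kubespray
         pip install -r requirements.txt              # Ansible and other dependencies
         cp -r inventory/sample inventory/mycluster   # then edit hosts and group_vars
         ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml -b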
  38. Kubernetes future plans

     • Monitoring:
       ◦ Elasticsearch/Logstash/Kibana - logs
       ◦ Heapster/Grafana/InfluxDB - resources
     • Nextflow as a service - farm replacement for users
     • Heketi - better volume management for GlusterFS
     • GlusterFS alternatives
       ◦ CephFS
       ◦ NFS
  39. Acknowledgments

     Paolo Di Tommaso, Pablo Moreno, Phil Ewels, Helen Cousins, Theo Barber-Bany, Peter Clapham, Tim Cutts, Joe Garfoot, James Smith, Paul Bevan, Nick Boughton