
Scalable JupyterHub deployments for Education, Research, Analytics, and Unicorns

Part 2 of a day-long tutorial on deploying the open-source scientific stack for education, research, and data analytics using JupyterHub. This part focuses on Zero to JupyterHub, a Kubernetes-based deployment of JupyterHub that can scale to tens of thousands of users.

Chris Holdgraf

July 22, 2018

Transcript

  1. 1. Setting up a bare-bones JupyterHub on Kubernetes (using Google Cloud)
    2. Customizing your JupyterHub
    3. Introduction to Binder/BinderHub
    4. Deploying a BinderHub
    5. Tearing everything down
  2. 1. Learning about Google Cloud
    2. Setting up a simple JupyterHub deployment
    3. Understanding the basics of Kubernetes and Helm
  3. 1. Becoming Kubernetes experts
    2. Diving deep into how containers work
    3. Full reference of various JupyterHub, Kubernetes or Helm features
  4. • Works well for large (several thousand active users) & small (10-50 users) installations
    • Doesn’t need constant human operator intervention
    • More cost effective at scale: Increase or decrease cloud resources automatically or manually
  5. • Abstracts away most detail of underlying cloud providers / hardware
    • Declarative high level primitives that allow you to be as high level or low level as needed
    • Utilize features of underlying hardware when you want (GPUs, SSDs, etc) easily
  6. • Not controlled by one single commercial entity
    • Fast paced releases that keep backwards compatibility
    • Has worked to foster a warm, welcoming environment for contributors and users
  7. [Diagram] Infrastructure: Google Cloud; AWS; Azure; your private cloud; your own hardware
    Layer above: Kubernetes
    Running on Kubernetes: JupyterHub #1; JupyterHub #2; other software
    Jupyter Notebook servers, one per user
  8. [Diagram] Infrastructure: Google Cloud; AWS; Azure; your private cloud; your own hardware
    Layer above: Kubernetes
    Running on Kubernetes: JupyterHub #1; JupyterHub #2; other software
    This tutorial uses Google Cloud
  9. $300 of free credits!
    • Go to console.cloud.google.com
    • Click ‘Sign up for Free Trial’
    • Select the ‘Individual’ account type and follow the instructions
    • Unfortunately you do need a credit card, but it won’t be charged if this is the first time you set it up
  10. $ kubectl get node
    NAME                                  STATUS   AGE   VERSION
    gke-test-default-pool-792c3248-4hhn   Ready    4m    v1.7.3
    gke-test-default-pool-792c3248-5bhm   Ready    4m    v1.7.3
    gke-test-default-pool-792c3248-gqpc   Ready    4m    v1.7.3
    Annotations: verb (get), object (node)
  11. $ kubectl get node
    NAME                                  STATUS   AGE   VERSION
    gke-test-default-pool-792c3248-4hhn   Ready    4m    v1.7.3
    gke-test-default-pool-792c3248-5bhm   Ready    4m    v1.7.3
    gke-test-default-pool-792c3248-gqpc   Ready    4m    v1.7.3
    Annotation: random
  12. • Define, Install & Upgrade Applications (“Charts”) that can run on top of Kubernetes
    • Capture entire state of any installation with one (or more) YAML files
    • Not just JupyterHub specific. Tons of applications packaged this way
    Package Manager / App store for Kubernetes
  13. $ curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
    Downloading https://kubernetes-helm.storage.googleapis.com/helm-v2.6.0-linux-amd64.tar.gz
    Preparing to install into /usr/local/bin
    helm installed into /usr/local/bin/helm
    Run 'helm init' to configure helm.
    If this bothers you, go to github.com/kubernetes/helm and download the latest release :)
  14. kubectl --namespace kube-system create serviceaccount tiller
    kubectl create clusterrolebinding tiller \
      --clusterrole cluster-admin \
      --serviceaccount=kube-system:tiller
  15. $ helm init --service-account tiller
    $HELM_HOME has been configured at /home/<yourusername>/.helm.
    Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
    Happy Helming!
  16. kubectl --namespace=kube-system patch deployment tiller-deploy \
      --type=json \
      --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'
    https://engineering.bitnami.com/articles/helm-security.html
  17. $ helm repo add jupyterhub \
      https://jupyterhub.github.io/helm-chart/
    "jupyterhub" has been added to your repositories
    $ helm repo update
    Hang tight while we grab the latest from your chart repositories...
    ...Skip local chart repository
    ...Successfully got an update from the "jupyterhub" chart repository
    ...Successfully got an update from the "stable" chart repository
    Update Complete. ⎈ Happy Helming!⎈
  18. • Helm uses the JupyterHub Helm Chart to set up the machinery specified in it.
    • It uses the configuration file we’ve provided to customize the deployment.
    • It pulls the Docker image needed to run JupyterHub and sets up the resources needed for it.
  19. $ kubectl --namespace=jhub get svc
    NAME           TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                      AGE
    hub            ClusterIP      10.3.243.80    <none>           8081/TCP                     1m
    proxy-api      ClusterIP      10.3.252.84    <none>           8001/TCP                     1m
    proxy-http     ClusterIP      10.3.249.23    <none>           8000/TCP                     1m
    proxy-public   LoadBalancer   10.3.252.210   35.193.125.233   80:30905/TCP,443:31486/TCP   1m
  20. [Diagram] JupyterHub Architecture (high-level details)
    Components: Users; Kubernetes Cluster; Proxy (proxy-<hash>); Hub (hub-<hash>), authenticates the user; Pods + Volumes (jupyter-<username>-<hash>), this user’s pod; Disk Volumes, provide persistent storage; Image Registry, provides environment images
    Arrows: SIGNED OUT USER REDIRECT; ROUTE INFO SEND; VOLUME PROVIDE / POD CREATE / USER REDIRECT; IMAGE PULL / USER SESSION; CULL PODS IF STALE
    Legend: Data and I/O; User flow; Trigger action
  21. [Diagram] JupyterHub Architecture (high-level details)
    Components: Users; Kubernetes Cluster; Proxy (proxy-<hash>); Hub (hub-<hash>), authenticates the user; Pods + Volumes (jupyter-<username>-<hash>), this user’s pod; Disk Volumes, provide persistent storage; Image Registry, provides environment images
    Arrows: SIGNED IN USER REDIRECT; IMAGE PULL / USER SESSION; CULL PODS IF STALE
    Legend: Data and I/O; User flow; Trigger action
  22. $ kubectl --namespace=jhub get pod
    NAME                                READY   STATUS    RESTARTS   AGE
    hub-deployment-3839270210-bnhk6     1/1     Running   0          4m
    jupyter-yuvipanda                   1/1     Running   0          26s
    proxy-deployment-1227971824-30z3w   1/1     Running   0          4m
    Annotations: verb (get), object (pod)
  23. $ kubectl --namespace=jhub get deployment
    NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    hub-deployment     1         1         1            1           2h
    proxy-deployment   1         1         1            1           2h
    Annotations: verb (get), object (deployment)
  24. $ kubectl --namespace=jhub get svc
    NAME           CLUSTER-IP       EXTERNAL-IP       PORT(S)    AGE
    hub            10.103.244.159   <none>            8081/TCP   5m
    proxy-api      10.103.250.100   <none>            8001/TCP   5m
    proxy-public   10.103.242.55    104.154.134.167   80/TCP     5m
  25. 1. What is a Helm chart?
    2. Changing your JupyterHub’s configuration
    3. Resizing your cluster
    4. Using different Docker images
    5. Debugging scenarios
  26. [Diagram] Inputs: config.yaml; Helm Chart (templates for Deployments, Services, Access Control, etc; dependencies on other charts; chart metadata)
    Tools: helm (client); tiller (in the Kubernetes Cluster)
    Kubernetes Cluster: Deployments, Services, Access Control; Namespace A; Namespace B; Namespace C
  27. • Create config.yaml
    • helm install
    • config.yaml interpolated into templates from chart
    • Appropriate Kubernetes objects created
    $ helm install \
      jupyterhub/jupyterhub \
      --version=v0.6 \
      --name=jhub \
      --namespace=jhub \
      -f config.yaml
  28. Template:
      kind: Secret
      apiVersion: v1
      metadata:
        name: hub-secret
      type: Opaque
      data:
        proxy.token: {{ .Values.proxy.secretToken | quote }}
    config.yaml:
      proxy:
        secretToken: 0c811306d7f18a9b747fd2ef40c725f11fb7091e409fc2e6421c400e
    Kubernetes Object:
      kind: Secret
      apiVersion: v1
      metadata:
        name: hub-secret
      type: Opaque
      data:
        proxy.token: "0c811306d7f18a9b747fd2ef40c725f11fb7091e409fc2e6421c400e"
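The interpolation on this slide can be imitated in a few lines of Python. This is only a toy sketch: Helm itself uses Go templates, and the `render`, `lookup`, and `quote` helpers below are hypothetical names written for illustration, handling nothing beyond `{{ .Values.<path> }}` with an optional `| quote`.

```python
import re

def quote(value):
    """Mimic the template `quote` function: wrap the value in double quotes."""
    return '"' + str(value) + '"'

def lookup(values, dotted_path):
    """Resolve a dotted path such as 'proxy.secretToken' in a nested dict."""
    node = values
    for key in dotted_path.split("."):
        node = node[key]
    return node

# Matches {{ .Values.some.path }} with an optional "| quote" filter.
PLACEHOLDER = re.compile(r"\{\{\s*\.Values\.([\w.]+)\s*(\|\s*quote\s*)?\}\}")

def render(template, values):
    """Replace every placeholder in `template` with its value from `values`."""
    def substitute(match):
        value = lookup(values, match.group(1))
        return quote(value) if match.group(2) else str(value)
    return PLACEHOLDER.sub(substitute, template)

# The template and the config.yaml values shown on the slide.
TEMPLATE = """kind: Secret
apiVersion: v1
metadata:
  name: hub-secret
type: Opaque
data:
  proxy.token: {{ .Values.proxy.secretToken | quote }}
"""
VALUES = {
    "proxy": {
        "secretToken": "0c811306d7f18a9b747fd2ef40c725f11fb7091e409fc2e6421c400e"
    }
}

rendered = render(TEMPLATE, VALUES)
print(rendered)
```

The printed result should correspond to the "Kubernetes Object" column of the slide: the same manifest with the token from config.yaml filled in and quoted.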
  29. $ helm upgrade jhub jupyterhub/jupyterhub \
      --version=v0.6 -f config.yaml
    Release "jhub" has been upgraded. Happy Helming!
    LAST DEPLOYED: Sat Oct 7 18:21:41 2017
    NAMESPACE: jhub
    STATUS: DEPLOYED
    RESOURCES:
    ...
  30. • Modify config.yaml
    • helm upgrade
    • config.yaml is interpolated into templates from the chart
    • Existing Kubernetes objects are retrieved
    • Existing Kubernetes objects are reconciled with the interpolated objects
    • Only the minimal set of required changes is applied. If nothing has changed, this is a noop (no operation)!
    • Only pods that require restarting are restarted
    If you keep your config.yaml file in a git repository, this makes CI / CD very easy!
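The reconcile step above can be pictured as a diff between the objects that exist and the objects the chart now asks for. The following is a minimal sketch of that idea, not how Helm or Tiller computes its changes; `plan_changes`, the object names, and the image tags are all hypothetical.

```python
def plan_changes(existing, desired):
    """Return the actions needed to turn `existing` objects into `desired` ones."""
    actions = []
    for name, spec in desired.items():
        if name not in existing:
            actions.append(("create", name))
        elif existing[name] != spec:
            actions.append(("update", name))
    for name in existing:
        if name not in desired:
            actions.append(("delete", name))
    return actions

# Hypothetical current state of the release.
existing = {
    "hub": {"image": "hub:v1"},
    "proxy": {"image": "proxy:v1"},
}

# Hypothetical desired state after editing config.yaml: only the hub changes.
changed = {
    "hub": {"image": "hub:v2"},
    "proxy": {"image": "proxy:v1"},
}

print(plan_changes(existing, existing))  # nothing differs, so no actions
print(plan_changes(existing, changed))   # only the hub needs an update
```

Comparing a state with itself yields an empty plan, which is the "noop" case from the slide; changing one object yields an action for that object only.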
  31. singleuser:
      memory:
        limit: 2G
        guarantee: 1G
    $ helm upgrade jhub jupyterhub/jupyterhub \
      --version=v0.6 -f config.yaml
  32. gcloud container clusters resize workshop-cluster --size 4 --zone us-central1-b
    Pool [default-pool] for [workshop-cluster] will be resized to 4.
    Do you want to continue (Y/n)? Y
    Resizing workshop-cluster...done.
    Updated [https://container.googleapis.com/v1/projects/sandbox-182119/zones/us-central1-b/clusters/workshop-cluster].
  33. You’re probably wondering to yourself “how much money did I just commit to spending?”
    The answer to this depends on the type of machine, number of machines, and cloud provider.
    For more information, see zero-to-jupyterhub.readthedocs.io/en/latest/cost.html
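A back-of-the-envelope estimate follows directly from those three factors. The sketch below assumes nodes run around the clock and are billed per hour; the hourly price is a made-up placeholder, not a real quote from any provider, and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(hourly_price_per_node, node_count, hours_per_month=730):
    """Estimate the monthly bill for nodes that run continuously.

    730 is the average number of hours in a month (8760 hours / 12).
    """
    return hourly_price_per_node * node_count * hours_per_month

# Placeholder value: look up the real price for your machine type and provider.
PLACEHOLDER_HOURLY_PRICE = 0.10

# Four nodes, matching the cluster size used in the resize example.
cost = monthly_cost(PLACEHOLDER_HOURLY_PRICE, node_count=4)
print(f"Estimated cost: ${cost:.2f} per month")
```

Disks, network traffic, and load balancers are billed separately and are not covered by this estimate.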
  34. • Investigate the state of our pods
    • Go over the Helm Chart configuration
    • Re-deploy to fix problems
  35. kubectl get
    Return names / states of various components. E.g.:
    $ kubectl --namespace=jhub get pod
    NAME                                READY   STATUS    RESTARTS   AGE
    hub-deployment-2442265953-d9hl8     1/1     Running   0          5h
    proxy-deployment-1227971824-zfkmh   1/1     Running   0          22m
  36. kubectl describe
    Return more complete information about a specific component.
    $ kubectl --namespace=jhub describe pod <pod-name>
    Name:        hub-deployment-2442265953-d9hl8
    Namespace:   jhub
    Node:        gke-jhub-default-pool-51a2c7ec-xkjg/10.128.0.2
    Start Time:  Sat, 07 Oct 2017 18:21:53 -0700
    ...
  37. kubectl logs
    Return logging output for a component
    $ kubectl --namespace=jhub logs <pod-name>
    [I 2017-10-08 01:23:10.948 JupyterHub app:720] Loading cookie_secret from env[JPY_COOKIE_SECRET]
    [W 2017-10-08 01:23:11.040 JupyterHub app:864] No admin users, admin interface will be unavailable.
    [W 2017-10-08 01:23:11.040 JupyterHub app:865] Add any administrative users to `c.Authenticator.admin_users` in config.
    [I 2017-10-08 01:23:11.040 JupyterHub app:892] Not using whitelist. Any authenticated user will be allowed.
  38. • Platform as a Service for Educational and Research Initiatives for Atmospheric Sciences
    • Workshops, Classes
    • Approaches
    • Manual (VM - bootstrap K8s) and templated via Magnum
    • Additional presentations on this case will be given during the poster reception (Tuesday 6:30 - 8:30pm) and the XSEDE Fellows Plenary (Thursday 10:30am - 11:15am, Grand Ballroom 3)
    Copy of the Poster
  39. $ git clone https://github.com/etiennedub/terraform-binderhub
    $ cd terraform-binderhub
    $ cat README.md
    ## Terraform deployment
    1. In this repository, create a new folder and go into : `mkdir my_cluster; cd my_cluster`.
    2. Copy the corresponding cloud provider `.tf` file from the `examples/providers` directory : `cp ../examples/providers/my_provider.tf .`
    3. Copy the corresponding DNS provider `.tf` file from the `examples/dns` directory : `cp ../examples/dns/my_dns.tf .`
    4. Adapt the cluster variables in both `.tf` files (i.e.: # nodes, domain name, ssh key, etc).
    5. Apply your credentials for the cloud and the DNS provider.
    6. Set your username to be accessible in Terraform: `export TF_VAR_username=$OS_USERNAME`
    7. Initiate the Terraform state : `terraform init`.
    8. Verify the Terraform plan : `terraform plan`.
    9. Apply the Terraform plan : `terraform apply`.
  40. $ . TG-CDA170005-openrc.sh
    Please enter your OpenStack Password for project TG-CDA170005 as user tg458632:
    $ export TF_VAR_username=$OS_USERNAME
    $ ssh-keygen -t rsa -f key
  41. [Diagram] BinderHub Architecture (build process)
    Components: Users; Repo Provider; Kubernetes Cluster; BinderHub (binder-<hash>); repo2docker (build-<hash>); Image Registry, provides environment images
    Arrows: WEBSITE SERVE; REPO INFO SEND; LAUNCH BUILD IF REPO HASH IS DIFFERENT; REPO PULL; IMAGE BUILD; IMAGE PUSH
    Legend: Data and I/O; User flow; Trigger action
  42. [Diagram] BinderHub Architecture (requesting launch)
    Components: Users; Kubernetes Cluster; BinderHub (binder-<hash>); JupyterHub (hub-<hash>, proxy-<hash>); Pods + Volumes (jupyter-<username>-<hash>); Image Registry, provides environment images
    Arrows: USER CREATE; POD CREATE; IMAGE PULL / USER SESSION
    Legend: Data and I/O; User flow; Trigger action
  43. [Diagram] BinderHub Architecture (running user)
    Components: Users; Kubernetes Cluster; BinderHub (binder-<hash>); Pods + Volumes (jupyter-<username>-<hash>); Image Registry, provides environment images
    Arrows: USER REDIRECT; IMAGE PULL / USER SESSION
    Legend: Data and I/O; User flow; Trigger action
  44. [Diagram] BinderHub Architecture (high-level details)
    Components: Users; Repo Provider; Kubernetes Cluster; BinderHub (binder-<hash>); Build Pod (build-<hash>); JupyterHub (hub-<hash>, proxy-<hash>); Pods (jupyter-<reponame>-<hash>); Image Registry, provides environment images
    Arrows: WEBSITE SERVE; REPO INFO SEND; LAUNCH BUILD IF REPO HASH IS DIFFERENT; REPO PULL; IMAGE BUILD; IMAGE REGISTER; POD CREATE / USER REDIRECT; USER REDIRECT; IMAGE PULL / USER SESSION; CULL PODS IF STALE
    Legend: Data and I/O; User flow; Trigger action
  45. Run openssl rand -hex 32 again, then add...
    jupyterhub:
      hub:
        services:
          binder:
            apiToken: "<output of FIRST `openssl rand -hex 32` command>"
      proxy:
        secretToken: "<output of SECOND `openssl rand -hex 32` command>"
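If `openssl` is not at hand, two tokens of the same shape can be produced with Python's standard `secrets` module. This is a sketch of an alternative, not the step the slides use: `secrets.token_hex(32)` returns 32 random bytes as 64 hex characters, which is also what `openssl rand -hex 32` prints. `make_token` and `SECRET_YAML` are illustrative names.

```python
import secrets

def make_token():
    """Return 32 random bytes encoded as 64 hexadecimal characters."""
    return secrets.token_hex(32)

# Two independent tokens, one per field, as on the slide.
api_token = make_token()
secret_token = make_token()

# Indented with spaces only; YAML does not allow tabs for indentation.
SECRET_YAML = f"""jupyterhub:
  hub:
    services:
      binder:
        apiToken: "{api_token}"
  proxy:
    secretToken: "{secret_token}"
"""

print(SECRET_YAML)
```

Treat the printed output like any other credential: keep it out of public repositories.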
  46. Watch out for tab characters!
    registry:
      # below is the content of the JSON file downloaded earlier for the container registry from Service Accounts
      # it will look something like the following (with actual values instead of empty strings)
      # paste the content after `password: |` below
      password: |
        {
          "type": "<REPLACE>",
          "project_id": "<REPLACE>",
          "private_key_id": "<REPLACE>",
          "private_key": "<REPLACE>",
          "client_email": "<REPLACE>",
          "client_id": "<REPLACE>",
          "auth_uri": "<REPLACE>",
          "token_uri": "<REPLACE>",
          "auth_provider_x509_cert_url": "<REPLACE>",
          "client_x509_cert_url": "<REPLACE>"
        }
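Tabs are easy to paste in by accident and hard to see. A small check such as the one below can locate them before the file is handed to helm; `find_tabs` is a hypothetical helper and the two sample strings are invented for illustration.

```python
def find_tabs(text):
    """Return 1-indexed (line, column) positions of every tab character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char == "\t":
                hits.append((line_no, col))
    return hits

# Invented samples: one indented with spaces, one with a stray tab.
good = "registry:\n  password: |\n    {}\n"
bad = "registry:\n\tpassword: |\n"

print(find_tabs(good))
print(find_tabs(bad))
```

To check a real file, read it with `open(path).read()` and pass the contents to `find_tabs`.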
  47. kubectl --namespace=bhub get svc proxy-public
    NAME           TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
    proxy-public   LoadBalancer   10.3.253.188   35.192.208.83   80:31970/TCP   22h
    Copy the External IP
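Copying the address by hand works fine; for scripting, the same value can be pulled out of the table text. The sketch below parses the output shown on the slide, assuming columns are separated by whitespace and no field contains a space; `external_ip` is a hypothetical helper.

```python
# Output of `kubectl --namespace=bhub get svc proxy-public`, as on the slide.
SAMPLE = """NAME           TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
proxy-public   LoadBalancer   10.3.253.188   35.192.208.83   80:31970/TCP   22h
"""

def external_ip(table, service="proxy-public"):
    """Return the EXTERNAL-IP field for `service`, or None if it is not listed."""
    lines = table.strip().splitlines()
    column = lines[0].split().index("EXTERNAL-IP")
    for row in lines[1:]:
        fields = row.split()
        if fields[0] == service:
            return fields[column]
    return None

ip = external_ip(SAMPLE)
print(f"http://{ip}")
```

While the load balancer is still being provisioned, the column holds a placeholder instead of an address, so a script should check the returned value before using it.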
  48. helm delete <YOUR-HELM-RELEASE-NAME> --purge
    kubectl delete ns <your-namespace>
    Deletes all Kubernetes objects in the deployment, including automatically created disks
    gcloud container clusters delete workshop-cluster --zone=us-central1-b
    Deletes the whole cluster, including all nodes
    Then we can delete the project!
  49. paws.wmflabs.org
    • Access Wikimedia Data & Run bots
    • Share your notebooks with the world
    • 2.8 million edits to Wikimedia Projects
  50. z2jh.jupyter.org
    • Learn more about Kubernetes at k8s.io
    • Deploy to your own hardware, AWS, or Azure
    • Build your own Docker Image
    • Set up automatic deployments from GitHub
    • Integrate GPUs, Spark, TensorFlow, etc
    • Your cool idea!
  51. OSF.io & Binder mutually-aware
    See also Tuesday 1:30pm PEARC workshop: Library and Research Computing Efforts and Tools to Improve Data Sharing and Archiving
  52. • Jupyter for Shared Infra Mailing List
    • Notebook resource usage widget
    • GitHub repo for terraform + kubernetes + binderhub