devNetNoord
September 20, 2017

Mixing Windows and Linux containers in large scale probabilistic workflows by Edwin Harmsma

devCampNoord #02

Transcript

  1. AND

  2. DOCKER IN A NUTSHELL
    • Build to container
    • Write microservice specification
    • Ship to registry
    • Write container composition specification
    • Pull images from registry
    • Run in environment (i.e. developer notebook or production)
  3. THE HARDEST PART OF AI ISN’T AI
    “Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
  4. STOOP is part of the grid operator's overall asset management considerations
    • Groundwater level
    • Ground settlement
    • Vibrations
    • Traffic load
    • Soil contamination
    • Earthquake
    • Anaerobic
    • Soft/stable transition
    • Curvature/bending
    • Out-of-roundness
    • Toughness
    • Corrosion rate
    • Gas quality
    • Leak detection
    • Failure probability due to ground subsidence
  5. STOOP3: a major system leap towards deployability for asset management
    STOOP2: 1:10 (proof of principle)
    • 5 metres of pipe
    • Known load
    • Known soil composition
    • Known pipe
    • Area size: m2
    STOOP3: 1:1 (proof of concept)
    • Area size: km2
    • Uncertainty in historical load
    • Uncertainty in soil profiles
    • Uncertainty in transition length
    • Uncertainty in pipe data
    • Quantify uncertainty with probabilistic methods
  6. STOOP3 - MULTIDISCIPLINARY APPROACH
    • TNO Monitoring & Control Services: big data engineering for large-scale monitoring and computing infrastructures
    • TNO Structural Reliability: probabilistic methods in combination with structural failure
    • TNO Geologische Dienst van Nederland (Geological Survey of the Netherlands): subsurface modelling
    • Deltares Software Center: geotechnical software models
    • Deltares Transport Infrastructure: soil-pipe interaction
    • Deltares Applied Geology and Geophysics: subsurface modelling
    • SkyGeo: satellite monitoring of ground level with InSAR
  7. WHEN DOES A PIPE BREAK?
    Limit state: Z = R (Resistance) - S (Load)
    Failure: Z < 0
    Probability of failure: Pf = P(Z < 0)
    Pf = ?
    [Figure labels: R, S]
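The limit state Z = R - S can be illustrated with a crude Monte Carlo estimate. This is a minimal sketch, not the STOOP implementation: the independent normal distributions for R and S, the function names and all parameter values are assumptions, chosen only so the estimate can be compared with the closed-form answer.

```python
import math
import random
from statistics import NormalDist


def estimate_pf(mu_r, sigma_r, mu_s, sigma_s, n_draws, seed=42):
    """Crude Monte Carlo estimate of Pf = P(Z < 0) with Z = R - S."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_draws):
        r = rng.gauss(mu_r, sigma_r)  # resistance draw (assumed normal)
        s = rng.gauss(mu_s, sigma_s)  # load draw (assumed normal)
        if r - s < 0:                 # limit state violated: failure
            failures += 1
    return failures / n_draws


def exact_pf(mu_r, sigma_r, mu_s, sigma_s):
    """Closed-form Pf for independent normal R and S."""
    beta = (mu_r - mu_s) / math.hypot(sigma_r, sigma_s)  # reliability index
    return NormalDist().cdf(-beta)


if __name__ == "__main__":
    print("Monte Carlo:", estimate_pf(10.0, 1.0, 8.0, 1.0, 100_000))
    print("Closed form:", exact_pf(10.0, 1.0, 8.0, 1.0))
```

In the real workflow each draw would require running the soil and pipeline models rather than sampling two numbers, which is why the number of draws dominates the computing cost.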
  8. • More accuracy = more draws = more computing power
    • Scale-up challenge: keeping the turnaround time of the calculations workable
    • Discover the failure region of the pipe by computing a lot
    • The failure probability function ‘emerges’ from computing many times
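The link between accuracy and number of draws can be made concrete for crude Monte Carlo sampling, where the coefficient of variation of the estimated failure probability is approximately sqrt((1 - Pf) / (N * Pf)). The sketch below assumes crude Monte Carlo only; the function name and the example values are illustrative and do not come from the talk.

```python
import math


def required_draws(pf, target_cov):
    """Draws needed for crude Monte Carlo to reach the target coefficient of variation."""
    if not 0.0 < pf < 1.0:
        raise ValueError("pf must lie strictly between 0 and 1")
    if target_cov <= 0.0:
        raise ValueError("target_cov must be positive")
    # CoV^2 = (1 - pf) / (N * pf)  =>  N = (1 - pf) / (pf * CoV^2)
    return math.ceil((1.0 - pf) / (pf * target_cov ** 2))


if __name__ == "__main__":
    for pf in (1e-2, 1e-3, 1e-4):
        print(f"Pf = {pf:g}: about {required_draws(pf, 0.1)} draws for 10% CoV")
```

For small failure probabilities, a probability ten times smaller needs roughly ten times as many draws at the same accuracy, and each draw is a full model run.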
  9. SUBSIDENCE MAP – 101 DSETTLEMENT INSTANCES
    • Compute settlements for 101 GeoTOP realisations
    • DSettlement computation time: 3.5 hours in total
    • 101 VMs, each running 1 DSettlement service
    • 252500 DSettlement calculations in total
    • Without parallelisation: ~15 days
    • Input and output via Azure file storage
    • 3 of the 101 nodes gave a deployment/provision error
    • Still without probabilistic calculations!

    docker run --rm `
        -h $(HOSTNAME.exe) `
        -e AZURE_FILE_MOUNT_LETTER=P: `
        -e AZURE_FILE_MOUNT_STORAGE_ACCOUNT=tnostoopdrivesa `
        -e AZURE_FILE_MOUNT_SHARE_NAME=dsoil `
        -e AZURE_FILE_MOUNT_KEY=XXXX `
        docker-registry:5000/sensitivity-analysis/settlement-worker:latest `
        -realisation $realisationIdx;

    [Diagram labels: Settlement-worker (101x), Python wrapper, DSettlement DLL, shared drive, XML doc, CSV line]
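The figures on this slide are mutually consistent, which a short back-of-the-envelope check shows. The sketch assumes the work is spread evenly over the 101 VMs and ignores the three nodes that failed to provision; the variable names are illustrative.

```python
TOTAL_CALCULATIONS = 252_500  # DSettlement calculations, from the slide
REALISATIONS = 101            # one VM per GeoTOP realisation
SERIAL_DAYS = 15              # "~15 days" without parallelisation

calcs_per_vm = TOTAL_CALCULATIONS / REALISATIONS
serial_hours = SERIAL_DAYS * 24
ideal_parallel_hours = serial_hours / REALISATIONS
seconds_per_calc = serial_hours * 3600 / TOTAL_CALCULATIONS

print(f"calculations per VM:     {calcs_per_vm:.0f}")
print(f"ideal parallel run time: {ideal_parallel_hours:.2f} h")
print(f"time per calculation:    {seconds_per_calc:.2f} s")
```

Each VM handles 2500 calculations, and dividing roughly 15 days of serial work over 101 machines gives about 3.6 hours, in line with the reported 3.5 hours.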
  10. THE SCALE-UP CHALLENGE
    Presence of a set of already existing models
    • Over time: proven technology, high confidence
    • In the intended system: a completely different use of the models
    • Consequence: a variety of platform requirements, input from several (sub)organisations
    Retaining flexibility in the workflows to be developed
    • Available data differs per region, different calculation scenarios for multiple purposes, further development of the system
    Suitable for national roll-out to the various grid operators and water companies.
  11. DYNAMIC WORKFLOWS FROM A PROVISION PERSPECTIVE
    [Figure labels: Pipelines, Load variations, 1992, 1823, 1985, Settlement]

    for every region:                      - selection of all regions of interest in parallel
      while not probab.converged()         - probabilistic loop
        for every variation:               - calculate stochastic variations in parallel
          constructRegionLayout()          - determine local calculation scenario
          doSettlementCalculations()       - K times
          doPipelineCalculations()         - L times
          combineResults()                 - combine settlements and pipeline calculations
          calculateSegmentStress()         - M times, determine pipeline stress per segment
          hasFailed()                      - update probabilistic model

    Ratio between environments is dynamic
    Conclusion:
    • Pipeline and soil layout in a region heavily determine the number of required Windows and Linux calculations.
    • Probabilistic convergence is region/case specific for optimized methods like FORM, SORM and DARS.
    • Component configuration might cause a significant shift in calculation time to a specific platform. E.g. a pipeline resolution parameter might drastically increase Linux-CPU resource consumption.
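Why the Windows/Linux ratio is dynamic can be seen by counting model calls in a stubbed version of the loop above. This is only a sketch: which step runs on which platform is an assumption made for illustration (settlement and pipeline runs on Windows, segment stress on Linux), the stopping rule is a placeholder for probab.converged(), and all names and numbers are hypothetical.

```python
import random
from collections import Counter


def run_region(n_settlement, n_pipeline, n_segments,
               variations=4, min_failures=5, max_iterations=1000, seed=0):
    """Run a stubbed probabilistic loop for one region and count model calls per platform."""
    rng = random.Random(seed)
    calls = Counter()
    failures = 0
    iterations = 0
    # Placeholder stopping rule standing in for probab.converged().
    while failures < min_failures and iterations < max_iterations:
        iterations += 1
        for _ in range(variations):  # these run in parallel in the real workflow
            calls["windows"] += n_settlement + n_pipeline  # K + L runs (assumed Windows)
            calls["linux"] += n_segments                   # M runs (assumed Linux)
            if rng.random() < 0.05:                        # stub for hasFailed()
                failures += 1
    return calls, iterations


if __name__ == "__main__":
    for name, (k, l, m) in {"region A": (3, 2, 10), "region B": (3, 2, 200)}.items():
        calls, iterations = run_region(k, l, m)
        ratio = calls["windows"] / calls["linux"]
        print(f"{name}: {iterations} iterations, Windows/Linux call ratio {ratio:.3f}")
```

With identical K and L, a larger M (for example from a finer pipeline resolution) shifts the load towards Linux, so a fixed one-to-one split of workers would leave one platform idle.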
  12. SCALING TO LARGER AREAS – DISJOINT ‘TRANSITION’ REGIONS
    [Figure: regions A, B, C, D, E and F; legend: GeoTOP 100x100 m grid, GeoTOP transition cells]
  13. GRAPHICAL OVERVIEW OF MAIN PROBABILISTIC WORKFLOW
    • Parallelization at several levels: per region, per slice, per ground profile
    • Input data and randomly drawn values control the runtime behavior of the workflow.
    • Exact configuration of each model depends on the workflow implementation.
    • Interaction between the workflow framework, Linux models and Windows models happens within the ‘center of the probabilistic loop’. A fully distributed setup is required.
    [Figure label: Loop per region]
  14. PETS VS. CATTLE
    Pets:
    ▪ Given names like frontend.xyz.org
    ▪ Unique, lovingly raised and cared for
    ▪ When they get ill, you nurse them back to health
    Cattle:
    ▪ Given numbers like sparkworker-32
    ▪ Almost identical to other cattle
    ▪ When they get ill, you get another one
    Source: “CERN Data Centre Evolution” presentation by Gavin McCance
  15. TYPICAL DEPLOYMENT WORKFLOW
    • Implement scenario
    • Deploy virtual machine cluster
    • Provision virtual machines
    • Submit job
    • Calculate scenario job
    • Gather results
    • Delete machines and related (temporary) resources
  16. TECHNOLOGY CHOICES
    Cloud infrastructure: Microsoft Azure
    • Hybrid: Windows + Linux and 32-bit + 64-bit
    Configurable computing infrastructure
    • How many machines, with which properties and functionality?
    Virtualisation techniques
    • Virtual servers
    • Containerisation with Docker
    Distributed and parallel computing
    • Apache Spark
    Code management, deployment, testing
    • GitLab – Continuous Integration
  17. WHERE IS THE CONTAINER ORCHESTRATOR?
    Swarm, Cattle, Mesos, Kubernetes
    At the moment, no orchestrator supports both Windows and Linux containers in the same cluster.
  18. DEPLOYMENT IN DETAIL
    [Diagram labels: Master; Worker pair (N times), each showing prob2b, wtube, Spark worker, Windows partner, dsettlement, dgeopipeline]
    Setup:
    • Single master, Spark standalone mode
    • Every node is a ‘normal’ VM with Docker
    • Every subsystem runs in a separate container
    Rationale: scalable but still flexible during development.
  19. COMPUTE CLUSTER: CONFIGURATION INSTEAD OF INSTALLATION
    New-AzureRmResourceGroupDeployment `
        -Name $DeploymentID `
        -ResourceGroupName $ResourceGroupName `
        -NamePrefix $HostNamePrefix `
        -NumWorkers $Count `
        -TemplateFile 'azuredeploy.json' `
        -TemplateParameterFile 'azuredeploy.parameters.json' `
        -Mode $Mode

    How many compute nodes, and with which properties?
    Where should it run?
    • In-house
    • Hybrid cloud/in-house
  20. AZURE DEPLOYMENT
    ./deploy-and-provision.sh ScenarioB timetest 2
    (annotation on the command: number of workers)
    3 parallel deployments:
    • 1x Linux master
    • Nx Linux worker
    • Nx Windows worker
    Storage in temporary storage account:
    • Linux virtual disks
    • Windows virtual disks
    • Provision scripts
    Docker-ready hybrid cluster in 10-15 minutes
  21. UNEXPECTED CHANGE IN PROVISION TIME
    … caused by the required image pull (and decompress)
    • For development and testing: rebuild your base images (regularly)!
    • For production: align the Azure image version with your windowsservercore Docker (base) images
  22. USING WINDOWS CONTAINERS FROM A DISTRIBUTED SPARK APPLICATION
    • An element-wise ‘pipe’, as distributed as a ‘normal’ Spark map-operation
    • Workflow code is very flexible:
      - Separation of concerns between model designer and workflow engineer
      - Both Windows and Linux models can be used in the same distributed workflow
    [Figure label: Distributed operations]
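The idea of an element-wise pipe can be sketched with the standard library alone. The talk does not show its implementation, so everything here is an assumption for illustration: each element is written as one line to an external process and one result line is read back, which is the same contract Spark's built-in RDD.pipe uses per partition. The stand-in MODEL command is hypothetical; in the setup described above it would be a call that reaches a containerised Windows or Linux model.

```python
import subprocess
import sys

# Hypothetical stand-in for a containerised model: doubles every number it reads.
MODEL = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(float(line) * 2)",
]


def pipe_partition(elements, command):
    """Send each element as one stdin line to `command`; return its stdout lines."""
    payload = "".join(f"{element}\n" for element in elements)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def pipe_all(partitions, command):
    """Apply the pipe to every partition; a cluster would do this in parallel."""
    return [pipe_partition(partition, command) for partition in partitions]


if __name__ == "__main__":
    print(pipe_all([[1, 2.5], [10]], MODEL))
```

Because the workflow only sees lines in and lines out, the model designer can change what runs behind the command without touching the workflow code.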
  23. DYNAMIC AND ENVIRONMENT AWARE PROVISIONING
    Multi-environment clouds
    • Operating systems
    • Hardware differences (CPU architecture, GPU, etc.)
    • Calculation platforms (Apache Spark, Hadoop, Storm, etc.)
    Ad-hoc need for computation
    • Dynamic calculation scenarios, i.e. input determines the scenario to be calculated
    • Flexible workflow design
    Context-aware auto-scaling and auto-provisioning
  24. ADAPTIVE WINDOWS / LINUX RATIO - RESULTS
    More information about dynamic provisioning of hybrid container clusters can be found in the master's thesis presentation at RuG by Maarten Kollenstart.
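One simple way to think about an adaptive ratio is to split a fixed budget of worker nodes in proportion to the pending load per platform. The sketch below is a hypothetical illustration only, not the method from the thesis: the function name, the load measure and the rule of keeping at least one node per platform are all assumptions.

```python
def split_workers(total_workers, windows_load, linux_load):
    """Split a fixed worker budget proportionally to the pending load per platform."""
    if total_workers < 2:
        raise ValueError("need at least one worker per platform")
    if windows_load < 0 or linux_load < 0:
        raise ValueError("loads must be non-negative")
    total_load = windows_load + linux_load
    if total_load == 0:
        windows = total_workers // 2  # nothing pending: split evenly
    else:
        windows = round(total_workers * windows_load / total_load)
    # Keep at least one node on each platform so either kind of job can start.
    windows = min(max(windows, 1), total_workers - 1)
    return windows, total_workers - windows


if __name__ == "__main__":
    print(split_workers(10, 400, 100))
    print(split_workers(4, 0, 50))
```

For example, `split_workers(10, 400, 100)` returns `(8, 2)`: eight Windows nodes and two Linux nodes for a load that is 80% Windows.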
  25. TNO ♡ THE NORTH
    • Office in Groningen: Eemsgolaan 3 (along the A7)
    • 80 employees, including the IT expertise groups Monitoring & Control Services and Cyber Security & Robustness
    • Active at the Entrance site (Groningen Zernike complex)
      - Hybrid Energy System Integration (HESI) Lab
      - STOOP Fieldlab
      - 5G lab
    • Typical ‘northern’ projects:
      - Smart Dairy Farming (among others together with Dairycampus Leeuwarden)
      - NAM monitoring network – Groningen earthquake