Mixing Windows and Linux containers in large scale probabilistic workflows by Edwin Harmsma


DOCKER IN A NUTSHELL Write a microservice specification and build it into a container. Ship the image to a registry.
Write a container composition specification, pull the images from the registry and run them in an environment (e.g. a developer notebook or production).

THE HARDEST PART OF AI ISN’T AI “Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015

STOOP: Sensor Technologie toegepast Op Ondergrondse Pijpleidingen (sensor technology applied to underground pipelines)

INCIDENT Explosion at the Haarlemmerhouttuinen (2008), Dutch Safety Board (OVV) report

STOOP is part of the grid operator's overall asset management considerations: groundwater level,
ground settlement, vibrations, traffic load, soil contamination, earthquakes, anaerobic conditions, soft/stable soil transitions, curvature/bending, out-of-roundness, toughness, corrosion rate, gas quality, leak detection, failure probability due to ground subsidence.

STOOP3: a major leap of the system towards deployability for asset management
• STOOP2: 1:10 (proof of principle): 5 meters of pipeline, known load, known soil composition, known pipeline, area size in m² • STOOP3: 1:1 (proof of concept): area size in km², uncertainty in historical load, uncertainty in soil profiles, uncertainty in transition length, uncertainty in pipeline data, uncertainty quantified with probabilistic methods

STOOP3 - MULTIDISCIPLINARY APPROACH TNO Monitoring & Control Services * big data engineering for large-scale monitoring and computing infrastructures;
TNO Structural Reliability * probabilistics in relation to structural failure; TNO Geologische Dienst van Nederland (Geological Survey of the Netherlands) * subsurface modeling; Deltares Software Center * geotechnical software models; Deltares Transport Infrastructure * soil-pipeline interaction; Deltares Applied Geology and Geophysics * subsurface modeling; SkyGeo * satellite monitoring of the ground surface with InSAR

STOOP3: PILOT AREAS Diemen (urban area), Woerden (mixed urban/rural), Krimpenerwaard (rural area)

STOOP3: FIELDLAB GRONINGEN

WHEN DOES A PIPELINE BREAK? Limit state: $Z = R - S$, with $R$ the resistance and $S$ the load.
Failure: $Z < 0$. Probability of failure: $P_f = P(Z < 0) = ?$
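The following is a minimal sketch of how $P_f$ can be estimated by repeated calculation. It is crude Monte Carlo: draw the resistance $R$ and the load $S$, evaluate $Z = R - S$ and count the failures. The normal distributions and their parameters are assumptions for illustration only. In STOOP, $R$ and $S$ follow from the soil and pipeline models. For independent normal $R$ and $S$ a closed form exists, which makes the estimate easy to check.

```python
import math
import random
from statistics import NormalDist


def limit_state(r: float, s: float) -> float:
    """Limit state function Z = R - S; the pipeline fails when Z < 0."""
    return r - s


def estimate_pf(mu_r: float, sigma_r: float, mu_s: float, sigma_s: float,
                n: int, seed: int = 0) -> float:
    """Crude Monte Carlo estimate of Pf = P(Z < 0).

    Assumes (for illustration) independent, normally distributed R and S.
    """
    rng = random.Random(seed)
    failures = sum(
        1
        for _ in range(n)
        if limit_state(rng.gauss(mu_r, sigma_r), rng.gauss(mu_s, sigma_s)) < 0
    )
    return failures / n


def exact_pf(mu_r: float, sigma_r: float, mu_s: float, sigma_s: float) -> float:
    """Closed form for independent normal R and S: Pf = Phi(-beta)."""
    beta = (mu_r - mu_s) / math.sqrt(sigma_r**2 + sigma_s**2)
    return NormalDist().cdf(-beta)


if __name__ == "__main__":
    # Hypothetical values: mean resistance 10, mean load 7, unit standard deviations.
    print(estimate_pf(10, 1, 7, 1, n=200_000), exact_pf(10, 1, 7, 1))
```

The sampled estimate lands close to the closed-form value. With fewer draws it scatters more, and that scatter is the trade-off between accuracy and the number of model evaluations.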

• More accuracy = more draws = more computing power •
A scaling challenge: keeping the turnaround time of the calculations workable. The failure region of a pipeline is discovered by computing a lot; the failure probability function ‘emerges’ by computing often.
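The rule "more accuracy = more draws" can be made concrete with the standard result for crude Monte Carlo. An estimate of $P_f$ from $N$ draws has coefficient of variation $\sqrt{(1-P_f)/(N P_f)}$, so the number of draws needed for a fixed target grows roughly as $1/P_f$.

This is an illustrative sketch only. Optimized methods such as FORM, SORM and DARS exist precisely to need far fewer model evaluations.

```python
import math


def cov_of_estimate(pf: float, n: int) -> float:
    """Coefficient of variation of a crude Monte Carlo estimate of pf from n draws."""
    return math.sqrt((1.0 - pf) / (n * pf))


def required_draws(pf: float, target_cov: float) -> int:
    """Number of draws (rounded up) for which the estimate reaches target_cov."""
    return math.ceil((1.0 - pf) / (pf * target_cov**2))


if __name__ == "__main__":
    for pf in (1e-2, 1e-4, 1e-6):
        # Each factor 100 lower in pf needs roughly 100x more model evaluations.
        print(f"pf={pf:g}: {required_draws(pf, 0.1):,} draws for a 10% CoV")
```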

EXAMPLE CALCULATION SCENARIO: SETTLEMENT MAP FOR KRIMPENERWAARD 100 possible subsurface compositions
from GeoTop

SETTLEMENT MAP – 101 DSETTLEMENT INSTANCES Settlements computed for 101 GeoTop
realizations. DSettlement compute time: 3.5 hours in total, on 101 VMs with one DSettlement service each. 252,500 DSettlement calculations in total; without parallelization: ~15 days. Input and output via Azure File Storage. 3 of the 101 nodes gave a deployment/provisioning error. The settlement worker container is started with:
    docker run --rm `
      -h $(HOSTNAME.exe) `
      -e AZURE_FILE_MOUNT_LETTER=P: `
      -e AZURE_FILE_MOUNT_STORAGE_ACCOUNT=tnostoopdrivesa `
      -e AZURE_FILE_MOUNT_SHARE_NAME=dsoil `
      -e AZURE_FILE_MOUNT_KEY=XXXX `
      docker-registry:5000/sensitivity-analysis/settlement-worker:latest `
      -realisation $realisationIdx;
Components (101x): settlement-worker (Python wrapper around the DSettlement DLL), shared drive, XML document as input, CSV line as output. And this is still without probabilistic calculation!
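The figures on this slide are mutually consistent. The quick check below reproduces the ~15 days from the measured 3.5 hours. It assumes that all 101 VMs needed about the same time and that each worked through its realization sequentially. The per-calculation time is derived here; it is not a figure from the slide.

```python
def scaling_summary(total_calcs: int, n_nodes: int, wall_clock_hours: float) -> dict:
    """Derive per-node load, per-calculation time and the serial (1-node) runtime.

    Assumes the work was spread evenly and every node finished in wall_clock_hours.
    """
    calcs_per_node = total_calcs / n_nodes
    seconds_per_calc = wall_clock_hours * 3600 / calcs_per_node
    serial_days = n_nodes * wall_clock_hours / 24
    return {
        "calcs_per_node": calcs_per_node,
        "seconds_per_calc": seconds_per_calc,
        "serial_days": serial_days,
    }


if __name__ == "__main__":
    # Numbers from the settlement-map run: 252,500 calculations, 101 VMs, 3.5 hours.
    print(scaling_summary(252_500, 101, 3.5))
```

This gives 2,500 calculations per realization, about 5 seconds per DSettlement calculation and roughly 14.7 days serially. That matches the ~15 days on the slide.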

THE SCALING CHALLENGE A set of existing models is already in place.
Over time: proven technology, high confidence. In the intended system: a completely different use of the models. Consequences: a variety of platform requirements and contributions from multiple (sub-)organizations; the workflows under development must stay flexible, since available data differs per region, there are different calculation scenarios for multiple purposes, and the system keeps evolving; the result must be suitable for a national roll-out to the various grid operators and water companies.

DYNAMIC WORKFLOWS FROM A PROVISION PERSPECTIVE (Figure: pipelines, load variations and settlement; labels 1992, 1823, 1985.)
    for every region:                      - selection of all regions of interest, in parallel
        while not probab.converged():      - probabilistic loop
            for every variation:           - calculate stochastic variations, in parallel
                constructRegionLayout()    - determine local calculation scenario
                doSettlementCalculations() - K times
                doPipelineCalculations()   - L times
                combineResults()           - combine settlement and pipeline calculations
                calculateSegmentStress()   - M times, determine pipeline stress per segment
                hasFailed()                - update probabilistic model
The ratio between environments is dynamic. Conclusion: • Pipeline and soil layout in a region heavily determine the number of required Windows and Linux calculations. • Probabilistic convergence is region/case-specific for optimized methods like FORM, SORM and DARS. • Component configuration might cause a significant shift in calculation time towards a specific platform, e.g. a pipeline resolution parameter might drastically increase Linux CPU resource consumption.
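The loop above can be sketched in Python to show why the Windows/Linux ratio depends on the region. The sketch counts how many calculations each platform receives per region. Several things in it are hypothetical:

- The mapping of model types to platforms. Settlement (dsettlement) and pipeline (dgeopipeline) calculations run on the Windows partner, and segment-stress calculations run on Linux.
- The region layouts.
- The toy failure draw that stands in for `combineResults()`, `calculateSegmentStress()` and `hasFailed()`.

Convergence uses the crude Monte Carlo coefficient of variation, not FORM, SORM or DARS.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping of model type to container platform, loosely based on the
# worker-pair layout (dsettlement and dgeopipeline on the Windows partner).
PLATFORM = {"settlement": "windows", "pipeline": "windows", "segment_stress": "linux"}


@dataclass
class RegionLayout:
    """Local calculation scenario for one stochastic variation (hypothetical)."""
    k_settlement: int  # K settlement calculations
    l_pipeline: int    # L pipeline calculations
    m_segments: int    # M segment-stress calculations


def construct_region_layout(soil_profiles: int, pipelines: int,
                            segments_per_pipeline: int) -> RegionLayout:
    return RegionLayout(soil_profiles, pipelines, pipelines * segments_per_pipeline)


def run_region(layout_args: dict, true_pf: float, rng: random.Random,
               target_cov: float = 0.3, batch: int = 50, max_variations: int = 20_000):
    """Probabilistic loop for one region; returns (pf estimate, calls per platform)."""
    calls: Counter = Counter()
    failures = variations = 0
    while variations < max_variations:
        for _ in range(batch):  # the real workflow runs these variations in parallel
            layout = construct_region_layout(**layout_args)
            calls[PLATFORM["settlement"]] += layout.k_settlement
            calls[PLATFORM["pipeline"]] += layout.l_pipeline
            calls[PLATFORM["segment_stress"]] += layout.m_segments
            # Toy stand-in for combineResults(), calculateSegmentStress() and hasFailed().
            if rng.random() < true_pf:
                failures += 1
            variations += 1
        pf = failures / variations
        # Stand-in for probab.converged(): stop once the crude MC estimate is precise enough.
        if failures and math.sqrt((1 - pf) / (variations * pf)) < target_cov:
            break
    return failures / variations, calls


if __name__ == "__main__":
    rng = random.Random(42)
    # Two hypothetical regions with very different pipeline and soil layouts.
    urban = {"soil_profiles": 3, "pipelines": 10, "segments_per_pipeline": 20}
    rural = {"soil_profiles": 8, "pipelines": 2, "segments_per_pipeline": 5}
    for name, args in (("urban", urban), ("rural", rural)):
        pf, calls = run_region(args, true_pf=0.05, rng=rng)
        print(name, round(pf, 3), dict(calls), calls["windows"] / calls["linux"])
```

In this toy setup the urban region sends about 15 times more work to Linux than to Windows, while the rural region is balanced. A fixed cluster split would therefore suit neither region.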

SCALING TO LARGER AREAS – DISJUNCT ‘TRANSITION’ REGIONS (Figure: regions A, B, C, D, E and F; GeoTOP 100x100 m grid; GeoTOP transition cells.)
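The slide does not define how transition cells are selected. As a hedged sketch, assume a GeoTOP cell is a ‘transition’ cell when a horizontally or vertically adjacent cell has a different soil class, for example soft versus stable soil. Connected groups of such cells then form disjunct transition regions that can be calculated independently. The grid, the soil classes and the 4-neighbour rule are all assumptions.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def transition_cells(grid: Sequence[Sequence[str]]) -> Set[Cell]:
    """Cells with at least one 4-neighbour of a different soil class."""
    rows, cols = len(grid), len(grid[0])
    cells: Set[Cell] = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != grid[r][c]:
                    cells.add((r, c))
                    break
    return cells


def disjunct_regions(cells: Set[Cell]) -> List[Set[Cell]]:
    """Group cells into 4-connected components using breadth-first search."""
    remaining = set(cells)
    regions: List[Set[Cell]] = []
    while remaining:
        start = remaining.pop()
        region, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in NEIGHBOURS:
                neighbour = (r + dr, c + dc)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    region.add(neighbour)
                    queue.append(neighbour)
        regions.append(region)
    return regions


if __name__ == "__main__":
    # Hypothetical 5x6 grid: S = stable soil, W = soft (weak) soil.
    grid = ["SSSSSS",
            "SWSSSS",
            "SSSSSS",
            "SSSSWW",
            "SSSSWW"]
    regions = disjunct_regions(transition_cells(grid))
    print(sorted(len(region) for region in regions))  # [5, 7]
```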

GRAPHICAL OVERVIEW OF MAIN PROBABILISTIC WORKFLOW Parallelization at several levels:
per region, per slice, per ground profile. Input data and randomly drawn values control the runtime behavior of the workflow, and the exact configuration of each model depends on the workflow implementation. Interaction between the workflow framework, the Linux models and the Windows models happens within the ‘center of the probabilistic loop’, so a fully distributed setup is required. (Figure: loop per region.)
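One way to keep all workers busy when regions differ in size is to flatten the nested levels into a single pool of independent tasks. Below is a minimal sketch using Python's standard thread pool. The region specs, with a number of slices and ground profiles, are hypothetical. In a Spark setup the same flattening would typically be a `flatMap` over regions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, int, int]  # (region, slice index, ground-profile index)


def expand_tasks(regions: Dict[str, Dict[str, int]]) -> List[Task]:
    """Flatten the region -> slice -> ground-profile levels into independent tasks."""
    return [
        (region, s, p)
        for region, spec in regions.items()
        for s in range(spec["slices"])
        for p in range(spec["profiles"])
    ]


def run_all(regions: Dict[str, Dict[str, int]],
            evaluate: Callable[[str, int, int], object],
            max_workers: int = 8) -> Dict[Task, object]:
    """Run every (region, slice, profile) task in parallel and key results by task."""
    tasks = expand_tasks(regions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda task: evaluate(*task), tasks))
    return dict(zip(tasks, results))


if __name__ == "__main__":
    # Hypothetical regions of different size; evaluate() stands in for a model call.
    regions = {"A": {"slices": 3, "profiles": 4}, "B": {"slices": 2, "profiles": 5}}
    out = run_all(regions, lambda r, s, p: f"{r}-{s}-{p}")
    print(len(out), out[("B", 1, 4)])
```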

PETS VS CATTLE Source: “CERN Data Centre Evolution” presentation by Gavin McCance
Pets: ▪ Given names like frontend.xyz.org ▪ Unique, lovingly raised and cared for ▪ When they get ill, you nurse them back to health. Cattle: ▪ Given numbers like sparkworker-32 ▪ Almost identical to other cattle ▪ When they get ill, you get another one

TYPICAL DEPLOYMENT WORKFLOW Implement scenario → deploy virtual machine cluster → provision virtual machines →
submit job → calculate scenario job → gather results → delete machines and related (temporary) resources
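The key property of this workflow is that the machines are deleted even when a job fails, which is the ‘cattle’ approach. The sketch below gets that guarantee from a context manager. The backend methods (`deploy`, `provision`, `submit`, `gather`, `delete`) are hypothetical stand-ins for the real Azure and Spark calls. Note that if `deploy` itself fails, nothing is deleted here. A real implementation might instead delete the whole temporary resource group.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_cluster(backend: Any, n_workers: int) -> Iterator[str]:
    """Deploy and provision a cluster, and always delete it afterwards."""
    cluster = backend.deploy(n_workers)
    try:
        backend.provision(cluster)
        yield cluster
    finally:
        # Runs after success and after failure, so no temporary machines are left behind.
        backend.delete(cluster)


def run_scenario(backend: Any, scenario: str, n_workers: int) -> Any:
    """Submit one scenario job on a fresh cluster and gather its results."""
    with temporary_cluster(backend, n_workers) as cluster:
        job = backend.submit(cluster, scenario)
        return backend.gather(job)


class RecordingBackend:
    """Hypothetical stand-in for real Azure and Spark calls; records each step."""

    def __init__(self, fail_job: bool = False) -> None:
        self.events = []
        self.fail_job = fail_job

    def deploy(self, n_workers: int) -> str:
        self.events.append(f"deploy:{n_workers}")
        return "cluster-1"

    def provision(self, cluster: str) -> None:
        self.events.append("provision")

    def submit(self, cluster: str, scenario: str) -> str:
        self.events.append(f"submit:{scenario}")
        if self.fail_job:
            raise RuntimeError("scenario job failed")
        return "job-1"

    def gather(self, job: str) -> dict:
        self.events.append("gather")
        return {"job": job}

    def delete(self, cluster: str) -> None:
        self.events.append("delete")


if __name__ == "__main__":
    backend = RecordingBackend()
    print(run_scenario(backend, "ScenarioB", n_workers=2), backend.events)
```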

TECHNOLOGY CHOICES Cloud infrastructure: Microsoft Azure. Hybrid Windows+Linux and 32-bit+64-bit.
Configurable compute infrastructure: how many machines, with which properties and functionality? Virtualization techniques: virtual servers, containerization with Docker. Distributed and parallel computing: Apache Spark. Code management, deployment, testing: GitLab – Continuous Integration.

WHERE IS THE CONTAINER ORCHESTRATOR? Swarm, Cattle, Mesos, Kubernetes: at the time of this talk,
no single orchestrator supports both Windows and Linux containers in the same cluster.

DEPLOYMENT IN DETAIL Master; worker pair = Spark worker (with prob2b and wtube) + Windows partner (with dsettlement and dgeopipeline); the worker pair is repeated N times.
Setup: a single master, Spark in standalone mode. Every node is a ‘normal’ VM with Docker, and every subsystem runs in a separate container. Rationale: scalable, yet still flexible during development.

COMPUTE CLUSTER: CONFIGURATION INSTEAD OF INSTALLATION
    New-AzureRmResourceGroupDeployment `
      -Name $DeploymentID `
      -ResourceGroupName $ResourceGroupName `
      -NamePrefix $HostNamePrefix `
      -NumWorkers $Count `
      -TemplateFile 'azuredeploy.json' `
      -TemplateParameterFile 'azuredeploy.parameters.json' `
      -Mode $Mode
How many compute nodes, and with which properties? Where should it run: in-house, or hybrid cloud/in-house?
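Treating the cluster as configuration means the parameter file can be generated per scenario. Azure Resource Manager parameter files are JSON documents with a `$schema`, a `contentVersion` and a `parameters` object in which each entry holds a `value`. The sketch below builds such a file. The parameter names in the example (`namePrefix`, `numWorkers`, `windowsVmSize`) are hypothetical and must match whatever `azuredeploy.json` actually declares.

```python
import json

# Schema URL for Azure Resource Manager deployment parameter files.
ARM_PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#"
)


def arm_parameter_file(**values) -> dict:
    """Build an ARM parameter-file document from keyword arguments."""
    return {
        "$schema": ARM_PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


def write_parameter_file(path: str, **values) -> None:
    """Write the file that is passed via -TemplateParameterFile."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(arm_parameter_file(**values), handle, indent=2)


if __name__ == "__main__":
    # Hypothetical parameter names; they must match those declared in azuredeploy.json.
    print(json.dumps(arm_parameter_file(namePrefix="stoop", numWorkers=4,
                                        windowsVmSize="Standard_D2_v2"), indent=2))
```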

AZURE DEPLOYMENT ./deploy-and-provision.sh ScenarioB timetest 2
Three parallel deployments: 1x Linux master, Nx Linux worker, Nx Windows worker (N = number of workers). Storage in a temporary storage account: Linux virtual disks, Windows virtual disks, provision scripts. Result: a Docker-ready hybrid cluster in 10 to 15 minutes.
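The three deployments are independent, so the total wait is set by the slowest one rather than the sum of all three. Below is a minimal sketch with a thread pool. The `deploy(os_type, count)` callable is a hypothetical stand-in for whatever `deploy-and-provision.sh` invokes per deployment. In the demo, a barrier only opens when all three deployments are in flight at the same time, which shows that they really run concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def deploy_hybrid_cluster(deploy: Callable[[str, int], str], n_workers: int) -> Dict[str, str]:
    """Start the three deployments concurrently and wait for all of them."""
    plan = {
        "linux-master": ("linux", 1),
        "linux-workers": ("linux", n_workers),
        "windows-workers": ("windows", n_workers),
    }
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = {name: pool.submit(deploy, os_type, count)
                   for name, (os_type, count) in plan.items()}
        # result() re-raises any deployment error in the caller.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-in for a real deployment call. The barrier only opens when all
# three deployments run at the same time; a sequential run would time out instead.
_barrier = threading.Barrier(3, timeout=5)


def fake_deploy(os_type: str, count: int) -> str:
    _barrier.wait()
    return f"{count}x {os_type}"


if __name__ == "__main__":
    print(deploy_hybrid_cluster(fake_deploy, n_workers=2))
```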

AUTOMATIC ‘CLEAN-UP’ OF THE COMPUTE CLUSTER

DOCKERFILE EXAMPLE OF WINDOWS MODEL WORKER

UNEXPECTED CHANGE IN PROVISION TIME … caused by a required image pull
(and decompression). For development and testing: rebuild your base images (regularly)! For production: align the Azure VM image version with your windowsservercore Docker (base) images.
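One way to catch a version mismatch before provisioning is to compare the base image named in the worker's Dockerfile with the images already present on the Azure VM image. The cached list could come, for example, from `docker images --format "{{.Repository}}:{{.Tag}}"`.

This is a minimal sketch with two limitations. It compares exact image references only, not layer digests. The image tags in the example are hypothetical.

```python
from typing import Iterable, List, Set


def base_images(dockerfile: str) -> List[str]:
    """External image references in FROM instructions, skipping flags and stage names."""
    images: List[str] = []
    stages: Set[str] = set()
    for line in dockerfile.splitlines():
        tokens = line.strip().split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        args = [token for token in tokens[1:] if not token.startswith("--")]
        if not args:
            continue
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())  # remember multi-stage build names
        if args[0].lower() not in stages:
            images.append(args[0])
    return images


def images_to_pull(dockerfile: str, cached: Iterable[str]) -> List[str]:
    """Base images that are not yet on the VM and would trigger a (slow) pull."""
    available: Set[str] = set(cached)
    return [image for image in base_images(dockerfile) if image not in available]


if __name__ == "__main__":
    # Hypothetical tags: the VM image ships one windowsservercore build, the worker needs another.
    dockerfile = "FROM microsoft/windowsservercore:10.0.14393.1770\nCOPY worker/ C:/worker/\n"
    cached = ["microsoft/windowsservercore:10.0.14393.1715"]
    print(images_to_pull(dockerfile, cached))
```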

USING WINDOWS CONTAINERS FROM A DISTRIBUTED SPARK APPLICATION An element-wise
‘pipe’, as distributed as a ‘normal’ Spark map operation. The workflow code is very flexible: there is a separation of concerns between model designer and workflow engineer, and both Windows and Linux models can be used in the same distributed workflow. (Figure: distributed operations.)
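An element-wise ‘pipe’ to a Windows container can be written as an ordinary function: one element in, one result out. That is the shape `rdd.map(...)` expects in PySpark. Spark's built-in `RDD.pipe` is the partition-wise alternative, which streams elements through an external command.

The sketch below is hedged on several points. The `send` transport (for example an HTTP call to the paired Windows partner) is hypothetical. So is the fake partner, and its "settlement" value is a placeholder, not a model result. The demo uses Python's builtin `map` so that it runs without a cluster.

```python
import json
from typing import Callable, Dict


def make_partner_mapper(send: Callable[[bytes], bytes]) -> Callable[[Dict], Dict]:
    """Wrap a request/response transport to the Windows partner as an element-wise function.

    `send` is a hypothetical transport that takes and returns JSON-encoded bytes.
    """
    def call_partner(element: Dict) -> Dict:
        reply = send(json.dumps(element).encode("utf-8"))
        return json.loads(reply.decode("utf-8"))

    return call_partner


def fake_windows_partner(payload: bytes) -> bytes:
    """Stand-in for a Windows model container: returns a made-up settlement value."""
    request = json.loads(payload)
    request["settlement_mm"] = 2.5 * request["profile"]  # placeholder, not a real model
    return json.dumps(request).encode("utf-8")


if __name__ == "__main__":
    mapper = make_partner_mapper(fake_windows_partner)
    elements = [{"region": "A", "profile": p} for p in range(3)]
    # Locally: builtin map. In PySpark the same function would go into rdd.map(mapper).
    print(list(map(mapper, elements)))
```

Keeping the transport separate from the mapping function mirrors the separation of concerns on the slide. The model designer owns what happens behind `send`, and the workflow engineer only decides where the mapper is applied.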

TNO == RESEARCH

DYNAMIC AND ENVIRONMENT-AWARE PROVISIONING Multi-environment clouds: operating systems, hardware
differences (CPU architecture, GPU, etc.), calculation platforms (Apache Spark, Hadoop, Storm, etc.). Ad-hoc need for computation. Dynamic calculation scenarios, i.e. the input determines the scenario to be calculated. Flexible workflow design. Context-aware auto-scaling and auto-provisioning.

More information about dynamic provisioning of hybrid container clusters: master's thesis presentation @RuG by Maarten Kollenstart.
ADAPTIVE WINDOWS / LINUX RATIO - RESULTS
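The thesis results are not reproduced here. As an illustration only, here is a simple proportional heuristic for an adaptive ratio. It divides a fixed node budget over Windows and Linux workers in proportion to the pending CPU time per platform, keeps at least one node of each kind, and rounds with the largest-remainder method. This is not necessarily the method used in the thesis, and the workload numbers are hypothetical.

```python
import math
from typing import Dict


def split_nodes(pending_cpu_seconds: Dict[str, float], total_nodes: int,
                minimum: int = 1) -> Dict[str, int]:
    """Divide total_nodes over platforms in proportion to pending CPU time.

    Every platform keeps at least `minimum` nodes; rounding uses the largest-remainder method.
    """
    platforms = sorted(pending_cpu_seconds)
    if not platforms:
        return {}
    spare = total_nodes - minimum * len(platforms)
    if spare < 0:
        raise ValueError("node budget too small for the per-platform minimum")
    total_work = sum(pending_cpu_seconds.values())
    if total_work > 0:
        shares = {p: spare * pending_cpu_seconds[p] / total_work for p in platforms}
    else:
        shares = {p: spare / len(platforms) for p in platforms}
    allocation = {p: minimum + math.floor(shares[p]) for p in platforms}
    leftover = total_nodes - sum(allocation.values())
    # Hand the remaining nodes to the platforms with the largest fractional shares.
    by_remainder = sorted(platforms, key=lambda p: shares[p] - math.floor(shares[p]), reverse=True)
    for p in by_remainder[:leftover]:
        allocation[p] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical pending workload measured from the probabilistic loop.
    print(split_nodes({"windows": 4_000, "linux": 36_000}, total_nodes=12))
```

Re-running the split while the loop progresses would let the ratio follow the region and configuration. That matches the conclusion that the Windows/Linux ratio is dynamic.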

TNO ♡ NOORD Office in Groningen: Eemsgolaan 3 (along the A7).
80 employees, including IT expertise in Monitoring & Control Services and Cyber Security & Robustness. Active on the Entrance site (Groningen Zernike complex): Hybrid Energy System Integration (HESI) Lab, STOOP Fieldlab, 5G lab. Typical ‘northern’ projects: Smart Dairy Farming (with, among others, Dairycampus Leeuwarden); NAM monitoring network – Groningen earthquakes

THANKS FOR YOUR ATTENTION! tno.nl/careers More information about the STOOP
project: http://stoopnetbeheer.nl
