RackHD Debugging and Isolation

Overview of RackHD for the purposes of assisting with debugging and isolation of unexpected results...

Joseph Heck

April 26, 2016

Transcript

  1. RackHD
    Debugging and Isolation

  2. Taking Data to Information
    •  Analysis is critical to resolving defects
    •  Software QA can go much deeper
    – All the code is available to investigate
    – Architecture is open
    – Responsibilities for components are well defined
    •  A roadmap to learn never hurts…
    And that is what this deck includes...

  3. Isolation
    •  Break down the problem into smaller parts
    •  Look for the natural boundaries in the system
    •  Implies you need to know how it is built and how it interacts
    – Reasoning about components
    •  But what if you don’t know?
    – Or can’t remember?

  4. Scientific Method
    •  How to learn what it really does
    –  The docs lie, programmers make mistakes, and there are always unintended consequences
    •  Scientific Method Process
    –  Ask a question
    –  Do the background research
    –  Construct a hypothesis
    –  Test your hypothesis
    –  Analyze your data, draw a conclusion
    –  Communicate your results

  5. Existing tests
    •  Integration Tests
    –  https://github.com/RackHD/RackHD/tree/master/test
    •  Unit tests for smaller software components
    –  https://coveralls.io/github/RackHD/on-core
    –  https://coveralls.io/github/RackHD/on-dhcp-proxy
    –  https://coveralls.io/github/RackHD/on-http
    –  https://coveralls.io/github/RackHD/on-syslog
    –  https://coveralls.io/github/RackHD/on-taskgraph
    –  https://coveralls.io/github/RackHD/on-tasks
    –  https://coveralls.io/github/RackHD/on-tftp
    •  http://rackhd.readthedocs.org/en/latest/repositories.html#repositories-status

  6. The Background Research
    •  RackHD logical architecture
    •  Core Concepts
    – PXE Booting: how it works
    – Workflow Engine Interactions
    – PXE booting a Microkernel to extend reach
    – Profiles and Templates
    •  Configuration and Logs

  7. Process/Communications Architecture
    http://rackhd.readthedocs.org/en/latest/software_architecture.html#major-components

  8. Logical Architecture
    Diagram labels:
    •  Components: on-syslog, on-tftp, ISC dhcp, on-dhcp-proxy, on-http, rabbitmq, mongodb, on-taskgraph
    •  SNMP, IPMI, AMT, REDFISH, … clients
    •  Incoming Events, Workflow Engine, Outgoing Actions

  9. Configuration and Logs
    on-* process
    https://github.com/RackHD/on-core/blob/master/lib/services/configuration.js
    https://github.com/RackHD/on-core/blob/fea46c/lib/common/messenger.js#L85
    Configuration by key – Order of Precedence:
    •  Command-line argument
    •  Environment variable
    •  Configuration file
    –  /opt/monorail/config.json
    –  /opt/onrack/etc/monorail.json
    •  In-code defaults
    –  uri = configuration.get('amqp', 'amqp://localhost')
    –  Distributed in code where used/needed
    Logs (stdout), by process manager:
    •  upstart: /var/log/upstart/on-*
    •  docker: docker logs {docker_id}
    •  systemd: journalctl
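
The order of precedence above can be sketched as a lookup that tries each source in turn and falls back to the default passed at the call site. This is an illustration only, not the on-core implementation: `resolveConfig`, the `configSources` object, and the sample values are hypothetical names chosen for the example.

```javascript
// Hypothetical sketch of "first source that defines the key wins".
// Not the on-core implementation; names and values are illustrative.
function resolveConfig(key, sources, inCodeDefault) {
  // Highest precedence first, mirroring the ordering on the slide.
  const ordered = [sources.argv, sources.env, sources.file];
  for (const source of ordered) {
    if (source && Object.prototype.hasOwnProperty.call(source, key)) {
      return source[key];
    }
  }
  // Nothing configured: fall back to the default supplied at the call site.
  return inCodeDefault;
}

const configSources = {
  argv: {},                                    // stands in for parsed command-line arguments
  env: { amqp: 'amqp://broker.example' },      // stands in for environment variables
  file: { amqp: 'amqp://from-file.example' },  // stands in for a parsed JSON config file
};

const amqpUri = resolveConfig('amqp', configSources, 'amqp://localhost');
const unsetValue = resolveConfig('unsetKey', configSources, 'fallback');
console.log(amqpUri, unsetValue);
```

When a value is not what you expect, walking the sources in this order is a quick way to find which one supplied it.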

  10. Basic DHCP, no Proxy
    from http://download.intel.com/design/archives/wfm/downloads/pxespec.pdf

  11. DHCP w/ local Proxy
    from http://download.intel.com/design/archives/wfm/downloads/pxespec.pdf

  12. DHCP w/ remote Proxy
    from http://download.intel.com/design/archives/wfm/downloads/pxespec.pdf

  13. PXE (additional reading)
    RackHD overview description
    •  http://rackhd.readthedocs.org/en/latest/how_it_works.html
    PXE: what it is, how it works
    •  https://en.m.wikipedia.org/wiki/Preboot_Execution_Environment
    The PXE Spec:
    •  http://download.intel.com/design/archives/wfm/downloads/pxespec.pdf
    DHCP
    •  https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol
    DHCP Proxy
    •  http://www.juniper.net/documentation/en_US/junos13.3/topics/concept/dhcp-extended-dhcp-relay-proxy-overview.html

  14. iPXE follow on
    Diagram labels: client system, RackHD
    iPXE requests a script from the “profiles” API
    Response:
    •  Don’t know the node: Discover it
    •  Known node, no workflow: No-op or default response
    •  Known node, workflow: response from workflow
    http://rackhd.readthedocs.org/en/latest/devguide/index.html#rackhd-debugging-guide
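
The three possible responses can be read as a small decision function. The sketch below is a hedged illustration rather than RackHD's code: `chooseProfile`, the `knownNodeIds` set, the `activeWorkflowsByNode` map, and the profile name are all hypothetical stand-ins for data RackHD keeps elsewhere.

```javascript
// Hypothetical sketch of the three-way decision for a profiles request.
// The identifiers and data shapes are assumptions made for illustration.
function chooseProfile(nodeId, knownNodes, activeWorkflows) {
  if (!knownNodes.has(nodeId)) {
    // Unknown node: start discovery.
    return { action: 'discover' };
  }
  const workflow = activeWorkflows.get(nodeId);
  if (!workflow) {
    // Known node with nothing running: no-op or default response.
    return { action: 'default' };
  }
  // Known node with an active workflow: the workflow decides the response.
  return { action: 'workflow', profile: workflow.profile };
}

const knownNodeIds = new Set(['aa:bb:cc:00:00:01', 'aa:bb:cc:00:00:02']);
const activeWorkflowsByNode = new Map([
  ['aa:bb:cc:00:00:02', { profile: 'example-install.ipxe' }],
]);

console.log(chooseProfile('ff:ff:ff:00:00:09', knownNodeIds, activeWorkflowsByNode));
console.log(chooseProfile('aa:bb:cc:00:00:01', knownNodeIds, activeWorkflowsByNode));
console.log(chooseProfile('aa:bb:cc:00:00:02', knownNodeIds, activeWorkflowsByNode));
```

When isolating a boot problem, the first question is which of these three branches the node actually took.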

  15. What is a workflow
    Diagram: a Graph containing Tasks, each paired with a Job
    Graph
    •  JSON document
    •  Describes flow of execution
    •  Wrapper for shared options and context values
    Task
    •  JSON data only
    •  1:1 ratio of tasks to jobs
    •  Can have 0-n tasks as run dependencies in a graph
    •  Target nodes or arbitrary code execution
    Job
    •  NodeJS code backing the Task declaration
    •  Simply a class with a run function
    •  Configuration comes from Task JSON
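
The relationship between the three pieces can be sketched in a few lines. Every field name here (`name`, `job`, `options`, `context`, `tasks`, `label`) and the `EchoJob` class are assumptions made for illustration; they are not a verified RackHD schema.

```javascript
// Illustrative only: field names are assumptions, not a verified RackHD schema.

// Task: JSON data only. It names the job that backs it and carries options.
const taskDefinition = {
  name: 'Example.Task.Echo',
  job: 'Example.Job.Echo',
  options: { message: 'hello' },
};

// Job: simply a class with a run function; configuration comes from the task JSON.
class EchoJob {
  constructor(options, context) {
    this.options = options;
    this.context = context;
  }
  run() {
    return `${this.options.message} from ${this.context.target}`;
  }
}

// Graph: JSON document describing flow of execution, plus shared context values.
const graphDefinition = {
  name: 'Example.Graph.Echo',
  context: { target: 'node-1' },
  tasks: [{ label: 'echo', task: taskDefinition.name }],
};

const taskRegistry = { [taskDefinition.name]: taskDefinition };
const jobRegistry = { 'Example.Job.Echo': EchoJob };  // one job per task

// Resolve each graph entry to its task, then to the job class, and run it.
function runGraph(graph) {
  return graph.tasks.map((entry) => {
    const task = taskRegistry[entry.task];
    const JobClass = jobRegistry[task.job];
    return new JobClass(task.options, graph.context).run();
  });
}

const graphResults = runGraph(graphDefinition);
console.log(graphResults);
```

The split matters for debugging: a wrong value usually traces back to the task JSON or graph context, while wrong behavior traces back to the job code.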

  16. Task Flow
    Example w/ Failure Handling
    Diagram labels: Task-A, Task-B, Task-C, Task-D, Success
    Edge conditions: succeeded, finished, failed
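
A task flow with failure handling can be modeled as tasks that wait on the terminal state of other tasks. The wiring below is hypothetical, since the transcript does not preserve which edge connects which tasks, and it assumes that "finished" is satisfied by either success or failure. `readyTasks` and `conditionMet` are names invented for this sketch.

```javascript
// Hypothetical wiring: the slide's actual edges are not recoverable from the transcript.
// Each task maps the tasks it waits on to the state it requires of them.
const exampleFlow = {
  'Task-A': {},
  'Task-B': { 'Task-A': 'succeeded' },
  'Task-C': { 'Task-A': 'failed' },
  'Task-D': { 'Task-B': 'finished' },
};

// Assumption: "finished" is satisfied by either terminal state.
function conditionMet(required, actual) {
  if (actual === undefined) return false;  // dependency has not ended yet
  return required === 'finished' || required === actual;
}

// Returns tasks that have not ended yet and whose dependencies are all satisfied.
function readyTasks(flow, states) {
  return Object.keys(flow).filter((name) => {
    if (states[name] !== undefined) return false;
    return Object.entries(flow[name]).every(([dep, required]) =>
      conditionMet(required, states[dep]));
  });
}

const atStart = readyTasks(exampleFlow, {});
const afterAFails = readyTasks(exampleFlow, { 'Task-A': 'failed' });
const afterBFails = readyTasks(exampleFlow, { 'Task-A': 'succeeded', 'Task-B': 'failed' });
console.log(atStart, afterAFails, afterBFails);
```

Under these assumptions, a failure in Task-A routes to Task-C, while Task-D runs once Task-B ends regardless of its outcome.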

  17. Workflow Tasks
    •  Run commands, tooling
    –  IPMI
    –  SNMP
    –  RACADM
    •  Interact with RackHD data
    –  Read catalog data
    –  Set catalog values, node values
    •  Provide responses for PXE
    –  DHCP
    –  TFTP
    –  HTTP

  18. Profiles and Templates
    Diagram labels: workflow, HTTP, iPXE bootloader, any “southward” initiated HTTP request
    GET /api/1.1/profiles
    https://github.com/RackHD/on-http/blob/master/lib/api/1.1/southbound/profiles.js
    GET /api/1.1/templates/{id}
    https://github.com/RackHD/on-http/blob/master/lib/api/1.1/southbound/templates.js
    Profile == iPXE Script
    https://github.com/RackHD/on-http/tree/master/data/profiles
    Template == Any generalized template
    https://github.com/RackHD/on-http/tree/master/data/templates
    Rendered as EJS template with context from active workflow related to the node
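
To make "rendered as a template with context from the workflow" concrete, here is a minimal sketch. It does not use the EJS library: `renderTemplate` is a hypothetical stand-in that handles only `<%= key %>` lookups, and the profile text and context keys are invented for illustration.

```javascript
// Minimal stand-in for EJS: handles only <%= key %> lookups, nothing else.
// The template text and context keys below are invented for illustration.
function renderTemplate(template, values) {
  return template.replace(/<%=\s*([\w.]+)\s*%>/g, (match, key) => {
    // Walk dotted paths such as "options.console" through the context object.
    const value = key.split('.').reduce(
      (obj, part) => (obj == null ? undefined : obj[part]), values);
    return value === undefined ? '' : String(value);
  });
}

const profileText = '#!ipxe\nkernel <%= kernelUri %> console=<%= options.console %>\nboot';
const workflowContext = {
  kernelUri: 'http://server.example/vmlinuz',
  options: { console: 'ttyS0' },
};

const renderedScript = renderTemplate(profileText, workflowContext);
console.log(renderedScript);
```

If a node receives a script with blank or wrong values, the context supplied by the active workflow is the first place to look.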

  19. Microkernel Tasks
    Diagram labels: remote host, microkernel, task runner, workflow, HTTP
    Jobs: Job.Linux.Bootstrap, Job.WinPE.Bootstrap, Job.Linux.Commands
    Steps are numbered (0) to (5) in the diagram; the transcript preserves these:
    •  GET /api/1.1/tasks/bootstrap.js
    •  (2) start task runner
    •  (3) GET /api/1.1/tasks/{id}
    •  POST /api/1.1/tasks/{id}
    https://github.com/RackHD/on-http/blob/master/lib/api/1.1/southbound/tasks.js
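
The exchange between the task runner on the remote host and RackHD can be sketched as fetch work, execute it, report results. This is an assumption-laden illustration: `fakeTasksApi` replaces the real HTTP endpoints with an in-memory object, and the payload shapes and `runOnce` helper are invented for the example.

```javascript
// Hypothetical sketch of the exchange; `fakeTasksApi` stands in for the HTTP
// endpoints named on the slide, and the payload shapes are invented.
const fakeTasksApi = {
  pending: { 'node-1': [{ cmd: 'echo inventory' }] },
  posted: {},
  getTasks(id) {            // stands in for GET /api/1.1/tasks/{id}
    return this.pending[id] || [];
  },
  postResults(id, body) {   // stands in for POST /api/1.1/tasks/{id}
    this.posted[id] = body;
  },
};

// One pass of a task runner: fetch work, execute it, report the results.
function runOnce(api, id, execute) {
  const tasks = api.getTasks(id);
  const outputs = tasks.map((task) => ({ cmd: task.cmd, output: execute(task.cmd) }));
  api.postResults(id, outputs);
  return outputs.length;
}

// A real runner would shell out on the remote host; here execution is simulated.
const handledCount = runOnce(fakeTasksApi, 'node-1', (cmd) => `ran: ${cmd}`);
console.log(handledCount, fakeTasksApi.posted['node-1']);
```

Each arrow in this loop is a natural boundary for isolation: did the runner fetch tasks, did the commands run, and did the results get posted back?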
