RackHD Debugging and Isolation

Overview of RackHD for the purposes of assisting with debugging and isolation of unexpected results...

Joseph Heck

April 26, 2016



  1. Taking Data to Information
    • Analysis is critical to resolving defects
    • Software QA can go much deeper
      – All the code is available to investigate
      – Architecture is open
      – Responsibilities for components well defined
    • A roadmap to learn never hurts… And that is what this deck includes...
  2. Isolation
    • Break down the problem into smaller parts
    • Look for the natural boundaries in the system
    • Implies you need to know how it is built, how it interacts
      – Reasoning about components
    • But what if you don’t know?
      – Or can’t remember?
  3. Scientific Method
    • How to learn what it really does
      – The docs lie, programmers make mistakes, and there are always unintended consequences
    • Scientific Method Process
      – Ask a question
      – Do the background research
      – Construct a hypothesis
      – Test your hypothesis
      – Analyze your data, draw a conclusion
      – Communicate your results
  4. Existing tests
    • Integration Tests
      – https://github.com/RackHD/RackHD/tree/master/test
    • Unit tests for smaller software components
      – https://coveralls.io/github/RackHD/on-core
      – https://coveralls.io/github/RackHD/on-dhcp-proxy
      – https://coveralls.io/github/RackHD/on-http
      – https://coveralls.io/github/RackHD/on-syslog
      – https://coveralls.io/github/RackHD/on-taskgraph
      – https://coveralls.io/github/RackHD/on-tasks
      – https://coveralls.io/github/RackHD/on-tftp
    • http://rackhd.readthedocs.org/en/latest/repositories.html#repositories-status
  5. The Background Research
    • RackHD logical architecture
    • Core Concepts
      – PXE Booting – how it works
      – Workflow Engine Interactions
      – PXE booting a Microkernel to extend reach
      – Profiles and Templates
    • Configuration and Logs
  6. Logical Architecture (diagram)
    • Components: on-syslog, on-tftp, ISC dhcp, on-dhcp-proxy, on-http, rabbitmq, mongodb, on-taskgraph
    • Protocols: SNMP, IPMI, AMT, REDFISH, …
    • Labels: clients, Incoming Events, Workflow Engine, Outgoing Actions
  7. Configuration and Logs (on-* process)
    https://github.com/RackHD/on-core/blob/master/lib/services/configuration.js
    https://github.com/RackHD/on-core/blob/fea46c/lib/common/messenger.js#L85
    • Configuration by key
      – Order of Precedence:
        • Command-line argument
        • Environment Variable
        • Configuration File
          • /opt/monorail/config.json
          • /opt/onrack/etc/monorail.json
        • In-code defaults
          • uri = configuration.get(‘amqp’, ‘amqp://localhost’)
          • Distributed in code where used/needed
    • Logs: each on-* process writes to stdout, which is captured by
      – upstart: /var/log/upstart/on-*
      – docker: docker logs {docker_id}
      – systemd: journalctl
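The order of precedence on this slide can be sketched as a single lookup function. This is a minimal illustration in Python, not the on-core implementation (which is Node.js): the `lookup` name, the `--key=value` argument form, the use of the key itself as the environment variable name, and the order in which the two files are tried are all assumptions made for the example.

```python
import json
import os


def lookup(key, default, argv=(), environ=None, config_files=()):
    """Return the first value found for key: CLI, env, files, then default."""
    environ = os.environ if environ is None else environ

    # 1. Command-line argument (assumed form: --key=value)
    prefix = f"--{key}="
    for arg in argv:
        if arg.startswith(prefix):
            return arg[len(prefix):]

    # 2. Environment variable (assumed to share the key's name)
    if key in environ:
        return environ[key]

    # 3. Configuration files, tried in the order given (order is an assumption)
    for path in config_files:
        try:
            with open(path) as handle:
                data = json.load(handle)
        except (OSError, ValueError):
            continue  # missing or unparsable file: try the next source
        if isinstance(data, dict) and key in data:
            return data[key]

    # 4. In-code default supplied at the call site
    return default


CONFIG_FILES = ("/opt/monorail/config.json", "/opt/onrack/etc/monorail.json")

# With no argument, variable, or file present, the in-code default wins.
uri = lookup("amqp", "amqp://localhost", argv=[], environ={}, config_files=())
```

When a value is not what you expect, walking these four sources in order is a quick way to find which one is supplying it.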
  8. PXE (additional reading)
    • RackHD overview description
      – http://rackhd.readthedocs.org/en/latest/how_it_works.html
    • PXE: what it is, how it works
      – https://en.m.wikipedia.org/wiki/Preboot_Execution_Environment
    • The PXE Spec:
      – http://download.intel.com/design/archives/wfm/downloads/pxespec.pdf
    • DHCP
      – https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol
    • DHCP Proxy
      – http://www.juniper.net/documentation/en_US/junos13.3/topics/concept/dhcp-extended-dhcp-relay-proxy-overview.html
  9. iPXE follow on
    • The client system sends RackHD an iPXE request for a script (“profiles” API)
    • Response:
      – Don’t know the node: Discover it
      – Known node, no Workflow: No-op or default response
      – Known node, workflow: response from workflow
    • http://rackhd.readthedocs.org/en/latest/devguide/index.html#rackhd-debugging-guide
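The three response cases can be read as a small decision function. The sketch below is illustrative only: `choose_profile_response` and its return labels are hypothetical names, and `node` and `active_workflow` stand in for whatever lookups RackHD actually performs.

```python
def choose_profile_response(node, active_workflow):
    """Pick which kind of iPXE script to return for a "profiles" request."""
    if node is None:
        # Don't know the node: start discovery
        return "discover"
    if active_workflow is None:
        # Known node, no workflow: no-op or default response
        return "default"
    # Known node with an active workflow: the workflow decides the response
    return "workflow"


print(choose_profile_response(None, None))
print(choose_profile_response({"id": "node-1"}, None))
print(choose_profile_response({"id": "node-1"}, {"name": "install"}))
```

When a machine boots into something unexpected, working out which of these three branches it took is a useful first isolation step.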
  10. What is a workflow
    (Diagram: a Graph containing Tasks, each backed by a Job)
    • Graph
      – JSON document
      – Describes flow of execution
      – Wrapper for Shared options and context values
    • Task
      – JSON Data only
      – 1:1 ratio of tasks to jobs
      – Can have 0-n tasks as run dependencies in a graph
      – Target nodes or arbitrary code execution
    • Job
      – NodeJS code backing the Task declaration
      – Simply a class with a run function
      – Configuration comes from Task JSON
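The Graph, Task, and Job relationship can be sketched in a few lines. This is a Python illustration under stated assumptions, not the RackHD schema or its Node.js job classes: the field names (`options`, `tasks`, `label`, `job`, `wait_on`), the `EchoJob` class, and the `run_graph` helper are all hypothetical.

```python
class EchoJob:
    """A Job: a class with a run function, configured from Task data."""

    def __init__(self, options, context):
        self.options = options
        self.context = context

    def run(self):
        line = f"{self.options['prefix']}{self.options['message']}"
        self.context.setdefault("log", []).append(line)


JOBS = {"echo": EchoJob}  # one job per task definition (1:1)

# The Graph is data only: shared options plus tasks and their dependencies.
GRAPH = {
    "options": {"prefix": "node-1: "},
    "tasks": [
        {"label": "second", "job": "echo",
         "options": {"message": "world"}, "wait_on": ["first"]},
        {"label": "first", "job": "echo",
         "options": {"message": "hello"}, "wait_on": []},
    ],
}


def run_graph(graph, jobs):
    """Run each task once all the tasks it waits on have finished."""
    context, done, pending = {}, set(), list(graph["tasks"])
    while pending:
        ready = [t for t in pending if set(t["wait_on"]) <= done]
        if not ready:
            raise ValueError("unsatisfiable task dependencies")
        for task in ready:
            # Shared graph options are merged under the task's own options
            options = {**graph.get("options", {}), **task.get("options", {})}
            jobs[task["job"]](options, context).run()
            done.add(task["label"])
            pending.remove(task)
    return context


result = run_graph(GRAPH, JOBS)
```

Although `second` is listed first, it runs after `first` because of its dependency, so `result["log"]` holds the two lines in dependency order.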
  11. Workflow Tasks
    • Run commands, tooling
      – IPMI
      – SNMP
      – RACADM
    • Interact with RackHD data
      – Read catalog data
      – Set catalog values, node values
    • Provide responses for PXE
      – DHCP
      – TFTP
      – HTTP
  12. Profiles and Templates
    • Profile == iPXE Script
      – GET /api/1.1/profiles, requested over HTTP by the iPXE bootloader
      – https://github.com/RackHD/on-http/blob/master/lib/api/1.1/southbound/profiles.js
      – https://github.com/RackHD/on-http/tree/master/data/profiles
    • Template == Any generalized template
      – GET /api/1.1/templates/{id}, requested by any “southward” initiated HTTP request
      – https://github.com/RackHD/on-http/blob/master/lib/api/1.1/southbound/templates.js
      – https://github.com/RackHD/on-http/tree/master/data/templates
    • Rendered as EJS template with context from active workflow related to the node
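Rendering a profile with workflow context can be illustrated as follows. RackHD renders EJS templates; this sketch swaps in Python's `string.Template` purely to show the idea of a stored script with placeholders filled from context. The template text, the `kernel_uri` key, and the example URL are hypothetical.

```python
from string import Template

# A stand-in for a stored profile: an iPXE script with one placeholder.
PROFILE_TEMPLATE = Template(
    "#!ipxe\n"
    "kernel ${kernel_uri}\n"
    "boot\n"
)


def render_profile(template, workflow_context):
    """Fill the template from the context of the node's active workflow."""
    # substitute() raises KeyError when the context lacks a placeholder,
    # which surfaces a missing value instead of serving a broken script.
    return template.substitute(workflow_context)


script = render_profile(
    PROFILE_TEMPLATE,
    {"kernel_uri": "http://rackhd.example/vmlinuz"},
)
print(script)
```

If a node receives an unexpected script, comparing the stored template with the context it was rendered against separates a template problem from a workflow problem.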
  13. Microkernel Tasks
    (Diagram: a microkernel on the remote host runs a task runner that talks to the workflow over HTTP; steps are numbered (0) to (5))
    • Jobs shown: Job.Linux.Bootstrap, Job.WinPE.Bootstrap, Job.Linux.Commands
    • Steps shown:
      – GET /api/1.1/tasks/bootstrap.js
      – start task runner
      – GET /api/1.1/tasks/{id}
      – POST /api/1.1/tasks/{id}
    • https://github.com/RackHD/on-http/blob/master/lib/api/1.1/southbound/tasks.js
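The GET and POST exchange between the task runner and the workflow can be sketched as one polling step. The real task runner is the JavaScript served from `/api/1.1/tasks/bootstrap.js`; the Python below is only an illustration. The payload shape (`tasks`, `cmd`, `stdout`), the `run_tasks_once` function, and the in-memory `FakeWorkflowApi` are assumptions made so the example runs without a network.

```python
def run_tasks_once(task_id, http_get, http_post, execute):
    """Fetch the tasks for one id, run each command, post the results back."""
    path = f"/api/1.1/tasks/{task_id}"
    payload = http_get(path)  # GET /api/1.1/tasks/{id}
    results = [
        {"cmd": task["cmd"], "stdout": execute(task["cmd"])}
        for task in payload["tasks"]
    ]
    http_post(path, {"tasks": results})  # POST /api/1.1/tasks/{id}
    return results


class FakeWorkflowApi:
    """In-memory stand-in for the HTTP endpoints, for illustration only."""

    def __init__(self, tasks):
        self.tasks = tasks
        self.posted = []

    def get(self, path):
        return {"tasks": self.tasks}

    def post(self, path, body):
        self.posted.append((path, body))


api = FakeWorkflowApi([{"cmd": "uname"}])
results = run_tasks_once("abc", api.get, api.post, lambda cmd: f"ran {cmd}")
```

Passing the transport and the command executor in as functions keeps each side of the exchange separately testable, which mirrors the isolation approach described earlier in the deck.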