RackHD Workflow Engine implementation design

RackHD Workflow Engine redesign
January 27, 2016

© Copyright 2015 EMC Corporation. All rights reserved.

Goals of the new workflow engine

Performance goals
•  Entirely database backed (no workflow state kept in memory)
•  Horizontally scalable, which enables high availability under load
•  Fault tolerant, which leads to enabling high availability failover scenarios

Development goals
•  Stream-based architecture of core components (Reactive paradigm)
   –  Core components of the engine listen to generic event stream APIs (push rather than pull)
   –  Enables flexibility in infrastructure decisions: easy to change underlying database and messaging infrastructure based on deployment constraints
   –  Enables fast response and execution times of workflow tasks
•  Modular and extensible
•  Backwards compatibility with current code

Implementation details: architectural decisions to achieve goals

•  HA/Fault tolerance
   –  Atomic checkout: All eligible Schedulers or Task Runners receive each request, but only one succeeds in checking out a lease to handle it, somewhat like a leased queue model. This leverages existing database technologies (currently MongoDB).
   –  Lease heartbeat: Workflow engine instances heartbeat the tasks they own, so that other instances can check those tasks out when a heartbeat times out.
   –  Backup mechanisms for dropped events: Optimized pollers queue up dropped events for re-evaluation (events are dropped under high-load throttling and catastrophic failure conditions).
•  Scalability
   –  Domains: Workflow instances can be configured to handle different domains of tasks, and can be machine independent as long as the database and messaging are shared.
   –  Stateless: Horizontal scalability is achieved by designing the processes to run in an essentially stateless mode. The database has the last word.
   –  Optimized data structures: Update the current data structures and mongo collections/indexes for fast querying and improved indexing.
•  Development
   –  Reactive: Use the ReactiveX libraries for Node.js to design loosely coupled components joined by stream APIs.
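The atomic checkout and lease heartbeat ideas above can be sketched as follows. This is a minimal in-memory illustration, not RackHD's implementation: the `LeaseStore` class, its method names, and the timeout value are hypothetical, and a real deployment would rely on an atomic update in the shared database rather than on a JavaScript `Map`.

```javascript
// Hypothetical in-memory stand-in for the shared database.
// In a real deployment the check-and-set in checkout() would be one
// atomic database operation, so that only one instance can win.
class LeaseStore {
  constructor(leaseTimeoutMs) {
    this.leaseTimeoutMs = leaseTimeoutMs;
    this.tasks = new Map(); // taskId -> { taskRunnerLease, taskRunnerHeartbeat }
  }

  addTask(taskId) {
    this.tasks.set(taskId, { taskRunnerLease: null, taskRunnerHeartbeat: null });
  }

  // Succeeds only if nobody currently holds the lease.
  checkout(taskId, runnerId, now) {
    const task = this.tasks.get(taskId);
    if (!task || task.taskRunnerLease !== null) {
      return false;
    }
    task.taskRunnerLease = runnerId;
    task.taskRunnerHeartbeat = now;
    return true;
  }

  // The owner refreshes the heartbeat on every task it holds.
  heartbeat(runnerId, now) {
    for (const task of this.tasks.values()) {
      if (task.taskRunnerLease === runnerId) {
        task.taskRunnerHeartbeat = now;
      }
    }
  }

  // Releases leases whose heartbeat is older than the timeout,
  // so that another instance can check the task out again.
  expireLeases(now) {
    const expired = [];
    for (const [taskId, task] of this.tasks) {
      if (task.taskRunnerLease !== null &&
          now - task.taskRunnerHeartbeat > this.leaseTimeoutMs) {
        task.taskRunnerLease = null;
        task.taskRunnerHeartbeat = null;
        expired.push(taskId);
      }
    }
    return expired;
  }
}

const store = new LeaseStore(1000);
store.addTask('task-1');
const firstWins = store.checkout('task-1', 'runner-A', 0);   // true
const secondWins = store.checkout('task-1', 'runner-B', 0);  // false
store.heartbeat('runner-A', 500);
const expiredEarly = store.expireLeases(1200); // [] (heartbeat at 500 is still fresh)
const expiredLate = store.expireLeases(2000);  // ['task-1'] (runner-A stopped heartbeating)
const takeover = store.checkout('task-1', 'runner-B', 2000); // true
```

Timestamps are passed in explicitly so the behavior is deterministic; a running service would use the current time instead.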

Implementation details: database strategy

•  mongo.pxe.taskdependencies: A dedicated collection for task running and graph evaluation. The taskdependency document structure allows most of the workflow evaluation logic to be performed with optimized database queries instead of code.

Task Dependency document

    {
        "taskId" : "bd1ff046-a8e5-4587-b5e3-3ac7d5d8a974",
        "graphId" : "da5d101d-27f6-4b17-9e90-acad06dfa90c",
        "state" : "pending",
        "dependencies" : {
            "afba1db1-07aa-4ffe-814a-84dbcc02bf9b" : "finished"
        },
        "terminalOnStates" : [
            "timeout",
            "cancelled",
            "failed"
        ],
        "domain" : "default",
        "evaluated" : false,
        "reachable" : true,
        "taskRunnerLease" : null,
        "taskRunnerHeartbeat" : null,
        "createdAt" : ISODate("2016-01-27T22:12:42.571Z"),
        "updatedAt" : ISODate("2016-01-27T22:12:42.571Z"),
        "_id" : ObjectId("56a940da7eb4035519b2d6e7")
    }

Explanation of fields
•  terminalOnStates
   –  Hints during graph evaluation about whether the graph could potentially be finished
•  domain
   –  Separation if multiple domains are in use
•  evaluated
   –  Used for two-phase commits/transactions
•  reachable
   –  Enables branching logic in workflow definitions
•  dependencies
   –  References to other tasks. When they finish, this object gets updated. When a dependencies object is empty, that task is ready to run.
•  taskRunnerLease/Heartbeat
   –  Enables atomic checkout if multiple task runners are running. Enables recovery from failed task runners.
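The role of the dependencies and reachable fields can be illustrated with a small sketch over plain objects. The helper names `applyFinishedTask` and `findReadyTasks` are used here only for illustration, and two behaviors are assumptions rather than statements from the slides: a dependency value of "finished" is treated as satisfied by any finished state, and a dependency that can no longer be met marks the waiting task as unreachable.

```javascript
// Assumption: these are the finished states a task can end in.
const FINISHED_STATES = ['succeeded', 'failed', 'cancelled', 'timeout'];

// Update every document that depends on the task that just finished.
function applyFinishedTask(docs, finishedTaskId, finishedState) {
  for (const doc of docs) {
    const wanted = doc.dependencies[finishedTaskId];
    if (wanted === undefined) {
      continue; // this document does not depend on the finished task
    }
    const satisfied = wanted === finishedState ||
      (wanted === 'finished' && FINISHED_STATES.includes(finishedState));
    if (satisfied) {
      delete doc.dependencies[finishedTaskId];
    } else {
      // Assumption: an unmet dependency means this branch is not taken.
      doc.reachable = false;
    }
  }
}

// A task is ready when it is pending, reachable, and waits on nothing.
function findReadyTasks(docs, graphId) {
  return docs.filter(doc =>
    doc.graphId === graphId &&
    doc.state === 'pending' &&
    doc.reachable &&
    Object.keys(doc.dependencies).length === 0);
}

const docs = [
  { taskId: 'A', graphId: 'g1', state: 'pending', dependencies: {}, reachable: true },
  { taskId: 'B', graphId: 'g1', state: 'pending', dependencies: { A: 'succeeded' }, reachable: true },
  { taskId: 'C', graphId: 'g1', state: 'pending', dependencies: { A: 'failed' }, reachable: true }
];

const readyBefore = findReadyTasks(docs, 'g1').map(d => d.taskId); // ['A']
docs[0].state = 'succeeded';
applyFinishedTask(docs, 'A', 'succeeded');
const readyAfter = findReadyTasks(docs, 'g1').map(d => d.taskId);  // ['B']
```

In a database-backed engine the same filter would be expressed as a query on the collection, which is the point of keeping the document flat.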

Implementation details: example workflow task dependencies

Graph definition

    {
        "friendlyName": "noop-graph",
        "injectableName": "Graph.noop-test",
        "options": {},
        "tasks": [
            {
                "label": "noop-1",
                "taskName": "Task.noop"
            },
            {
                "label": "noop-2",
                "taskName": "Task.noop",
                "waitOn": {
                    "noop-1": "succeeded"
                }
            }
        ]
    }

Task dependency documents

    /* 1 */
    {
        "taskId" : "9ba3a261-4eb7-40c3-919d-c4e07607bc5c",
        "graphId" : "8fbabec5-a62e-4e90-901c-5ff353388d8f",
        "state" : "pending",
        "dependencies" : {},
        "terminalOnStates" : [
            "cancelled",
            "failed",
            "timeout"
        ],
        "domain" : "default",
        "evaluated" : false,
        "reachable" : true,
        "taskRunnerLease" : null,
        "taskRunnerHeartbeat" : null,
        "createdAt" : ISODate("2016-01-27T23:13:30.426Z"),
        "updatedAt" : ISODate("2016-01-27T23:13:30.426Z"),
        "_id" : ObjectId("56a94f1a929736c5b6e0cf6a")
    }

    /* 2 */
    {
        "taskId" : "fff3e311-19e3-40fb-bd49-24f5667cb51d",
        "graphId" : "8fbabec5-a62e-4e90-901c-5ff353388d8f",
        "state" : "pending",
        "dependencies" : {
            "9ba3a261-4eb7-40c3-919d-c4e07607bc5c" : "succeeded"
        },
        "terminalOnStates" : [
            "cancelled",
            "failed",
            "timeout",
            "succeeded"
        ],
        "domain" : "default",
        "evaluated" : false,
        "reachable" : true,
        "taskRunnerLease" : null,
        "taskRunnerHeartbeat" : null,
        "createdAt" : ISODate("2016-01-27T23:13:30.430Z"),
        "updatedAt" : ISODate("2016-01-27T23:13:30.431Z"),
        "_id" : ObjectId("56a94f1a929736c5b6e0cf6b")
    }
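The mapping from the graph definition to the two task dependency documents can be sketched as below. The function `buildTaskDependencies` is hypothetical. The rule used for terminalOnStates (every finished state except the ones another task waits on) is inferred from the example documents above, not stated by the slides, so treat it as an assumption.

```javascript
const { randomUUID } = require('crypto');

const FINISHED_STATES = ['cancelled', 'failed', 'timeout', 'succeeded'];

// Hypothetical helper: turn a graph definition into task dependency documents.
function buildTaskDependencies(definition, graphId) {
  // Give every task label a fresh taskId.
  const idByLabel = new Map();
  for (const task of definition.tasks) {
    idByLabel.set(task.label, randomUUID());
  }

  // For each label, collect the states that other tasks wait on.
  const awaitedStates = new Map();
  for (const task of definition.tasks) {
    for (const [label, state] of Object.entries(task.waitOn || {})) {
      if (!awaitedStates.has(label)) {
        awaitedStates.set(label, new Set());
      }
      awaitedStates.get(label).add(state);
    }
  }

  return definition.tasks.map(task => {
    // Translate waitOn labels into taskId-keyed dependencies.
    const dependencies = {};
    for (const [label, state] of Object.entries(task.waitOn || {})) {
      dependencies[idByLabel.get(label)] = state;
    }
    const awaited = awaitedStates.get(task.label) || new Set();
    return {
      taskId: idByLabel.get(task.label),
      graphId: graphId,
      state: 'pending',
      dependencies: dependencies,
      // Assumption: a state that another task waits on is not terminal.
      terminalOnStates: FINISHED_STATES.filter(s => !awaited.has(s)),
      domain: 'default',
      evaluated: false,
      reachable: true,
      taskRunnerLease: null,
      taskRunnerHeartbeat: null
    };
  });
}

const noopGraph = {
  friendlyName: 'noop-graph',
  injectableName: 'Graph.noop-test',
  options: {},
  tasks: [
    { label: 'noop-1', taskName: 'Task.noop' },
    { label: 'noop-2', taskName: 'Task.noop', waitOn: { 'noop-1': 'succeeded' } }
  ]
};

const taskDocs = buildTaskDependencies(noopGraph, randomUUID());
```

With the noop graph, the first document has no dependencies and omits "succeeded" from terminalOnStates, while the second depends on the first task's id and treats all four finished states as terminal.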

Implementation details: lifecycle of a taskdependency document (some fields hidden)

1.  A workflow is run. The task document is created.

    {
        "state" : "pending",
        "dependencies" : {
            "afba1db1-07aa-4ffe-814a-84dbcc02bf9b" : "succeeded"
        },
        "domain" : "default",
        "evaluated" : false,
        "reachable" : true,
        "taskRunnerLease" : null,
        "taskRunnerHeartbeat" : null
    }

2.  The task with id "afba1db1-07aa-4ffe-814a-84dbcc02bf9b" finishes with state "succeeded" and the dependencies object is updated accordingly.

    {
        "state" : "pending",
        "dependencies" : { },
        "domain" : "default",
        "evaluated" : false,
        "reachable" : true,
        "taskRunnerLease" : null,
        "taskRunnerHeartbeat" : null
    }

3.  The task is checked out, heartbeated, and run by a task runner.

    {
        "state" : "pending",
        "dependencies" : { },
        "domain" : "default",
        "evaluated" : false,
        "reachable" : true,
        "taskRunnerLease" : "3e80b1ef-d10f-41ca-95e5-13fa5920aaf5",
        "taskRunnerHeartbeat" : ISODate("2016-01-27T23:00:25.991Z")
    }

4.  The task completes with state "succeeded" and is then evaluated by the task scheduler (updating the dependencies objects for other task documents, etc.). The combination of a finished state and "evaluated" : true means the document will be picked up for background deletion.

    {
        "state" : "succeeded",
        "dependencies" : { },
        "domain" : "default",
        "evaluated" : true,
        "reachable" : true,
        "taskRunnerLease" : "3e80b1ef-d10f-41ca-95e5-13fa5920aaf5",
        "taskRunnerHeartbeat" : ISODate("2016-01-27T23:00:37.876Z")
    }

Implementation details: code/project structure

on-http
    lib/services/workflow-api-service.js
on-core
    lib/workflow/stores/mongo.js
    lib/workflow/messengers/messenger-AMQP.js
    lib/workflow/task-graph.js (moved/refactored from on-taskgraph to expose to on-http)
on-tasks
    lib/task.js
    index.js (exposes base task library, deprecating soon)
on-taskgraph
    lib/task-scheduler.js
    lib/lease-expiration-poller.js
    lib/completed-task-poller.js
    lib/task-runner.js

•  on-http/lib/services/workflow-api-service.js
   –  Handles creating and persisting task graph objects from workflow API requests
•  on-taskgraph/lib/task-scheduler.js
   –  Evaluates graph state and schedules new tasks
•  on-taskgraph/lib/lease-expiration-poller.js
   –  Expires task leases from failed task runners so the tasks can be picked up by the scheduler
•  on-taskgraph/lib/completed-task-poller.js
   –  Deletes finished task dependency documents from the database. Also queues evaluation for graphs in scheduler failure cases.
•  on-taskgraph/lib/task-runner.js
   –  Receives task run events, loads the task, and runs its job code
•  on-core/lib/workflow/stores/*.js
   –  Database interfaces for graph logic
•  on-core/lib/workflow/messengers/*.js
   –  Messaging interfaces for graph events
•  on-core/lib/workflow/task-graph.js
   –  TaskGraph creation, validation, and persistence code

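Implementation details: on-taskgraph repository higher-level architecture

[Diagram: the task-scheduler, task-runner, lease-expiration-poller, and completed-task-poller components of on-taskgraph, grouped under SCHEDULER MODE and TASK RUNNER MODE, alongside MONGO and AMQP.]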

Implementation details: on-http repository higher-level architecture

Run new workflow:

    POST /api/current/workflows/active /api/current/nodes/<id>/workflows

Route logic:
1.  Generate a uuid (TaskGraph identifier)
2.  Create new TaskGraph object with it
3.  Validate the object (happens during creation)
4.  Persist the graph object to mongo.pxe.graphobjects
5.  Persist task objects to mongo.pxe.taskdependencies
6.  Publish an event to the Task Scheduler to evaluate the graph
7.  Return the TaskGraph uuid to the client
8.  If the Task Scheduler is down or crashes, it will pick it up out of the database.

[Diagram labels: MONGO, AMQP]
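The route logic above can be sketched with in-memory stand-ins for the database and the message bus. Everything named here (`store`, `messenger`, `publishEvaluateGraph`, `validateDefinition`, `runWorkflow`) is hypothetical and chosen for illustration; the validation shown is only one plausible check, and the real service persists to MongoDB and publishes over AMQP.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical in-memory stand-ins for the database and the message bus.
const store = { graphobjects: [], taskdependencies: [] };
const published = [];
const messenger = {
  publishEvaluateGraph(graphId) {
    published.push({ event: 'evaluateGraph', graphId: graphId });
  }
};

// One plausible validation: every waitOn label must name a task in the graph.
function validateDefinition(definition) {
  const labels = new Set(definition.tasks.map(t => t.label));
  for (const task of definition.tasks) {
    for (const label of Object.keys(task.waitOn || {})) {
      if (!labels.has(label)) {
        throw new Error('Task "' + task.label + '" waits on unknown label "' + label + '"');
      }
    }
  }
}

function runWorkflow(definition) {
  const graphId = randomUUID();                    // 1. generate a uuid
  validateDefinition(definition);                  // 3. validate during creation
  const graph = { instanceId: graphId, definition: definition }; // 2. create the graph object
  store.graphobjects.push(graph);                  // 4. persist the graph object
  for (const task of definition.tasks) {           // 5. persist the task objects
    store.taskdependencies.push({ graphId: graphId, label: task.label, state: 'pending' });
  }
  messenger.publishEvaluateGraph(graphId);         // 6. ask the scheduler to evaluate
  return graphId;                                  // 7. return the uuid to the client
}

const graphId = runWorkflow({
  tasks: [
    { label: 'noop-1', taskName: 'Task.noop' },
    { label: 'noop-2', taskName: 'Task.noop', waitOn: { 'noop-1': 'succeeded' } }
  ]
});

let rejected = false;
try {
  runWorkflow({ tasks: [{ label: 'a', taskName: 'Task.noop', waitOn: { missing: 'succeeded' } }] });
} catch (err) {
  rejected = true; // invalid graphs are rejected before anything is persisted
}
```

Persisting before publishing is what makes step 8 possible: if the published event is lost, the graph and its tasks are still in the database for a scheduler to find.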

Implementation details: task-scheduler stream architecture

Inputs: AMQP events, unevaluatedTaskPoller, evaluatedTaskPoller
Backing services: AMQP, MONGO

evaluateTaskStream
•  Update task dependencies (createUpdateTaskDependenciesSubscription())
•  updateTaskDependencies() THEN handleEvaluatedTask(), which branches on:
   –  If (state is terminal)
   –  If (state is NOT terminal)

evaluateGraphStream
•  Find all pending tasks within graph that have an empty dependencies object (findReadyTasks())
•  Schedule ready tasks (handleScheduleTaskEvent())

checkGraphFinishedStream
•  Check Graph finished (createCheckGraphFinishedSubscription())
   –  If (task has terminal failed state): failGraph()
   –  Else: check if graph is succeeded, and complete it if so (checkGraphSucceeded())
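The graph-finished decision in the diagram can be sketched as a pure function. This is an illustration under assumptions, not the scheduler's code: `checkGraphFinished` and its return values are hypothetical, "terminal" is taken to mean that the task's state appears in its terminalOnStates, and a graph is assumed to have succeeded once every reachable task is in a finished state.

```javascript
const FAILED_STATES = ['failed', 'timeout', 'cancelled'];
const FINISHED_STATES = FAILED_STATES.concat(['succeeded']);

// Hypothetical helper: decide what happens to the graph after a task is evaluated.
// Returns 'failed', 'succeeded', or 'continue' (keep evaluating the graph).
function checkGraphFinished(task, graphTasks) {
  // Assumption: a state is terminal only if it is listed in terminalOnStates.
  if (!task.terminalOnStates.includes(task.state)) {
    return 'continue';
  }
  if (FAILED_STATES.includes(task.state)) {
    return 'failed';
  }
  // Assumption: success means no reachable task is still unfinished.
  const allFinished = graphTasks
    .filter(t => t.reachable)
    .every(t => FINISHED_STATES.includes(t.state));
  return allFinished ? 'succeeded' : 'continue';
}

const noop1 = {
  state: 'succeeded',
  reachable: true,
  terminalOnStates: ['cancelled', 'failed', 'timeout']
};
const noop2 = {
  state: 'pending',
  reachable: true,
  terminalOnStates: ['cancelled', 'failed', 'timeout', 'succeeded']
};

const afterFirst = checkGraphFinished(noop1, [noop1, noop2]);   // 'continue'
noop2.state = 'succeeded';
const afterSecond = checkGraphFinished(noop2, [noop1, noop2]);  // 'succeeded'
noop2.state = 'timeout';
const afterTimeout = checkGraphFinished(noop2, [noop1, noop2]); // 'failed'
```

The first task's success is not terminal because another task waits on it, so the graph continues; the second task's outcome decides whether the graph succeeds or fails.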

Implementation details: task-runner stream architecture

Inputs: AMQP events
Backing services: AMQP, MONGO

runTaskStream
•  checkoutTask() (atomic db)
•  If (success):
   –  Get task definition (getTaskById())
   –  Run task (Task.run())
   –  Publish when task is finished

cancelTaskStream
•  Task.cancel()

heartbeat (interval)
•  Update every task document the runner owns

Implementation details: very brief intro to Rx.js style

Simple functional style

    var array = [1,2,3,4,5];
    var values = array.map(function(item) {
        var value = item * 10;
        return value;
    });
    console.log(values);

    // [10, 20, 30, 40, 50]

Rx: functional style with streams

    var Rx = require('rx');

    var emitter = new Rx.Subject();

    emitter
    .map(function(item) {
        var value = item * 10;
        return value;
    })
    .subscribe(function(value) {
        console.log(value);
    });

    // like emit(value) in EventEmitter vocabulary
    emitter.onNext(1);
    emitter.onNext(2);
    emitter.onNext(3);
    emitter.onNext(4);
    emitter.onNext(5);

    // 10
    // 20
    // 30
    // 40
    // 50

Some pseudo-code for real world circumstances

    var Rx = require('rx');

    var taskEmitter = amqpRunTaskSubscription;

    taskEmitter
    .map(function(task) {
        return mongo.getTaskDocument(task.id);
    })
    .map(function(task) {
        return self.scheduleTask(task);
    })
    .subscribe(function(task) {
        console.log('Scheduled task ' + task.id);
    });

Joseph Heck
