RackHD Workflow Engine implementation design

Workflow engine refactor that uses ReactiveX models to process workflow tasks in RackHD. Included with a pull request in January 2016.

Joseph Heck

January 16, 2016

Transcript

  1. 1
    © Copyright 2015 EMC Corporation. All rights reserved.
    RackHD Workflow Engine redesign
    January 27, 2016

  2. 2
    Goals of the new workflow engine
    Performance goals
    •  Entirely database backed (no workflow state kept in memory)
    •  Horizontally scalable, which enables high availability under load
    •  Fault tolerant, enabling high-availability failover scenarios
    Development goals
    •  Stream-based architecture of core components (Reactive paradigm)
    –  Core components of the engine listen to generic event stream APIs (push rather than pull)
    –  Enables flexibility in infrastructure decisions: the underlying database and messaging infrastructure are easy to change based on deployment constraints
    –  Enables fast response and execution times for workflow tasks
    •  Modular and extensible
    •  Backward compatibility with current code

  3. 3
    Implementation details
    Architectural decisions to achieve goals
    •  HA/Fault tolerance
    –  Atomic checkout: All eligible Schedulers or Task Runners receive each request, but only one succeeds in checking out a lease to handle it, somewhat like a leased queue model. This leverages existing database technologies (currently MongoDB).
    –  Lease heartbeat: Workflow engine instances heartbeat the tasks they own, so that other instances can check those tasks out when their heartbeats time out.
    –  Backup mechanisms for dropped events: Optimized pollers queue up dropped events for re-evaluation (events are dropped under high-load throttling and catastrophic failure conditions).
    •  Scalability
    –  Domains: Workflow engine instances can be configured to handle different domains of tasks, and can run on separate machines as long as they share the database and messaging.
    –  Stateless: Horizontal scalability is achieved by designing the processes to run in an essentially stateless mode; the database always has the last word.
    –  Optimized data structures: Update the current data structures and Mongo collections/indexes for fast querying and improved indexing.
    •  Development
    –  Reactive: Use the ReactiveX libraries for Node.js, and design loosely coupled components joined by stream APIs.
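    The atomic checkout and lease heartbeat ideas can be illustrated without a database. The sketch below is a minimal in-memory stand-in, not RackHD code: the `LeaseStore` class and its method names are hypothetical, and it assumes the real store performs the equivalent of one conditional update (in MongoDB, something like `findOneAndUpdate` filtered on `taskRunnerLease: null`) so that concurrent instances cannot both win the same lease.

    ```javascript
    'use strict';

    // Hypothetical in-memory stand-in for the taskdependencies collection.
    class LeaseStore {
      constructor(docs) {
        this.docs = new Map(docs.map((d) => [d.taskId, { ...d }]));
      }

      // Take the lease on a task only if nobody holds it. The check and the
      // write run synchronously, so no other checkout in this Node.js process
      // can interleave; a real store would use one conditional DB update.
      checkoutTask(taskId, runnerId, now) {
        const doc = this.docs.get(taskId);
        if (!doc || doc.taskRunnerLease !== null) {
          return null; // unknown task, or another instance already owns it
        }
        doc.taskRunnerLease = runnerId;
        doc.taskRunnerHeartbeat = now;
        return doc;
      }

      // Refresh the heartbeat on every task a runner currently owns.
      heartbeat(runnerId, now) {
        let count = 0;
        for (const doc of this.docs.values()) {
          if (doc.taskRunnerLease === runnerId) {
            doc.taskRunnerHeartbeat = now;
            count += 1;
          }
        }
        return count;
      }
    }

    // Two runners race for the same task; only the first one wins.
    const store = new LeaseStore([
      { taskId: 'task-a', state: 'pending', taskRunnerLease: null, taskRunnerHeartbeat: null },
    ]);
    const first = store.checkoutTask('task-a', 'runner-1', new Date(0));
    const second = store.checkoutTask('task-a', 'runner-2', new Date(0));
    console.log(first !== null, second === null); // true true
    ```

    Keeping the check and the write in a single conditional operation is what makes checkout safe when every engine instance receives the same request.
    
    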

  4. 4
    Implementation details
    •  Database strategy
    –  mongo.pxe.taskdependencies: A dedicated collection for task running and graph evaluation. The taskdependency document structure allows most of the workflow evaluation logic to be performed with optimized database queries instead of code.
    Task Dependency document
    {
      "taskId" : "bd1ff046-a8e5-4587-b5e3-3ac7d5d8a974",
      "graphId" : "da5d101d-27f6-4b17-9e90-acad06dfa90c",
      "state" : "pending",
      "dependencies" : {
        "afba1db1-07aa-4ffe-814a-84dbcc02bf9b" : "finished"
      },
      "terminalOnStates" : [
        "timeout",
        "cancelled",
        "failed"
      ],
      "domain" : "default",
      "evaluated" : false,
      "reachable" : true,
      "taskRunnerLease" : null,
      "taskRunnerHeartbeat" : null,
      "createdAt" : ISODate("2016-01-27T22:12:42.571Z"),
      "updatedAt" : ISODate("2016-01-27T22:12:42.571Z"),
      "_id" : ObjectId("56a940da7eb4035519b2d6e7")
    }
    Explanation of fields
    •  terminalOnStates: Hints during graph evaluation about whether the graph could potentially be finished
    •  domain: Separation if multiple domains are in use
    •  evaluated: Used for two-phase commits/transactions
    •  reachable: Enables branching logic in workflow definitions
    •  dependencies: References to other tasks. When they finish, this object gets updated. When a dependencies object is empty, that task is ready to run.
    •  taskRunnerLease/Heartbeat: Enable atomic checkout if multiple task runners are running, and enable recovery from failed task runners.
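    The readiness rule in the field list (a pending task whose dependencies object is empty can run) reduces to a small predicate. This sketch filters plain objects in memory; `findReadyTasks` borrows the name of the scheduler function on the stream-architecture slides, but this body and the `domain` parameter default are assumptions, not its real implementation.

    ```javascript
    'use strict';

    // A pending task with no outstanding dependencies is ready to run.
    function isReadyToRun(doc) {
      return doc.state === 'pending' && Object.keys(doc.dependencies).length === 0;
    }

    // Hypothetical in-memory version: restrict to one graph and one domain,
    // mirroring the graphId and domain fields of the document.
    function findReadyTasks(docs, graphId, domain = 'default') {
      return docs.filter(
        (doc) => doc.graphId === graphId && doc.domain === domain && isReadyToRun(doc));
    }

    const docs = [
      { taskId: 't1', graphId: 'g1', domain: 'default', state: 'pending', dependencies: {} },
      { taskId: 't2', graphId: 'g1', domain: 'default', state: 'pending', dependencies: { t1: 'succeeded' } },
      { taskId: 't3', graphId: 'g1', domain: 'default', state: 'succeeded', dependencies: {} },
      { taskId: 't4', graphId: 'g2', domain: 'default', state: 'pending', dependencies: {} },
    ];
    console.log(findReadyTasks(docs, 'g1').map((d) => d.taskId)); // [ 't1' ]
    ```
    
    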

  5. 5
    Implementation details
    Example workflow task dependencies
    Graph definition
    {
      "friendlyName": "noop-graph",
      "injectableName": "Graph.noop-test",
      "options": {},
      "tasks": [
        {
          "label": "noop-1",
          "taskName": "Task.noop"
        },
        {
          "label": "noop-2",
          "taskName": "Task.noop",
          "waitOn": {
            "noop-1": "succeeded"
          }
        }
      ]
    }
    Task dependency documents
    /* 1 */
    {
      "taskId" : "9ba3a261-4eb7-40c3-919d-c4e07607bc5c",
      "graphId" : "8fbabec5-a62e-4e90-901c-5ff353388d8f",
      "state" : "pending",
      "dependencies" : {},
      "terminalOnStates" : [
        "cancelled",
        "failed",
        "timeout"
      ],
      "domain" : "default",
      "evaluated" : false,
      "reachable" : true,
      "taskRunnerLease" : null,
      "taskRunnerHeartbeat" : null,
      "createdAt" : ISODate("2016-01-27T23:13:30.426Z"),
      "updatedAt" : ISODate("2016-01-27T23:13:30.426Z"),
      "_id" : ObjectId("56a94f1a929736c5b6e0cf6a")
    }
    /* 2 */
    {
      "taskId" : "fff3e311-19e3-40fb-bd49-24f5667cb51d",
      "graphId" : "8fbabec5-a62e-4e90-901c-5ff353388d8f",
      "state" : "pending",
      "dependencies" : {
        "9ba3a261-4eb7-40c3-919d-c4e07607bc5c" : "succeeded"
      },
      "terminalOnStates" : [
        "cancelled",
        "failed",
        "timeout",
        "succeeded"
      ],
      "domain" : "default",
      "evaluated" : false,
      "reachable" : true,
      "taskRunnerLease" : null,
      "taskRunnerHeartbeat" : null,
      "createdAt" : ISODate("2016-01-27T23:13:30.430Z"),
      "updatedAt" : ISODate("2016-01-27T23:13:30.431Z"),
      "_id" : ObjectId("56a94f1a929736c5b6e0cf6b")
    }
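    Because the example pairs a graph definition with the documents generated from it, the translation can be sketched. `buildTaskDependencies` and the injected `makeId` are hypothetical. The `terminalOnStates` rule is an inference from the two example documents (the task nothing waits on also lists "succeeded"), not something the slides state: here a finished state counts as terminal when no other task waits on it, and a `"finished"` wait value is assumed to cover every finished state.

    ```javascript
    'use strict';

    // Finished states, ordered to match the example documents.
    const FINISHED_STATES = ['cancelled', 'failed', 'timeout', 'succeeded'];

    // Hypothetical translation of a graph definition into taskdependency
    // documents. makeId is injected so the output is deterministic.
    function buildTaskDependencies(graph, graphId, makeId) {
      const idByLabel = {};
      for (const task of graph.tasks) {
        idByLabel[task.label] = makeId(task.label);
      }

      // For each label, collect the states other tasks wait on.
      const awaited = {};
      for (const task of graph.tasks) {
        for (const [label, state] of Object.entries(task.waitOn || {})) {
          (awaited[label] = awaited[label] || new Set()).add(state);
        }
      }

      return graph.tasks.map((task) => {
        const dependencies = {};
        for (const [label, state] of Object.entries(task.waitOn || {})) {
          dependencies[idByLabel[label]] = state;
        }
        const waitedOn = awaited[task.label] || new Set();
        // Assumption: a state is terminal if no other task waits on it.
        const terminalOnStates = waitedOn.has('finished')
          ? []
          : FINISHED_STATES.filter((s) => !waitedOn.has(s));
        return {
          taskId: idByLabel[task.label],
          graphId,
          state: 'pending',
          dependencies,
          terminalOnStates,
          domain: 'default',
          evaluated: false,
          reachable: true,
          taskRunnerLease: null,
          taskRunnerHeartbeat: null,
        };
      });
    }

    const noopGraph = {
      friendlyName: 'noop-graph',
      injectableName: 'Graph.noop-test',
      options: {},
      tasks: [
        { label: 'noop-1', taskName: 'Task.noop' },
        { label: 'noop-2', taskName: 'Task.noop', waitOn: { 'noop-1': 'succeeded' } },
      ],
    };
    const taskDocs = buildTaskDependencies(noopGraph, 'graph-1', (label) => 'id-' + label);
    console.log(taskDocs[1].dependencies);     // { 'id-noop-1': 'succeeded' }
    console.log(taskDocs[0].terminalOnStates); // [ 'cancelled', 'failed', 'timeout' ]
    ```
    
    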

  6. 6
    Implementation details
    •  Lifecycle of a taskdependency document (some fields hidden)
    1.  A workflow is run. The task document is created.
    {
      "state" : "pending",
      "dependencies" : {
        "afba1db1-07aa-4ffe-814a-84dbcc02bf9b" : "succeeded"
      },
      "domain" : "default",
      "evaluated" : false,
      "reachable" : true,
      "taskRunnerLease" : null,
      "taskRunnerHeartbeat" : null
    }
    2.  The task with id "afba1db1-07aa-4ffe-814a-84dbcc02bf9b" finishes with state "succeeded", and the dependencies object is updated accordingly.
    {
      "state" : "pending",
      "dependencies" : { },
      "domain" : "default",
      "evaluated" : false,
      "reachable" : true,
      "taskRunnerLease" : null,
      "taskRunnerHeartbeat" : null
    }
    3.  The task is checked out, heartbeated, and run by a task runner.
    {
      "state" : "pending",
      "dependencies" : { },
      "domain" : "default",
      "evaluated" : false,
      "reachable" : true,
      "taskRunnerLease" : "3e80b1ef-d10f-41ca-95e5-13fa5920aaf5",
      "taskRunnerHeartbeat" : ISODate("2016-01-27T23:00:25.991Z")
    }
    4.  The task completes with state "succeeded" and is then evaluated by the task scheduler (which updates the dependencies objects of other task documents, etc.). The combination of a finished state and "evaluated" : true means the document will be picked up for background deletion.
    {
      "state" : "succeeded",
      "dependencies" : { },
      "domain" : "default",
      "evaluated" : true,
      "reachable" : true,
      "taskRunnerLease" : "3e80b1ef-d10f-41ca-95e5-13fa5920aaf5",
      "taskRunnerHeartbeat" : ISODate("2016-01-27T23:00:37.876Z")
    }
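    Steps 2 and 4 of the lifecycle can be sketched as two small functions over plain objects. Both function names are hypothetical. The sketch assumes a dependency value of "finished" accepts any finished state, and it leaves mismatched outcomes untouched, since the slides do not show how the engine treats them (the `reachable` flag is presumably involved).

    ```javascript
    'use strict';

    // States after which a task counts as finished.
    const FINISHED = new Set(['succeeded', 'failed', 'cancelled', 'timeout']);

    // Step 2 (hypothetical): when a task finishes, remove it from the
    // dependencies object of every document that was waiting on that outcome.
    function updateTaskDependencies(docs, finishedTaskId, finishedState) {
      let updated = 0;
      for (const doc of docs) {
        if (!Object.prototype.hasOwnProperty.call(doc.dependencies, finishedTaskId)) {
          continue;
        }
        const expected = doc.dependencies[finishedTaskId];
        if (expected === finishedState || expected === 'finished') {
          delete doc.dependencies[finishedTaskId];
          updated += 1;
        }
      }
      return updated;
    }

    // Step 4: finished and evaluated documents are eligible for deletion.
    function isDeletable(doc) {
      return FINISHED.has(doc.state) && doc.evaluated === true;
    }

    const waiting = { taskId: 'b', state: 'pending', evaluated: false, dependencies: { a: 'succeeded' } };
    const done = { taskId: 'a', state: 'succeeded', evaluated: true, dependencies: {} };
    const count = updateTaskDependencies([waiting, done], 'a', 'succeeded');
    console.log(count, waiting.dependencies); // 1 {}
    console.log(isDeletable(done), isDeletable(waiting)); // true false
    ```
    
    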

  7. 7
    Implementation details
    Code/project structure
    on-http
      lib/services/workflow-api-service.js
    on-core
      lib/workflow/stores/mongo.js
      lib/workflow/messengers/messenger-AMQP.js
      lib/workflow/task-graph.js (moved/refactored from on-taskgraph to expose to on-http)
    on-tasks
      lib/task.js
      index.js (exposes base task library, deprecating soon)
    on-taskgraph
      lib/task-scheduler.js
      lib/lease-expiration-poller.js
      lib/completed-task-poller.js
      lib/task-runner.js
    •  on-http/lib/services/workflow-api-service.js: Handles creating and persisting task graph objects from workflow API requests
    •  on-taskgraph/lib/task-scheduler.js: Evaluates graph state and schedules new tasks
    •  on-taskgraph/lib/lease-expiration-poller.js: Expires task leases held by failed task runners so the tasks can be picked up by the scheduler
    •  on-taskgraph/lib/completed-task-poller.js: Deletes finished task dependency documents from the database. Also queues evaluation for graphs in scheduler failure cases.
    •  on-taskgraph/lib/task-runner.js: Receives task run events, loads the task, and runs its job code
    •  on-core/lib/workflow/stores/*.js: Database interfaces for graph logic
    •  on-core/lib/workflow/messengers/*.js: Messaging interfaces for graph events
    •  on-core/lib/workflow/task-graph.js: TaskGraph creation, validation, and persistence code
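    The lease-expiration poller's job (expiring leases held by failed task runners so the scheduler can pick those tasks up again) can be sketched as one polling pass. `expireLeases` and its timeout parameter are hypothetical; the sketch assumes only pending tasks are checked, because the lifecycle slide shows a finished task keeping its lease until deletion.

    ```javascript
    'use strict';

    // Hypothetical single pass of a lease-expiration poller. A pending task
    // whose owner has not heartbeated within leaseTimeoutMs loses its lease.
    function expireLeases(docs, now, leaseTimeoutMs) {
      const expired = [];
      for (const doc of docs) {
        // Skip finished tasks and tasks nobody has checked out.
        if (doc.state !== 'pending' || doc.taskRunnerLease === null ||
            doc.taskRunnerHeartbeat === null) {
          continue;
        }
        const age = now.getTime() - doc.taskRunnerHeartbeat.getTime();
        if (age > leaseTimeoutMs) {
          expired.push({ taskId: doc.taskId, previousOwner: doc.taskRunnerLease });
          doc.taskRunnerLease = null;     // task becomes available for checkout again
          doc.taskRunnerHeartbeat = null;
        }
      }
      return expired;
    }

    const leasedDocs = [
      { taskId: 'fresh', state: 'pending', taskRunnerLease: 'runner-1',
        taskRunnerHeartbeat: new Date('2016-01-27T23:00:30Z') },
      { taskId: 'stale', state: 'pending', taskRunnerLease: 'runner-2',
        taskRunnerHeartbeat: new Date('2016-01-27T23:00:00Z') },
      { taskId: 'done', state: 'succeeded', taskRunnerLease: 'runner-2',
        taskRunnerHeartbeat: new Date('2016-01-27T23:00:00Z') },
    ];
    const expired = expireLeases(leasedDocs, new Date('2016-01-27T23:00:40Z'), 20000);
    console.log(expired.map((e) => e.taskId)); // [ 'stale' ]
    ```
    
    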

  8. 8
    Implementation details
    on-taskgraph repository higher level architecture
    Components: task-scheduler, lease-expiration-poller, completed-task-poller, task-runner
    Modes: SCHEDULER MODE, TASK RUNNER MODE
    Infrastructure: MONGO, AMQP

  9. 9
    Implementation details
    on-http repository higher level architecture
    Routes (POST):
    /api/current/workflows/active
    /api/current/nodes//workflows
    Route logic (run new workflow):
    1.  Generate a uuid (TaskGraph identifier).
    2.  Create a new TaskGraph object with it.
    3.  Validate the object (this happens during creation).
    4.  Persist the graph object to mongo.pxe.graphobjects.
    5.  Persist task objects to mongo.pxe.taskdependencies.
    6.  Publish an event to the Task Scheduler to evaluate the graph.
    7.  Return the TaskGraph uuid to the client.
    8.  If the Task Scheduler is down or crashes, it will pick the graph up from the database.
    Infrastructure: MONGO, AMQP
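    The route logic can be sketched end to end with in-memory stand-ins. `runWorkflow`, the `store`/`messenger` shapes and the `evaluateGraph` topic name are hypothetical, validation is reduced to checking `waitOn` labels, and the persisted task documents are trimmed to a few fields.

    ```javascript
    'use strict';

    // Hypothetical sketch of the "run new workflow" route logic. store and
    // messenger stand in for the Mongo store and the AMQP messenger.
    function runWorkflow(definition, { store, messenger, makeId }) {
      const graphId = makeId();                   // 1. generate the TaskGraph id
      const graph = { id: graphId, definition };  // 2. create the TaskGraph object

      // 3. validate: every waitOn label must name a task in this graph
      const labels = new Set(definition.tasks.map((t) => t.label));
      for (const task of definition.tasks) {
        for (const label of Object.keys(task.waitOn || {})) {
          if (!labels.has(label)) {
            throw new Error(`Task ${task.label} waits on unknown task ${label}`);
          }
        }
      }

      store.graphobjects.push(graph);             // 4. persist the graph object
      for (const task of definition.tasks) {      // 5. persist (trimmed) task documents
        store.taskdependencies.push({ graphId, label: task.label, state: 'pending' });
      }
      messenger.publish('evaluateGraph', { graphId }); // 6. ask the scheduler to evaluate
      return graphId;                             // 7. return the id to the client
    }

    const store = { graphobjects: [], taskdependencies: [] };
    const published = [];
    const messenger = { publish: (topic, body) => published.push({ topic, body }) };
    let counter = 0;
    const graphId = runWorkflow(
      { tasks: [{ label: 'noop-1' }, { label: 'noop-2', waitOn: { 'noop-1': 'succeeded' } }] },
      { store, messenger, makeId: () => 'graph-' + (++counter) },
    );
    console.log(graphId, store.taskdependencies.length, published[0].topic);
    // graph-1 2 evaluateGraph
    ```

    The ordering matters: the graph and task documents are persisted before the event is published, so if the event is lost or the scheduler crashes (step 8), the work can still be recovered from the database.
    
    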

  10. 10
    Implementation details
    task-scheduler stream architecture
    Inputs: AMQP events, unevaluatedTaskPoller, evaluatedTaskPoller
    •  evaluateTaskStream: Update task dependencies (createUpdateTaskDependenciesSubscription())
    –  updateTaskDependencies(), THEN handleEvaluatedTask():
    –  If (state is terminal): checkGraphFinishedStream
    –  If (state is NOT terminal): evaluateGraphStream
    •  evaluateGraphStream: Find all pending tasks within the graph that have an empty dependencies object (findReadyTasks()), then schedule the ready tasks (handleScheduleTaskEvent())
    •  checkGraphFinishedStream: Check whether the graph is finished (createCheckGraphFinishedSubscription())
    –  If (task has terminal failed state): failGraph()
    –  Else: check whether the graph has succeeded, and complete it if so (checkGraphSucceeded())
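    The branching in `handleEvaluatedTask()` can be sketched as a pure function that returns the next action instead of pushing into streams. This is a hypothetical simplification, not the scheduler's code; the code comments list the assumptions, in particular how "terminal" and "graph succeeded" are decided.

    ```javascript
    'use strict';

    // Hypothetical routing step after a task document has been evaluated.
    // Assumptions (not stated on the slide): "terminal" means the task's
    // state appears in its terminalOnStates, any terminal state other than
    // 'succeeded' fails the graph, and a graph has succeeded once none of
    // its reachable tasks are still pending.
    function handleEvaluatedTask(task, graphTasks) {
      if (task.terminalOnStates.includes(task.state)) {
        // checkGraphFinishedStream
        if (task.state !== 'succeeded') {
          return { action: 'failGraph', graphId: task.graphId };
        }
        const stillPending = graphTasks.some((t) => t.reachable && t.state === 'pending');
        return stillPending
          ? { action: 'none' }
          : { action: 'completeGraph', graphId: task.graphId };
      }
      // evaluateGraphStream: find ready tasks, then schedule them
      const ready = graphTasks.filter(
        (t) => t.state === 'pending' && Object.keys(t.dependencies).length === 0);
      return { action: 'schedule', taskIds: ready.map((t) => t.taskId) };
    }

    const graphTasks = [
      { taskId: 'a', graphId: 'g', state: 'succeeded', reachable: true, dependencies: {},
        terminalOnStates: ['cancelled', 'failed', 'timeout'] },
      { taskId: 'b', graphId: 'g', state: 'pending', reachable: true, dependencies: {},
        terminalOnStates: ['cancelled', 'failed', 'timeout', 'succeeded'] },
    ];
    console.log(handleEvaluatedTask(graphTasks[0], graphTasks));
    // { action: 'schedule', taskIds: [ 'b' ] }
    ```
    
    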

  11. 11
    Implementation details
    task-scheduler stream architecture: same diagram as slide 10, with the AMQP and MONGO backends labeled
    Inputs: AMQP events, unevaluatedTaskPoller, evaluatedTaskPoller

  12. 12
    Implementation details
    task-runner stream architecture
    Inputs: AMQP events
    •  runTaskStream
    –  checkoutTask() (atomic db)
    –  If (success): get the task definition (getTaskById()), run the task (Task.run()), and publish when the task is finished
    •  cancelTaskStream
    –  Task.cancel()
    •  heartbeat (interval)
    –  Update every task document the runner owns
    Infrastructure: AMQP, MONGO
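    One pass through `runTaskStream` can be sketched synchronously. `handleRunTask`, the `taskDefinitions` registry and the `publish` callback are hypothetical stand-ins for the real Mongo/AMQP-backed runner, and `Task.run()` is modeled as a plain function that returns the final state.

    ```javascript
    'use strict';

    // Hypothetical sketch of handling one run-task event.
    function handleRunTask(event, runnerId, deps) {
      const { docs, taskDefinitions, publish, now } = deps;
      const doc = docs.find((d) => d.taskId === event.taskId);

      // checkoutTask(): proceed only if nobody else holds the lease
      if (!doc || doc.taskRunnerLease !== null) {
        return false;
      }
      doc.taskRunnerLease = runnerId;
      doc.taskRunnerHeartbeat = now;

      const definition = taskDefinitions[event.taskId]; // getTaskById()
      let finalState;
      try {
        finalState = definition.run();                  // Task.run()
      } catch (err) {
        // Assumption: a missing definition or a throwing task counts as failed.
        finalState = 'failed';
      }
      doc.state = finalState;
      // Publish the finished event for the scheduler to evaluate.
      publish({ taskId: doc.taskId, graphId: doc.graphId, state: finalState });
      return true;
    }

    const finished = [];
    const deps = {
      docs: [{ taskId: 't1', graphId: 'g1', state: 'pending', taskRunnerLease: null, taskRunnerHeartbeat: null }],
      taskDefinitions: { t1: { run: () => 'succeeded' } },
      publish: (msg) => finished.push(msg),
      now: new Date(0),
    };
    console.log(handleRunTask({ taskId: 't1' }, 'runner-1', deps)); // true
    console.log(handleRunTask({ taskId: 't1' }, 'runner-2', deps)); // false (already leased)
    ```
    
    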

  13. 13
    Implementation details
    Very brief intro to Rx.js style
    Simple functional style
    var array = [1, 2, 3, 4, 5];
    var values = array.map(function(item) {
      var value = item * 10;
      return value;
    });
    console.log(values);
    // [10, 20, 30, 40, 50]

    Rx: functional style with streams
    var Rx = require('rx');

    var emitter = new Rx.Subject();

    emitter
      .map(function(item) {
        var value = item * 10;
        return value;
      })
      .subscribe(function(value) {
        console.log(value);
      });

    // onNext(value) is like emit(value) in EventEmitter vocabulary
    emitter.onNext(1);
    emitter.onNext(2);
    emitter.onNext(3);
    emitter.onNext(4);
    emitter.onNext(5);
    // 10
    // 20
    // 30
    // 40
    // 50

    Some pseudo-code for real world circumstances
    var Rx = require('rx');

    // amqpRunTaskSubscription, mongo and self stand in for the real AMQP
    // subscription, database store and scheduler instance. The database and
    // scheduling calls are asynchronous; assuming they return promises,
    // flatMap (rather than map) flattens each promise into the stream and
    // passes its resolved value to the next stage.
    var taskEmitter = amqpRunTaskSubscription;

    taskEmitter
      .flatMap(function(task) {
        return mongo.getTaskDocument(task.id);
      })
      .flatMap(function(task) {
        return self.scheduleTask(task);
      })
      .subscribe(function(task) {
        console.log('Scheduled task ' + task.id);
      });
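    For readers without the `rx` package at hand, the push model can be shown with a toy subject in plain JavaScript. `ToySubject` is a hypothetical teaching stand-in, not RxJS: it supports only `map`, `subscribe` and `onNext`, with no error, completion or unsubscribe handling.

    ```javascript
    'use strict';

    // Toy push-based subject: values pushed in with onNext() flow through
    // any map() stages to the subscribers.
    class ToySubject {
      constructor() {
        this.observers = [];
      }

      subscribe(onNext) {
        this.observers.push(onNext);
      }

      onNext(value) {
        for (const observer of this.observers) {
          observer(value);
        }
      }

      // Returns a new subject that receives every value transformed by fn.
      map(fn) {
        const downstream = new ToySubject();
        this.subscribe((value) => downstream.onNext(fn(value)));
        return downstream;
      }
    }

    const received = [];
    const emitter = new ToySubject();
    emitter
      .map((item) => item * 10)
      .subscribe((value) => received.push(value));
    [1, 2, 3, 4, 5].forEach((n) => emitter.onNext(n));
    console.log(received); // [ 10, 20, 30, 40, 50 ]
    ```

    Nothing is pulled: the subscriber runs only when a producer calls `onNext`, which is the property the engine relies on when AMQP events and pollers push work into its streams.
    
    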

  14. 14
    Discussion
