Dagster & Geomagical

Noah Kantrowitz

February 09, 2021

Transcript

  1. Geomagical & Dagster
    Dagster Community Meeting

  2. Noah Kantrowitz
    > @kantrn - coderanger.net
    > Principal Ops @ Geomagical
    > Part of the IKEA family
    > Augmented reality with furniture

  3. Starting Point
    > Celery & RabbitMQ
    > Each operation as its own daemon
    > celery.canvas
    > Custom DAG compiler

  4. Design Goals
    > Keeping most of the solid structure
    > Improved DAG expressiveness
    > Low fixed overhead, compatible with autoscaling
    > More detailed tracking and metrics

  5. Dagster
    > Met all our requirements for structural simplicity
    > DAG compiler was a bit limited but growing fast
    > Highly responsive team
    > No execution setup that met our needs

  6. But dagster_celery?
    > Solid and pipeline code commingled
    > Single runtime environment
    > Hard to build a workflow around at scale

  7. But dagster_k8s?
    > Fine for infrequent or non-customer facing tasks
    > Do not put kube-apiserver in your hot path
    > No really, I mean it

  8. Autoscaling
    > KEDA watching RabbitMQ
    > Zero-scale: only Dagit and gRPC daemons
    > task_acks_late = True
    > worker_prefetch_multiplier = 1
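
The two Celery settings on this slide could sit in a plain configuration module, roughly as sketched below. The file name `celeryconfig.py` and the comments on why each setting suits queue-driven autoscaling are an interpretation of the slide, not something the deck spells out.

```python
# celeryconfig.py (hypothetical file name): a minimal sketch of the two
# settings named on the slide, using Celery's lowercase setting names.

# Acknowledge a message only after its task has finished running, instead of
# just before it starts. With RabbitMQ, an unacknowledged message is requeued
# if the worker's connection goes away, e.g. when a pod is scaled down.
task_acks_late = True

# Each worker process reserves one message at a time rather than buffering a
# batch. Waiting work then stays visible in the queue, which is presumably
# the number the autoscaler is watching.
worker_prefetch_multiplier = 1
```

A Celery app would typically load a module like this with `app.config_from_object('celeryconfig')`.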

  9. Remote Solids
    > Independent release cycles for each Solid
    > Can run multiple versions in parallel
    > Testing in isolation

  10. Writing A Remote Solid
    app = SolidCelery('repo-something')
    @app.task(bind=True)
    def something(self, foo: str) -> str:
        return f'Hello {foo}'

  11. Proxy Solids
    @celery_solid(queue='repo-something')
    def something(context, item):
        output = yield {
            'foo': item['bar'],
        }
        item['something'] = output
        yield Output(item)
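
The `output = yield {...}` line relies on Python's generator protocol: the proxy yields the arguments for the remote task and is later resumed with that task's result. The sketch below shows one way a wrapper might drive such a generator. It is an assumption-laden illustration, not the actual `celery_solid` implementation: `drive_proxy`, `fake_remote`, and the local `Output` class are hypothetical stand-ins so that the example runs without Dagster or Celery.

```python
from typing import Any, Callable, Dict, Generator


class Output:
    """Hypothetical stand-in for Dagster's Output, to keep the sketch dependency-free."""

    def __init__(self, value: Any) -> None:
        self.value = value


def drive_proxy(gen: Generator, call_remote: Callable[[Dict[str, Any]], Any]) -> Any:
    """Run a proxy-style generator until it yields an Output.

    Every dict yielded before that is treated as keyword arguments for the
    remote task; the task's result is sent back in as the value of `yield`.
    """
    yielded = next(gen)
    while not isinstance(yielded, Output):
        result = call_remote(yielded)
        yielded = gen.send(result)
    return yielded.value


def something(context, item):
    # Same body as the proxy solid on the slide, minus the decorator.
    output = yield {'foo': item['bar']}
    item['something'] = output
    yield Output(item)


def fake_remote(kwargs: Dict[str, Any]) -> str:
    # Hypothetical local replacement for the remote task from slide 10.
    return f"Hello {kwargs['foo']}"


result = drive_proxy(something(None, {'bar': 'world'}), fake_remote)
print(result)
```

In a real deployment the `call_remote` step would presumably enqueue a Celery task on the named queue and wait for its result; here it is a direct function call so the control flow is easy to follow.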

  12. Workflow
    > One git repo per Dagster repo
    > main.py which holds "default" Pipeline
    > solids.py which defines proxy Solids
    > Misc other pipelines for testing and development

  13. CI/CD
    Briefly, since this is its own rabbit hole
    > Buildkite
    > kustomize edit set image
    > ArgoCD

  14. Downsides
    > Slow cold start
    > No feedback during long tasks
    > New and exciting bugs

  15. How It's Going
    > Happy with overall progress
    > Still dropping some tasks at load
    > Plan to move forward looks good

  16. Future Plans
    > Async execution support
    > Events from solid workers
    > Pipeline-level webhooks
    > Predictive auto-scaling? K8s Operator?

  17. Can I Use This?
    Kinda sorta
    geomagical/dagster_geomagical

  18. Thank You
    Questions?
