Slide 1

Slide 1 text

Geomagical & Dagster > Dagster Community Meeting

Slide 2

Slide 2 text

Noah Kantrowitz > @kantrn - coderanger.net > Principal Ops @ Geomagical > Part of the IKEA family > Augmented reality with furniture

Slide 3

Slide 3 text

Our Product

Slide 4

Slide 4 text

Starting Point > Celery & RabbitMQ > Each operation as its own daemon > celery.canvas > Custom DAG compiler

Slide 5

Slide 5 text

Design Goals > Keeping most of the solid structure > Improved DAG expressiveness > Low fixed overhead, compatible with autoscaling > More detailed tracking and metrics

Slide 6

Slide 6 text

Dagster > Met all our requirements for structural simplicity > DAG compiler was a bit limited but growing fast > Highly responsive team Dagster > No execution setup that met our needs

Slide 7

Slide 7 text

But dagster_celery? > Solid and pipeline code commingled > Single runtime environment > Hard to build a workflow around at scale

Slide 8

Slide 8 text

But dagster_k8s? > Fine for infrequent or non-customer facing tasks > Do not put kube-apiserver in your hot path > No really, I mean it

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Autoscaling > KEDA watching RabbitMQ > Zero-scale: only Dagit and gRPC daemons > task_acks_late = True > worker_prefetch_multiplier = 1
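The two Celery settings named on this slide can be written as a small config module. This is a minimal sketch, not the speaker's actual configuration: the file name `celeryconfig.py` is an assumption, and a Celery app would load it with `app.config_from_object("celeryconfig")`. The comments give the usual reasoning for these values when workers scale up and down with queue depth; the talk itself does not spell it out.

```python
# celeryconfig.py (hypothetical file name, for illustration only)

# Acknowledge a message only after the task has finished, not when it is
# received. If a worker is removed during scale-down while a task is still
# running, the broker can redeliver the unacknowledged message.
task_acks_late = True

# Each worker process reserves one message at a time instead of the
# default of four. Waiting work stays in the RabbitMQ queue, where a
# queue-length based scaler can count it, rather than sitting in a
# busy worker's local buffer.
worker_prefetch_multiplier = 1
```

Late acknowledgement implies tasks may run more than once after a worker is lost, so this combination suits tasks that are safe to retry.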

Slide 11

Slide 11 text

Remote Solids > Independent release cycles for each Solid > Can run multiple versions in parallel > Testing in isolation

Slide 12

Slide 12 text

Writing A Remote Solid

    app = SolidCelery('repo-something')

    @app.task(bind=True)
    def something(self, foo: str) -> str:
        return f'Hello {foo}'

Slide 13

Slide 13 text

Proxy Solids

    @celery_solid(queue='repo-something')
    def something(context, item):
        output = yield {
            'foo': item['bar'],
        }
        item['something'] = output
        yield Output(item)
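The proxy solid above is a generator: it yields a dict of task arguments, receives the remote result back at the same `yield`, and finally yields an `Output`. The plain-Python sketch below shows one way such a generator could be driven. It is an assumption for illustration, not the `celery_solid` implementation: `drive_proxy`, `fake_remote`, and the stand-in `Output` class are hypothetical, and a real decorator would submit the arguments to the Celery queue instead of calling a local function.

```python
from typing import Any, Callable, Dict, Generator


class Output:
    """Stand-in for dagster's Output, just enough for this sketch."""

    def __init__(self, value: Any) -> None:
        self.value = value


def drive_proxy(
    gen: Generator[Any, Any, None],
    run_remote: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Run a proxy-solid generator to completion (hypothetical helper).

    Every dict the generator yields is treated as keyword arguments for a
    remote task. The task result is sent back into the generator, which
    resumes at the same yield. The first Output ends the loop.
    """
    yielded = next(gen)
    while not isinstance(yielded, Output):
        result = run_remote(yielded)
        yielded = gen.send(result)
    return yielded.value


def something(context, item):
    # Same body as the proxy solid on the slide, minus the decorator.
    output = yield {"foo": item["bar"]}
    item["something"] = output
    yield Output(item)


def fake_remote(kwargs: Dict[str, Any]) -> str:
    # Local stand-in for the remote task from the previous slide.
    return f"Hello {kwargs['foo']}"


result = drive_proxy(something(None, {"bar": "world"}), fake_remote)
print(result)
```

Writing the proxy as a generator keeps the argument mapping and the result handling in one function body, while the driver decides how and where the task actually runs.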

Slide 14

Slide 14 text

Workflow > One git repo per Dagster repo > main.py which holds "default" Pipeline > solids.py which defines proxy Solids > Misc other pipelines for testing and development

Slide 15

Slide 15 text

CI/CD (briefly, since this is its own rabbit hole) > Buildkite > kustomize edit set image > ArgoCD

Slide 16

Slide 16 text

Downsides > Slow cold start > No feedback during long tasks > New and exciting bugs

Slide 17

Slide 17 text

How It's Going > Happy with overall progress > Still dropping some tasks at load > Plan to move forward looks good

Slide 18

Slide 18 text

Future Plans > Async execution support > Events from solid workers > Pipeline-level webhooks > Predictive auto-scaling? K8s Operator?

Slide 19

Slide 19 text

Can I Use This? > Kinda sorta > geomagical/dagster_geomagical

Slide 20

Slide 20 text

Thank You > Questions?