An opinionated list of best practices to begin with:
#1 Ingest data as-raw-as-possible
#2 Create idempotent and deterministic processes
#3 Rest data between tasks (persist each task's output to storage)
#4 Validate data after each step
#5 Define workflows as code
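
A minimal sketch of how practices #2 to #4 might look in a single task, assuming a small CSV input. The file names, the `transform`, `validate`, and `run_demo` functions, and the "amount must be non-negative" rule are all hypothetical and chosen only for illustration; "resting" data is read here as writing each task's output to storage before the next task starts.

```python
import csv
import tempfile
from pathlib import Path


def validate(rows):
    """Practice #4: fail fast if a step's output breaks basic expectations.
    The non-negative rule is an assumed example check."""
    if not rows:
        raise ValueError("no rows produced")
    for row in rows:
        if row["amount"] < 0:
            raise ValueError(f"negative amount in row {row}")
    return rows


def transform(raw_path, out_dir, run_date):
    """Practices #2 and #3: the output depends only on the inputs, and it is
    written to a path derived from run_date, so a rerun overwrites the same
    file instead of appending duplicates."""
    with raw_path.open(newline="") as f:
        rows = [{"id": r["id"], "amount": float(r["amount"])}
                for r in csv.DictReader(f)]
    # Sorting gives a stable row order regardless of input order.
    rows = validate(sorted(rows, key=lambda r: r["id"]))

    out_path = out_dir / f"clean_{run_date}.csv"
    tmp_path = out_path.with_suffix(".tmp")
    with tmp_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    # Write to a temp file first, then swap, so readers never see a partial file.
    tmp_path.replace(out_path)
    return out_path


def run_demo():
    """Run the same task twice on the same raw input and return both outputs."""
    with tempfile.TemporaryDirectory() as d:
        work = Path(d)
        raw = work / "raw_2024-01-01.csv"  # practice #1: raw input kept untouched
        raw.write_text("id,amount\nb,2.5\na,1.0\n")
        first = transform(raw, work, "2024-01-01").read_text()
        second = transform(raw, work, "2024-01-01").read_text()
        return first, second
```

Calling `run_demo()` returns two identical strings, which is the property that makes a retry or backfill of the task safe. In a real pipeline the orchestrator would typically supply `run_date` and the storage paths.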
Recorded talk: https://youtu.be/PKgDjGCYKTE?t=454