Slide 6
- Freshness: How often does the data need to be updated? (Real-time vs. hourly or less often)
- Streaming vs. Batch pipeline
- We need: Daily (Batch)
- Volume: Does the data fit into the memory of one machine?
- Single-machine vs. distributed architecture
- 30 supermarkets with ~500 transactions per day, 50 bytes per transaction => less than 1 MB per day (well under 1 GB) => fits into one machine
- Source & Destination Connectors:
- Type of data access? (API, storage, stream, …)
- Data format? (JSON, CSV, Parquet, binary, …)
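
The volume estimate above can be checked with a quick back-of-envelope calculation. The sketch below is only illustrative: it assumes the ~500 transactions are per supermarket (if that is the total across all stores, the volume is even smaller), and the 8 GiB memory budget and the function names are hypothetical, not taken from the slide.

```python
# Back-of-envelope volume estimate using the figures from the slide.
STORES = 30                  # supermarkets
TX_PER_STORE_PER_DAY = 500   # assumption: ~500 transactions per store per day
BYTES_PER_TX = 50            # approximate size of one transaction record

# Assumption (not from the slide): memory available on a single machine.
MACHINE_MEMORY_BYTES = 8 * 1024**3  # 8 GiB, hypothetical budget


def daily_volume_bytes(stores: int, tx_per_store: int, bytes_per_tx: int) -> int:
    """Return the estimated raw data volume per day in bytes."""
    return stores * tx_per_store * bytes_per_tx


def fits_on_one_machine(volume_bytes: int, memory_bytes: int) -> bool:
    """Return True if the daily volume fits within the memory budget."""
    return volume_bytes <= memory_bytes


daily = daily_volume_bytes(STORES, TX_PER_STORE_PER_DAY, BYTES_PER_TX)
print(f"{daily:,} bytes per day (~{daily / 1e6:.2f} MB)")

if fits_on_one_machine(daily, MACHINE_MEMORY_BYTES):
    print("single machine is enough")
else:
    print("consider a distributed architecture")
```

Under these assumptions the daily volume is 750,000 bytes, several orders of magnitude below the memory of one machine, which is consistent with the single-machine, daily batch choice on the slide.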
2. DECIDE ON AN ARCHITECTURE PATTERN