Slide 13
● support for upserting records in a DFS (distributed file system)
● support for (pluggable) indexing of records for fast updates and deletes
● support for two table types [1]
○ Copy On Write (COW): data stored purely in a columnar format (Parquet)
○ Merge On Read (MOR): data stored using a combination of columnar (Parquet) and row-based (Avro) formats
● In a MOR table, updates are appended to delta (Avro) log files, which are later compacted into the columnar base files, either synchronously or asynchronously
● Compaction (e.g. how often it runs) can be tuned to one’s requirements
○ We compact on every write, since Amazon Athena only serves the Read Optimized view [2]
● Amazon EMR supports Hudi [3, 4]
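The MOR write path in the bullets above can be sketched as a toy, in-memory model. This is not Hudi's implementation or API: the class `ToyMorTable`, its dict-based index, and file ids like `file-0` are hypothetical. It only illustrates why an indexed update is cheap to write in a MOR table (it is appended to a delta log), why a Read Optimized reader such as Athena misses updates until compaction runs, and therefore why compacting on every write keeps that view current.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy, in-memory illustration of Merge-On-Read semantics. NOT Hudi's API:
# every name here (ToyMorTable, file ids like "file-0") is hypothetical.
Record = Dict[str, object]


@dataclass
class ToyMorTable:
    key_field: str
    # file_id -> {record key: record}; stands in for columnar base files
    base_files: Dict[str, Dict[object, Record]] = field(default_factory=dict)
    # file_id -> updates appended in write order; stands in for row-based delta logs
    delta_logs: Dict[str, List[Record]] = field(default_factory=dict)
    # record key -> file_id; stands in for Hudi's (pluggable) index
    index: Dict[object, str] = field(default_factory=dict)
    next_file: int = 0

    def upsert(self, records: List[Record]) -> None:
        inserts = []
        for rec in records:
            file_id = self.index.get(rec[self.key_field])
            if file_id is None:
                inserts.append(rec)  # unseen key: goes to a new base file
            else:
                # known key: the index locates its file, so the update is only appended
                self.delta_logs.setdefault(file_id, []).append(rec)
        if inserts:
            file_id = f"file-{self.next_file}"
            self.next_file += 1
            self.base_files[file_id] = {r[self.key_field]: r for r in inserts}
            for r in inserts:
                self.index[r[self.key_field]] = file_id

    def read_optimized(self) -> Dict[object, Record]:
        # Read Optimized view: base files only, so uncompacted updates are invisible
        view: Dict[object, Record] = {}
        for records in self.base_files.values():
            view.update(records)
        return view

    def snapshot(self) -> Dict[object, Record]:
        # Snapshot view: merge delta logs over base files at read time
        view = self.read_optimized()
        for logs in self.delta_logs.values():
            for rec in logs:
                view[rec[self.key_field]] = rec
        return view

    def compact(self) -> None:
        # Fold each delta log into its base file, then discard the logs
        for file_id, logs in self.delta_logs.items():
            for rec in logs:
                self.base_files[file_id][rec[self.key_field]] = rec
        self.delta_logs.clear()


table = ToyMorTable(key_field="id")
table.upsert([{"id": 1, "amount": 100}, {"id": 2, "amount": 200}])
table.upsert([{"id": 1, "amount": 150}])
print(table.read_optimized()[1]["amount"])  # 100: update still sits in the delta log
print(table.snapshot()[1]["amount"])        # 150
table.compact()
print(table.read_optimized()[1]["amount"])  # 150: now visible to a Read Optimized reader
```

In this toy, compacting after every `upsert` trades extra write work for a Read Optimized view that is never stale, which mirrors the trade-off described for Athena.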
Data Lake at PayPay
We use the MOR table type
Amazon EMR
We use Apache Hudi 0.7.0 with EMR 6.0.0
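A MOR table that is compacted on every write might be configured roughly as follows. This is a sketch under assumptions, not PayPay's actual setup: the helper `hudi_mor_options`, the table name `example_table`, and the fields `id`, `updated_at` and `dt` are hypothetical, and the option keys should be checked against the Hudi 0.7.0 configuration reference before use. The Spark write itself is shown only as a comment, since it needs a Spark session with the Hudi bundle available.

```python
from typing import Dict


# Hypothetical helper: builds Hudi Spark datasource write options for a MOR
# table that runs inline compaction after every delta commit. The field names
# passed in are placeholders, not a real schema.
def hudi_mor_options(table_name: str, record_key: str,
                     precombine_field: str, partition_field: str) -> Dict[str, str]:
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        # by default, when incoming records share a key, the one with the larger value wins
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # compact synchronously as part of the write job ...
        "hoodie.compact.inline": "true",
        # ... after every single delta commit, so the Read Optimized view stays current
        "hoodie.compact.inline.max.delta.commits": "1",
    }


opts = hudi_mor_options("example_table", "id", "updated_at", "dt")
# With PySpark and the Hudi bundle on the classpath, a write would look like:
# df.write.format("hudi").options(**opts).mode("append").save("s3://<bucket>/<path>")
for key in sorted(opts):
    print(key, "=", opts[key])
```

Setting the delta-commit threshold to 1 makes every write pay the compaction cost; raising it would make writes cheaper but leave the Read Optimized view behind until the next compaction.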
1. https://hudi.apache.org/docs/overview.html#table-types
2. https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html
3. https://aws.amazon.com/emr/features/hudi/
4. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html (support is for Hudi 0.5.0)
Support for Fast Writes using Apache Hudi