the standard index type for the Spark engine. It executes an efficient join of incoming records with keys retrieved from the table stored on disk. It requires keys to be partition-level unique so it can function correctly. see. https://hudi.apache.org/docs/indexing
the file group that houses the records, which proves to be particularly advantageous on a large scale. To select the type of bucket engine—that is, the method by which buckets are created—use the hoodie.index.bucket.engine configuration option. これはレコードのキーをハッシュ関数にかけてバケット(つまりfile group)を特定し、 それをレコードの所在と判断する方式だ。O(1)で即レコードを特定できるので大幅な 書き込み対象lookupの高速化が狙える。
fixed number of buckets for file groups within each partition, which do not have the capacity to decrease or increase in size. It is applicable to both COW and MOR tables. Due to the unchangeable number of buckets and the design principle of mapping each bucket to a single file group, this indexing method may not be ideal for partitions with significant data skew. CONSISTENT_HASHING: This index accommodates a dynamic number of buckets, with the capability for bucket resizing to ensure each bucket is sized appropriately. This addresses the issue of data skew in partitions with a high volume of data by allowing these partitions to be dynamically resized. As a