Slide 1

Slide 1 text

Funnelselect: Cache-Oblivious Multiple Selection
Sebastian Wild, joint work with Gerth Stølting Brodal
European Symposium on Algorithms 2023, CWI Amsterdam, 2023-09-05

Slide 2

Slide 2 text

Outline
1. Multiple Selection
2. Cache Oblivious Algorithms
3. Funnelselect
4. Conclusion

Slide 3

Slide 3 text

1 Multiple Selection

Slide 10

Slide 10 text

“What’s this about” in 2 min

Two ancient comparison-based problems on an unordered list of N elements:

Sorting
- $O(N \log_2 N)$ comparisons and time
- $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os
- internal: Mergesort; IO: Multiway Mergesort; CO: Funnelsort

(Single) selection by rank, e.g. find the median
- $O(N)$ comparisons and time
- $O(N/B)$ I/Os
- internal, IO, and CO: Median-of-medians algorithm

Multiple selection sits between the two: the more sorted the output, the more expensive.

This talk: What happens between selection and sorting in the cache-oblivious model?

Slide 15

Slide 15 text

The Multiple Selection Problem

Goal: find the q elements of ranks $r_1 < r_2 < \dots < r_q$ among unsorted elements $x_1, \dots, x_N$ (here: all distinct).
⇝ Report the sought elements $x_{(r_1)}, \dots, x_{(r_q)}$ in sorted order.

Example (N = 15):
  input:   67 30 45 33 15 99 26 90 55 9 96 45 95 31 3   ($x_1, \dots, x_{15}$)
  sorted:  3 9 15 26 30 31 33 45 45 55 67 90 95 96 99   (ranks 1, ..., 15)
  queries: $r_1 = 1$, $r_2 = 2$, $r_3 = 3$, $r_4 = 8$
  Answer:  3, 9, 15, 45

Simple algorithms:
1. q calls to a selection algorithm (quickselect, median of medians) ⇝ $O(Nq)$
2. Divide & conquer 1: (single) select the $r_{\lceil q/2 \rceil}$-th smallest and partition $x_1, \dots, x_N$ around it. Recursively select $r_1, \dots, r_{\lceil q/2 \rceil - 1}$ and $r_{\lceil q/2 \rceil + 1}, \dots, r_q$ ⇝ $O(N \lg q)$
3. Divide & conquer 2: find the median of $x_1, \dots, x_N$ and split the query ranks. Recurse where a subproblem contains query ranks ⇝ $O(N \lg q)$
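As an illustration of "Divide & conquer 1", here is a minimal in-memory Python sketch. The function names `quickselect` and `multiselect` are hypothetical, and the three-way split (smaller / equal / larger) is an addition of this sketch so that repeated values do not break it; the slide itself assumes distinct elements.

```python
import random

def quickselect(items, r):
    """Return the element of 1-based rank r (expected linear time)."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if r <= len(smaller):
            items = smaller
        elif r <= len(smaller) + equal_count:
            return pivot
        else:
            r -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]

def multiselect(items, ranks):
    """Return the elements of the given increasing 1-based ranks, in sorted order."""
    if not ranks:
        return []
    # Select the middle query rank r_{ceil(q/2)} and partition around it.
    pivot = quickselect(items, ranks[(len(ranks) - 1) // 2])
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    n_le = len(items) - len(larger)          # number of elements <= pivot
    left = [r for r in ranks if r <= len(smaller)]
    hits = [pivot for r in ranks if len(smaller) < r <= n_le]
    right = [r - n_le for r in ranks if r > n_le]
    # Each side receives strictly fewer query ranks, so the recursion ends.
    return multiselect(smaller, left) + hits + multiselect(larger, right)

values = [67, 30, 45, 33, 15, 99, 26, 90, 55, 9, 96, 45, 95, 31, 3]
print(multiselect(values, [1, 2, 3, 8]))  # [3, 9, 15, 45]
```

The recursion depth is about lg q because the number of query ranks halves per level, which is where the $O(N \lg q)$ bound comes from.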

Slide 18

Slide 18 text

Fine-grained bounds

“Gap Entropy”: $\mathcal{B} = \sum_{i=1}^{q+1} \Delta_i \log_2 \frac{N}{\Delta_i}$ with $\Delta_i = r_i - r_{i-1}$ ($1 \le i \le q+1$, $r_0 = 0$ and $r_{q+1} = N+1$)

Example: $r_1 = 1$, $r_2 = 2$, $r_3 = 3$, $r_4 = 8$ with N = 15 gives $\Delta_1 = \Delta_2 = \Delta_3 = 1$, $\Delta_4 = 5$, $\Delta_5 = 8$ ⇝ $\mathcal{B} \approx 26.9$

- Lower bound of $\mathcal{B} - O(N)$ comparisons via “finish sorting” argument
- Upper bound of $\mathcal{B} + o(\mathcal{B}) + O(N)$ comparisons
  Kaligosi, Mehlhorn, Munro & Sanders: Towards Optimal Multiple Selection, ICALP 2005
  follows from nontrivial analysis; much easier: repeated median selection & recursion gets $O(\mathcal{B} + N)$ (→ paper)

⇝ In internal memory, the multiple-selection problem is solved ✓
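To make the gap entropy concrete, the following short Python sketch evaluates the formula for the running example (N = 15, ranks 1, 2, 3, 8). The function name `gap_entropy` is an illustrative choice, not something from the talk.

```python
import math

def gap_entropy(n, ranks):
    """Sum of gap * log2(n / gap) over the gaps between consecutive query ranks.

    Uses the slide's convention r_0 = 0 and r_{q+1} = n + 1.
    """
    boundaries = [0] + sorted(ranks) + [n + 1]
    gaps = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return sum(d * math.log2(n / d) for d in gaps)

print(f"{gap_entropy(15, [1, 2, 3, 8]):.1f}")  # 26.9
```

The gaps here are 1, 1, 1, 5, 8, which sum to N + 1 = 16, and the result matches the value of about 26.9 stated on the slide.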

Slide 20

Slide 20 text

2 Cache Oblivious Algorithms

Slide 24

Slide 24 text

IO Model & CO Model

IO Model (External Memory Model):
- CPU with a cache of size M: fast random access
- External memory of unbounded size: slow, access only in blocks of B cells
- Cost of computation = number of “I/Os” = number of blocks transferred between cache and external memory

Cache Oblivious (CO) Model:
- The algorithm can’t use M and B (I/Os are automatic, with OPT paging policy)
- ⇝ A CO algorithm works for all M and B, even hierarchies!
  Frigo, Leiserson, Prokop, Ramachandran: Cache-oblivious algorithms, TALG 2012

Tall Cache Assumption: $M \ge B^{1+\varepsilon}$ (think: $\varepsilon = 1$)
- ⇝ cache fits many cache lines
- necessary for the existence of I/O-optimal (comparison-based) cache-oblivious sorting algorithms
  Brodal, Fagerberg: On the limits of cache-obliviousness, STOC 2003
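The cost measure can be made tangible with a tiny simulator. The sketch below counts block transfers for a sequence of cell addresses; it is only an illustration, and it swaps in LRU replacement as a stand-in for the optimal paging policy that the model assumes. The name `count_ios` and the parameter values are hypothetical.

```python
from collections import OrderedDict

def count_ios(addresses, M, B):
    """Count block loads for an access sequence, cache of M cells, blocks of B cells.

    ASSUMPTION: LRU replacement instead of the model's OPT policy; writes
    back to external memory are not counted.
    """
    capacity = M // B                 # number of blocks that fit in cache
    cache = OrderedDict()             # block id -> True, ordered by recency
    ios = 0
    for addr in addresses:
        block = addr // B
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            ios += 1                  # block must be fetched
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used block
    return ios

N, M, B = 1024, 64, 16
scan = range(N)
# Column-major walk over a 32 x 32 row-major matrix: poor locality.
col_major = [i * 32 + j for j in range(32) for i in range(32)]
print(count_ios(scan, M, B), count_ios(col_major, M, B))  # 64 1024
```

A scan costs N/B = 64 transfers, while the strided walk touches a different block on every access and misses every time with only four cached blocks.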

Slide 28

Slide 28 text

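Previous Work & Our Results

Problem            | Reference           | Comparisons                | I/Os                                 | Comments
-------------------|---------------------|----------------------------|--------------------------------------|---------
Single selection   | Blum et al. 1973    | wc $5.4305N$               | $O(N/B)$                             | CO, deterministic
Multiple selection | Dobkin & Munro 1981 | wc $3\mathcal{B} + O(N)$   | $O((\mathcal{B} + N)/B)$             | CO, deterministic
Multiple selection | Hu et al. 2014      | wc $O(\mathcal{B} + N)$    | $O(\mathcal{B}_{I/O} + N/B)$         | deterministic
Multiple selection | Barbay et al. 2016  | wc $O(\mathcal{B} + N)$    | $O(\mathcal{B}_{I/O} + N/B)$         | online, determ., $M \ge B^{1+\varepsilon}$
Multiple selection | Funnelselect        | E $O(\mathcal{B} + N)$     | $O(\mathcal{B}_{I/O} + N/B)$         | CO, randomized, $M \ge B^{1+\varepsilon}$
Multiple selection | Lower bound         | $\mathcal{B} - O(N)$       | $\Omega(\mathcal{B}_{I/O}) - O(N/B)$ | $M \ge B^{1+\varepsilon}$

(wc: worst case; E: expected. $\mathcal{B}$ is the gap entropy, B the block size.)

$\mathcal{B} = \sum_{i=1}^{q+1} \Delta_i \log_2 \frac{N}{\Delta_i}$

$\mathcal{B}_{I/O} = \sum_{i=1}^{q+1} \frac{\Delta_i}{B} \log_{M/B} \frac{N}{\Delta_i} = \frac{\mathcal{B}}{B \log_2 (M/B)} \ll \frac{\mathcal{B}}{B}$ (usually)

Funnelselect is the first cache-oblivious I/O-optimal algorithm.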
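The relation between the comparison bound and the I/O bound is a change of logarithm base. Spelled out, writing $\mathcal{B}$ for the gap entropy and $B$ for the block size:

```latex
\begin{align*}
\mathcal{B}_{I/O}
  &= \sum_{i=1}^{q+1} \frac{\Delta_i}{B} \log_{M/B} \frac{N}{\Delta_i}
   = \sum_{i=1}^{q+1} \frac{\Delta_i}{B} \cdot \frac{\log_2 (N/\Delta_i)}{\log_2 (M/B)} \\
  &= \frac{1}{B \log_2 (M/B)} \sum_{i=1}^{q+1} \Delta_i \log_2 \frac{N}{\Delta_i}
   = \frac{\mathcal{B}}{B \log_2 (M/B)} .
\end{align*}
```

So the I/O bound gains a factor of $\log_2 (M/B)$ over simply dividing the comparison count by the block size.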

Slide 32

Slide 32 text

Our technical challenge

1. Key algorithmic change from internal ⇝ external memory (for sorting and multiple selection):
   binary partitioning ⇝ $\approx \frac{M}{B}$-way partitioning
   ⇝ can’t do that cache obliviously
2. Cache-oblivious sorting uses $N^{\delta}$-way partitioning by “funnels”
   ⇝ first partitioning round already has sorting complexity ⇝ $\omega(\mathcal{B}_{I/O})$ I/Os
   ⇝ can’t use that for multiple selection

Slide 34

Slide 34 text

3 Funnelselect

Slide 42

Slide 42 text

Funnels for Partitioning

Can use funnels from Funnelsort in reverse!

Funnel: $k = \sqrt[4]{N}$-way merger (recursive binary merges) with judiciously sized buffers for intermediate results
Simplifying assumptions: $N = 2^{2^i}$ and $d = 4$

[Figure: k-partitioner drawn as a binary tree of pivot nodes P1, ..., P15 (P8 at the root), with one input array and the output arrays. Funnel recursion (van Emde Boas): a $\sqrt{k}$-partitioner connected to $\sqrt{k}$-partitioners through buffers of size $k^2$ each.]

- Funnel buffer sizes: largest in the middle layer ⇝ overall space $O(k^{5/2}) = O(N^{5/8})$
- Nodes partition around a pivot value P
- Partition = push input down; when an output buffer is full, recursively push
- Need a final flush of all buffers

≈ I/O-optimal cache-oblivious gadget for $\hat{k}$-way partitioning with $\hat{k} \approx M^{0.3}$
⇝ Can be used for expected I/O-optimal CO quicksort (but better options available)
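The logical behaviour of a k-partitioner (ignoring buffers and memory layout, which are what make the real funnel I/O-efficient) can be sketched as a balanced binary tree of pivots through which every element is routed. The names `build_partition_tree` and `partition` are hypothetical, and elements are pushed down one at a time here rather than in buffered batches.

```python
def build_partition_tree(pivots):
    """Balanced binary tree over sorted pivots; leaves are bucket indices."""
    def build(lo, hi):                 # subtree responsible for buckets lo..hi
        if lo == hi:
            return lo                  # leaf: output bucket number
        mid = (lo + hi) // 2
        # pivots[mid] separates buckets lo..mid from buckets mid+1..hi
        return (pivots[mid], build(lo, mid), build(mid + 1, hi))
    return build(0, len(pivots))

def partition(items, pivots):
    """Route each element down the pivot tree into one of len(pivots)+1 buckets."""
    tree = build_partition_tree(pivots)
    buckets = [[] for _ in range(len(pivots) + 1)]
    for x in items:
        node = tree
        while isinstance(node, tuple):          # inner node: compare with pivot
            pivot, left, right = node
            node = left if x <= pivot else right
        buckets[node].append(x)
    return buckets

values = [67, 30, 45, 33, 15, 99, 26, 90, 55, 9, 96, 45, 95, 31, 3]
print(partition(values, [15, 33, 67]))
# [[15, 9, 3], [30, 33, 26, 31], [67, 45, 55, 45], [99, 90, 96, 95]]
```

With 15 pivots the root of this tree is the 8th pivot, matching the P8 root in the figure. Each element costs about lg k comparisons, the same as one level of a k-way distribution step.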

Slide 47

Slide 47 text

Early Truncation

Recall: for multiple selection (in general), we can’t use the full k-partitioner (it already has sorting complexity).

Early truncation: Don’t split buckets that don’t contain any query ranks!

[Figure: the partitioning tree after truncation; remaining pivot nodes: P1, P2, P3, P4, P6, P7, P8, P9, P10, P12 (P5, P11, P13, P14, P15 removed).]

- Buckets depend on (random) pivots ⇝ we only know a query’s bucket after partitioning ...
- Assume bucket boundaries are within a safety margin $\pm\xi = N^{1/2+\delta}$ of their expected location.
  ⇝ If a bucket is expected query free, don’t split further.
- “Expected query free” is known up front (depends only on N and the query ranks)
  ⇝ Can remove unwanted partitioning nodes from the funnel in preprocessing.
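One possible reading of the truncation rule, as a Python sketch: a subtree covering buckets lo..hi is kept as a single output whenever no query rank falls into its expected rank interval widened by the margin. The function name `truncated_leaves`, the choice of expected boundaries at multiples of N/k, and passing the margin `xi` as a plain number are all assumptions of this sketch, not details confirmed by the slides.

```python
def truncated_leaves(n, k, ranks, xi):
    """Return the bucket ranges (lo, hi) that end up as outputs after truncation."""
    def may_contain_query(lo, hi):
        # Expected rank interval of buckets lo..hi, widened by the margin xi.
        start = lo * n / k - xi
        end = (hi + 1) * n / k + xi
        return any(start < r <= end for r in ranks)

    leaves = []

    def visit(lo, hi):
        if lo == hi or not may_contain_query(lo, hi):
            leaves.append((lo, hi))    # expected query free (or single bucket): stop
            return
        mid = (lo + hi) // 2
        visit(lo, mid)
        visit(mid + 1, hi)

    visit(0, k - 1)
    return leaves

# 16 buckets of expected size 100, one query at rank 50, margin 10.
print(truncated_leaves(1600, 16, [50], 10))
# [(0, 0), (1, 1), (2, 3), (4, 7), (8, 15)]
```

Only the path towards the query is split all the way down; the other subtrees collapse into single outputs. Since the decision uses just n, k, the ranks, and the margin, it can be made before any element is read.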

Slide 49

Slide 49 text

Pivot Sampling

Recall this? “Assume bucket boundaries are within a safety margin $\pm\xi = N^{1/2+\delta}$ of their expected location.”
We have to make that true by choosing pivots well.

The following randomized choice works with high probability (by standard Chernoff bound arguments):
1. Include each element in the sample $\bar{S}$ with prob. p = 1/ log2 (N).
2. Sort the sample $\bar{S}$.
3. Pick pivot $P_i$ as the ≈ $ipN/k$-th smallest in $\bar{S}$ ($i = 1, \dots, k-1$)
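The three sampling steps translate almost directly into code. In this sketch the sampling probability is read as $1/\log_2 N$, which is an assumption about the slide's notation; the function name `sample_pivots`, the rounding of the sample rank, and the clamping of indices are likewise illustrative choices.

```python
import math
import random

def sample_pivots(items, k, rng=random):
    """Pick k-1 pivots from a Bernoulli sample of the input (needs len(items) >= 2)."""
    n = len(items)
    p = 1 / math.log2(n)          # ASSUMPTION: slide's p read as 1 / log_2(N)
    # Steps 1 and 2: include each element independently with prob. p, then sort.
    sample = sorted(x for x in items if rng.random() < p)
    if len(sample) < k:
        raise RuntimeError("sample too small; the real algorithm would restart")
    # Step 3: pivot i is roughly the (i*p*N/k)-th smallest sample element.
    pivots = []
    for i in range(1, k):
        idx = round(i * p * n / k) - 1            # convert to 0-based index
        idx = min(max(idx, 0), len(sample) - 1)   # clamp if the sample came out short
        pivots.append(sample[idx])
    return pivots

rng = random.Random(1)
data = list(range(4096))
rng.shuffle(data)
print(sample_pivots(data, 16, rng))
```

Because the expected sample size is pN, the i-th pivot's rank in the full input concentrates around iN/k, which is the property the safety margin relies on.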

Slide 51

Slide 51 text

Overall Algorithm

Funnelselect:
1. Sample pivots $P_1, \dots, P_{k-1}$
2. Build k-partitioner using $P_1, \dots, P_{k-1}$
3. Mark expected query free buckets & rewire their parent’s buffer to output
4. For each bucket with queries:
   i. Sort the bucket (Funnelsort)
   ii. Report sought elements

Observations:
- No (top-level) recursion needed
- Algorithm can fail at several places: sample too small, sample too large, pivots too skewed, query in expected query free bucket ⇝ restart
  but: with high probability, no fails ⇝ no effect on expected running time
- Can be augmented to produce the input partitioned around the sought elements in contiguous external memory (default: only return the sought elements in order)
- Can be made to handle equal elements
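The control flow of the four steps can be mimicked in internal memory. The sketch below is deliberately simplified and is not the cache-oblivious algorithm: pivots come from a small uniform sample, partitioning uses `bisect` instead of a funnel, `list.sort` stands in for Funnelsort, and query buckets are identified from the actual bucket sizes after partitioning, so no restart logic is needed. The name `funnelselect_sketch` and the default `k` are hypothetical.

```python
import bisect
import random

def funnelselect_sketch(items, ranks, k=4, rng=random):
    """Report the elements of the given increasing 1-based ranks (items non-empty)."""
    # 1. Sample pivots: k-1 evenly spaced elements of a sorted random sample.
    sample = sorted(rng.sample(items, min(len(items), 4 * k)))
    pivots = [sample[i * len(sample) // k] for i in range(1, k)]
    # 2./3. Partition: bucket j receives the elements with exactly j pivots below them.
    buckets = [[] for _ in range(k)]
    for x in items:
        buckets[bisect.bisect_left(pivots, x)].append(x)
    # 4. Sort only the buckets that contain a query rank and report from them.
    result, offset = [], 0
    for bucket in buckets:
        wanted = [r for r in ranks if offset < r <= offset + len(bucket)]
        if wanted:
            bucket.sort()
            result.extend(bucket[r - offset - 1] for r in wanted)
        offset += len(bucket)
    return result

values = [67, 30, 45, 33, 15, 99, 26, 90, 55, 9, 96, 45, 95, 31, 3]
print(funnelselect_sketch(values, [1, 2, 3, 8]))  # [3, 9, 15, 45]
```

Buckets without queries are never sorted, which is the source of the savings over full sorting. The actual algorithm additionally decides before partitioning which tree nodes to drop.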

Slide 53

Slide 53 text

4 Conclusion

Slide 56

Slide 56 text

Conclusion

We presented Funnelselect, the first I/O-optimal, cache-oblivious multiple-selection algorithm.

The presented algorithm is inherently randomized, but we meanwhile found a deterministic method! (too late for the proceedings ... but stay tuned)

Open Problems:
1. Funnelselect assumes a tall cache. This cannot be avoided, since I/O-optimal CO sorting (⊂ multiple selection) requires it. But single selection doesn’t! ⇝ What happens in between?
2. Can online multiple selection be solved I/O-optimally and cache-obliviously?

Slide 57

Slide 57 text

We’re hiring!
for Computing over compressed graph-structured data
- 3 year postdoc
- PhD student
Liverpool sounds cool? → Talk to me!

Slide 58

Slide 58 text

Icons made by Freepik, Gregor Cresnar, Those Icons, Smashicons, Good Ware, Pause08, and Madebyoliver from www.flaticon.com. Vector graphics from Pressfoto, brgfx, macrovector and Jannoon028 on freepik.com. Other photos from www.pixabay.com.

Slide 59

Slide 59 text

I/O Lower Bound

Recall:
$\mathcal{B} = \sum_{i=1}^{q+1} \Delta_i \log_2 \frac{N}{\Delta_i}$ with $\Delta_i = r_i - r_{i-1}$ ($1 \le i \le q+1$, $r_0 = 0$, $r_{q+1} = N+1$)
$\mathcal{B}_{I/O} = \sum_{i=1}^{q+1} \frac{\Delta_i}{B} \log_{M/B} \frac{N}{\Delta_i} = \frac{\mathcal{B}}{B \log_2 (M/B)} \ll \frac{\mathcal{B}}{B}$ (usually)

Theorem (Lower bound): External-memory multiple selection requires, in expectation, $\Omega(\mathcal{B}_{I/O}) - O\left(\frac{N}{B} \log_{M/B} B\right)$ I/Os.

- Follows from a general reduction (comparisons bound ⇝ I/Os bound)
  Arge, Knudsen, Larsen: A general lower bound on the I/O-complexity of comparison-based algorithms, WADS 1993
- “finish-sorting” argument no longer rigorous
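Under the tall cache assumption $M \ge B^{1+\varepsilon}$, the subtracted term in the theorem simplifies to the $O(N/B)$ term used in the results table. A short derivation, using only $M/B \ge B^{\varepsilon}$:

```latex
\begin{align*}
\frac{N}{B} \log_{M/B} B
  = \frac{N}{B} \cdot \frac{\log_2 B}{\log_2 (M/B)}
  \le \frac{N}{B} \cdot \frac{\log_2 B}{\varepsilon \log_2 B}
  = \frac{1}{\varepsilon} \cdot \frac{N}{B}
  = O\!\left(\frac{N}{B}\right)
  \quad \text{for constant } \varepsilon > 0 .
\end{align*}
```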

Slide 60

Slide 60 text

Recap: (Lazy) Funnelsort

Funnelsort is a $k = \sqrt[4]{N}$-way Mergesort (outer recursion), each merge realized by recursive binary merging (inner recursion) with judiciously sized buffers for intermediate results (funnel).

Simplifying assumptions: $N = 2^{2^i}$ and $d = 4$ (i.e., $\varepsilon \ge \frac{2}{3}$ in the tall cache assumption); $d > 2$ controls the fanout (≈ $N^{1/d}$-way merging)

[Figure: k-merger built from a $\sqrt{k}$-merger and $\sqrt{k}$-mergers connected through buffers of size $k^2$ each, with the input arrays on one side and one output array on the other.]

- Recursive structure (cf. van Emde Boas trees)
- Largest buffers in the middle layer ⇝ overall space $O(k^{5/2}) = O(N^{5/8})$
- Merge = fill output buffer; when an input buffer is empty, recursively fill it

≈ I/O-optimal cache-oblivious gadget for $\hat{k}$-way merging with $\hat{k} \approx M^{0.3}$
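The "inner recursion" of a k-merger, a balanced tree of binary mergers that produce output lazily, can be sketched with Python generators. This shows only the logical structure; the buffers and the van Emde Boas layout that give the funnel its I/O behaviour are omitted, and the names `binary_merge` and `k_merger` are hypothetical.

```python
def binary_merge(a, b):
    """Lazily merge two sorted iterables into one sorted stream."""
    a, b = iter(a), iter(b)
    done = object()                       # sentinel marking an exhausted input
    x, y = next(a, done), next(b, done)
    while x is not done and y is not done:
        if x <= y:
            yield x
            x = next(a, done)
        else:
            yield y
            y = next(b, done)
    while x is not done:                  # drain whichever input remains
        yield x
        x = next(a, done)
    while y is not done:
        yield y
        y = next(b, done)

def k_merger(runs):
    """Merge k sorted runs through a balanced tree of binary mergers."""
    if not runs:
        return iter(())
    if len(runs) == 1:
        return iter(runs[0])
    mid = len(runs) // 2
    return binary_merge(k_merger(runs[:mid]), k_merger(runs[mid:]))

print(list(k_merger([[1, 4], [2, 9], [0, 3], [5]])))  # [0, 1, 2, 3, 4, 5, 9]
```

Each output element is pulled through about lg k binary mergers, element by element. The funnel instead moves whole buffers at a time, which is what amortizes the block transfers.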

Slide 61

Slide 61 text

Tall Caches

[Figure: CPU with a “tall” cache of size M, on top of an external memory of unbounded size.]

Tall Cache Assumption: $M \ge B^{1+\varepsilon}$ (think: $\varepsilon = 1$)
⇝ $M/B \ge B^{\varepsilon}$
⇝ cache fits many cache lines

Necessary for the existence of I/O-optimal (comparison-based) cache-oblivious sorting algorithms
Brodal, Fagerberg: On the limits of cache-obliviousness, STOC 2003