Heaps in Python - An Overview of heapq

An Overview of Python's heapq

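0 1 4 3 6 5 2 A tree in which every child is greater than or equal to its parent is called a heap. The reverse ordering, where every child is smaller than or equal to its parent, is also a heap.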

Agenda
1.  Data structure
2.  heappush: adding an element
3.  heappop: removing an element
4.  heapify: putting a list in heap order
5.  Two questions

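1. Data structure

Representing a binary tree as a list
•  Adding an element only requires an append
•  Normally you would have to search for the position to insert at
   – which is quite tedious…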

0 2 4 1 3 6 5 Take a list

0 2 4 1 3 6 5 and view it as a binary tree

0 2 4 1 3 6 5 The left child of index n is at index 2n + 1

0 2 4 1 3 6 5 The right child of index n is at index 2n + 2

0 2 4 1 3 6 5 The parent of index n is at index (n - 1) // 2
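The index arithmetic can be checked with a small sketch. The helper names `left`, `right`, and `parent` are made up for illustration and are not part of heapq; indices are assumed to be 0-based, in which case the parent of index n works out to (n - 1) // 2.

```python
def left(n):
    """Index of the left child of index n (0-based)."""
    return 2 * n + 1


def right(n):
    """Index of the right child of index n (0-based)."""
    return 2 * n + 2


def parent(n):
    """Index of the parent of index n (0-based, n >= 1)."""
    return (n - 1) // 2


tree = [0, 1, 2, 3, 4, 5, 6]   # a list viewed as a binary tree
for n in range(1, len(tree)):
    # Every non-root index is the left or right child of its parent.
    assert n in (left(parent(n)), right(parent(n)))

print(left(0), right(0), parent(5), parent(6))  # 1 2 2 2
```

Because the tree has no gaps, no pointers are needed: the position in the list alone determines the parent and the children.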

2. heappush: adding an element
heappush walks from a leaf up toward the root.

1 2 4 3 6 5

1 2 4 3 6 5 0   heappush: append 0 to the end

1 2 4 3 6 5 0   _siftdown: the heap invariant is violated

1 0 4 3 6 5 2   _siftdown: swap

1 0 4 3 6 5 2   _siftdown: the heap invariant is still violated

0 1 4 3 6 5 2   _siftdown: swap

0 1 4 3 6 5 2   _siftdown: done
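The walkthrough above can be sketched in a few lines. This is a simplified illustration rather than the heapq source: the names `sift_down` and `push` are hypothetical, and the starting list `[1, 3, 2, 6, 4, 5]` is an assumed layout that appears consistent with the slides.

```python
def sift_down(heap, pos):
    """Move heap[pos] toward the root while its parent is larger."""
    item = heap[pos]
    while pos > 0:
        parent_pos = (pos - 1) // 2
        if item < heap[parent_pos]:
            heap[pos] = heap[parent_pos]   # the parent moves down one level
            pos = parent_pos
        else:
            break
    heap[pos] = item


def push(heap, item):
    """Append item at the end, then restore the heap invariant."""
    heap.append(item)
    sift_down(heap, len(heap) - 1)


heap = [1, 3, 2, 6, 4, 5]   # assumed layout of the example heap
push(heap, 0)
print(heap)  # [0, 3, 1, 6, 4, 5, 2]
```

The new element only ever meets its ancestors, so a push costs at most one comparison per level of the tree.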

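3. heappop: retrieving an element
heappop walks from the root down to a leaf, then from that leaf back up toward the root.

0 1 4 3 6 5 2

1 4 3 6 5 2   heappop: remove the root, 0

1 4 3 6 5 2   heappop: move the last element, 2, to the root

1 4 3 6 5 2   _siftup: of the children 3 and 1, take the smaller one, 1, and…

2 4 3 6 5 1   _siftup: swap

2 4 3 6 5 1   _siftup: of the children 5 and None, take the smaller one, 5, and…

5 4 3 6 2 1   _siftup: swap
Question 1: the heap was already valid one step earlier, so why does it do this?

5 4 3 6 2 1   _siftdown: the heap invariant is violated

2 4 3 6 5 1   _siftdown: swap

2 4 3 6 5 1   _siftdown: no swap needed

2 4 3 6 5 1   done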
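The same steps can be written as a short sketch. It is a simplified illustration, not the heapq source: `sift_down`, `sift_up`, and `pop` are hypothetical names chosen to mirror `_siftdown`, `_siftup`, and `heappop`, and the starting list is an assumed layout that appears consistent with the slides.

```python
def sift_down(heap, start, pos):
    """Move heap[pos] toward index start while its parent is larger."""
    item = heap[pos]
    while pos > start:
        parent_pos = (pos - 1) // 2
        if item < heap[parent_pos]:
            heap[pos] = heap[parent_pos]
            pos = parent_pos
        else:
            break
    heap[pos] = item


def sift_up(heap, pos):
    """Bubble the smaller child up until a leaf, then sift the item back."""
    start = pos
    item = heap[pos]
    child = 2 * pos + 1
    while child < len(heap):
        right = child + 1
        # Pick the smaller child; item itself is not compared here.
        if right < len(heap) and not heap[child] < heap[right]:
            child = right
        heap[pos] = heap[child]
        pos = child
        child = 2 * pos + 1
    heap[pos] = item              # item now sits at a leaf
    sift_down(heap, start, pos)   # walk it back up to its proper place


def pop(heap):
    """Remove and return the smallest element."""
    last = heap.pop()
    if not heap:
        return last
    smallest = heap[0]
    heap[0] = last
    sift_up(heap, 0)
    return smallest


heap = [0, 3, 1, 6, 4, 5, 2]   # assumed layout of the example heap
print(pop(heap), heap)  # 0 [1, 3, 2, 6, 4, 5]
```

Note that the way down spends one comparison per level (child against child), and the item is only compared on the way back up.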

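•  The _siftdown function
   – Walks from child to parent.
   – Swaps when the parent is greater than the child.
•  The _siftup function
   – Walks from parent to child.
   – Swaps the parent with the smaller of its left and right children.
   – Swaps even if the parent is already smaller than that child.

heappush calls _siftdown. heappop calls _siftup. _siftup calls _siftdown.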

4. heapify: putting a list in heap order

6 4 2 5 3 1 0   _siftup _siftup _siftup
Run the _siftup function on each element in turn, working from the end of the list toward the root.

6 0 5 2 3 1 4   The idea:
1. Take two heaps,
2. join them under a new element,
3. and run _siftup.

6 4 2 5 3 1 0   _siftup

6 4 2 5 3 1 0

6 4 2 5 3 1 0   _siftup

6 4 2 5 3 1 0

6 4 2 5 3 1 0   _siftup

6 4 2 5 3 1 0

6 4 2 5 3 1 0   _siftup

6 4 2 5 3 1 0

6 4 2 5 3 1 0   _siftup

6 0 2 5 3 1 4

6 0 2 5 3 1 4   _siftup

6 0 5 2 3 1 4

6 0 5 2 3 1 4   _siftup

0 1 5 2 3 6 4

0 1 5 2 3 6 4   done

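About the heapify implementation
•  I had assumed it would simply heappush every element of the list, but it is implemented differently. Call this Question 2.

About the heapify implementation
•  It adds a new element and merges two heaps under it. That is reminiscent of merge sort.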
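As a rough sketch of that idea: each call to the sift routine treats the two subtrees below a position as finished heaps and joins them under the element at that position. The names `sift_up` and `heapify` below are hypothetical stand-ins, the input list is an assumed layout that appears consistent with the slides, and the loop skips the leaves on the assumption that a single element is already a heap.

```python
def sift_up(heap, pos):
    """Bubble the smaller child up to a leaf, then sift the item back."""
    start, item = pos, heap[pos]
    child = 2 * pos + 1
    while child < len(heap):
        right = child + 1
        if right < len(heap) and not heap[child] < heap[right]:
            child = right
        heap[pos] = heap[child]
        pos = child
        child = 2 * pos + 1
    # Walk back toward start until the parent is no larger than item.
    while pos > start and item < heap[(pos - 1) // 2]:
        heap[pos] = heap[(pos - 1) // 2]
        pos = (pos - 1) // 2
    heap[pos] = item


def heapify(x):
    """Turn list x into a heap in place."""
    # Leaves are one-element heaps already, so start at the last parent.
    for i in reversed(range(len(x) // 2)):
        sift_up(x, i)


data = [6, 5, 4, 3, 2, 1, 0]   # assumed layout of the example list
heapify(data)
print(data)  # [0, 2, 1, 3, 5, 6, 4]
```

Most positions are near the bottom of the tree, where a sift is short, which is one way to see why this can be cheaper than pushing every element.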

5. Two questions

The comments in the source code contained something that looked helpful, so the relevant passages are quoted here.

The child indices of heap index pos are already heaps, and we want to make a heap at index pos too. We do this by bubbling the smaller child of pos up (and so on with that child's children, etc) until hitting a leaf, then using _siftdown to move the oddball originally at index pos into place.

We *could* break out of the loop as soon as we find a pos where newitem <= both its children, but turns out that's not a good idea, and despite that many books write the algorithm that way.

During a heap pop, the last array element is sifted in, and that tends to be large, so that comparing it against values starting from the root usually doesn't pay (= usually doesn't get us out of the loop early). See Knuth, Volume 3, where this is explained and quantified in an exercise.
My reading: the value brought from a leaf up to the root is large, so even if we tried to break partway down, we could not break until quite far down, which makes the check pointless. I think…
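To get a feel for Question 1, the two strategies can be compared by counting comparisons. Everything below is a sketch under assumptions: `Counted`, `sift_early_break`, `sift_to_leaf`, and `pop_all` are hypothetical names written for this experiment, not heapq functions, and the exact counts depend on the random data.

```python
import random


class Counted:
    """Wraps a value and counts every < comparison (illustrative helper)."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.count += 1
        return self.value < other.value


def sift_early_break(heap, pos):
    """Textbook variant: stop as soon as item <= both children."""
    item = heap[pos]
    child = 2 * pos + 1
    while child < len(heap):
        right = child + 1
        if right < len(heap) and heap[right] < heap[child]:
            child = right
        if not heap[child] < item:
            break
        heap[pos] = heap[child]
        pos = child
        child = 2 * pos + 1
    heap[pos] = item


def sift_to_leaf(heap, pos):
    """Variant from the quoted comment: go to a leaf, then sift back."""
    start, item = pos, heap[pos]
    child = 2 * pos + 1
    while child < len(heap):
        right = child + 1
        if right < len(heap) and not heap[child] < heap[right]:
            child = right
        heap[pos] = heap[child]
        pos = child
        child = 2 * pos + 1
    while pos > start and item < heap[(pos - 1) // 2]:
        heap[pos] = heap[(pos - 1) // 2]
        pos = (pos - 1) // 2
    heap[pos] = item


def pop_all(values, sift):
    """Build a heap with sift, pop everything, count compares in the pops."""
    heap = [Counted(v) for v in values]
    for i in reversed(range(len(heap) // 2)):
        sift(heap, i)
    Counted.count = 0          # count only the pops
    out = []
    while heap:
        last = heap.pop()
        if heap:
            out.append(heap[0].value)
            heap[0] = last
            sift(heap, 0)
        else:
            out.append(last.value)
    return out, Counted.count


random.seed(0)
values = [random.random() for _ in range(1000)]
sorted_a, early = pop_all(values, sift_early_break)
sorted_b, leaf = pop_all(values, sift_to_leaf)
print("early break:", early, "to leaf:", leaf)
```

The early-break variant pays two comparisons per level (child against child, then item against child) and rarely gets to stop early, while the to-leaf variant pays one per level on the way down plus a few on the way back.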

Cutting the # of comparisons is important, since these routines have no way to extract "the priority" from an array element, so that intelligence is likely to be hiding in custom comparison methods, or in array elements storing (priority, record) tuples. Comparisons are thus potentially expensive.
My reading: comparing ints is cheap, but with a user-defined class that defines its own comparison operators, the comparison *may* be expensive. I think.
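The two cases named in the comment look roughly like this in practice. The task names and the `Job` class are invented for illustration; the heapq calls themselves are the standard ones.

```python
import heapq
from dataclasses import dataclass, field

# Option 1: (priority, record) tuples. Tuples compare element by element,
# so the priority decides the order first.
tasks = []
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "refactor"))
print(heapq.heappop(tasks))  # (1, 'fix bug')


# Option 2: a class with its own comparison method (hypothetical example).
@dataclass(order=True)
class Job:
    priority: int
    name: str = field(compare=False)   # ignored when comparing


jobs = [Job(2, "write report"), Job(1, "fix bug"), Job(3, "refactor")]
heapq.heapify(jobs)
first = heapq.heappop(jobs)
print(first.name)  # fix bug
```

In both cases every comparison runs more code than a plain int comparison would, which is why the number of comparisons matters.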

On random arrays of length 1000, making this change cut the number of comparisons made by heapify() a little, and those made by exhaustive heappop() a lot, in accord with theory. Here are typical results from 3 runs (3 just to demonstrate how small the variance is):
I am not sure about this one… Is it comparing the code before and after a modification? What exactly is "this change"?

Compares needed by heapify     Compares needed by 1000 heappops
--------------------------     --------------------------------
1837 cut to 1663               14996 cut to 8680
1855 cut to 1659               14966 cut to 8678
1847 cut to 1660               15024 cut to 8703
I am not sure about this one… Is it comparing the code before and after a modification?

Building the heap by using heappush() 1000 times instead required 2198, 2148, and 2219 compares: heapify() is more efficient, when you can use it.
I do not fully follow this… but learning that building the heap with heapify is more efficient than with heappush is a useful takeaway.

The total compares needed by list.sort() on the same lists were 8627, 8627, and 8632 (this should be compared to the sum of heapify() and heappop() compares): list.sort() is (unsurprisingly!) more efficient for sorting.
Python's sort reportedly uses an algorithm called Timsort. Presumably the comparison here is against that.
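A similar measurement can be attempted with the real heapq functions by wrapping values in a class that counts `<` calls. This is only a sketch: `Counted`, `count_compares`, `build_with_pushes`, and `heapify_then_pop_all` are hypothetical helpers, and the numbers will not match the quoted ones exactly because the data and the Python version differ.

```python
import heapq
import random


class Counted:
    """Wraps a value and counts every < comparison (illustrative helper)."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.count += 1
        return self.value < other.value


def count_compares(action, values):
    """Run action on a fresh list of Counted items; return the compares."""
    items = [Counted(v) for v in values]
    Counted.count = 0
    action(items)
    return Counted.count


def build_with_pushes(items):
    """Build a heap by pushing the items one at a time."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)


def heapify_then_pop_all(items):
    """Sort via the heap: heapify, then pop until empty."""
    heapq.heapify(items)
    while items:
        heapq.heappop(items)


random.seed(0)
values = [random.random() for _ in range(1000)]
by_heapify = count_compares(heapq.heapify, values)
by_pushes = count_compares(build_with_pushes, values)
by_heapsort = count_compares(heapify_then_pop_all, values)
by_sort = count_compares(list.sort, values)
print("heapify:", by_heapify)
print("1000 heappush calls:", by_pushes)
print("heapify + 1000 heappop calls:", by_heapsort)
print("list.sort:", by_sort)
```

On random input the ordering of these counts should follow the quoted comment: heapify below repeated heappush, and list.sort below heapify plus heappop.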

More Decks by domodomodomo
