0x5F3759DF

0x5F3759DF ykozw

はじめに

次の関数は何をする？ float doSomething(float x) { const uint32_t t = 0x5f3759df
- (bit_cast<uint32_t>(x) >> 1); const float y = bit_cast<float>(t); return y * (1.5f - (x * 0.5f * y * y)); } ←tをfloatと解釈する ← 何らかの変換？ ↓？？？実は rsqrt（f x = 1 √𝑥 ）の(変わった)実装！

Fast inverse square root Quake III Arena(1999)にて実装されていたことで注目を浴びるベクトルの正規化で使われるつまりレイトレでも使える
現代では`RSQRTSS`の方が高速で高精度現代でも学ぶ価値がある(と思う) John Carmackが発案したわけではない "John Carmack E3 2006" by Rob Fahey is licensed under CC BY-SA 2.0.

rsqrt float rsqrt(float x) { const uint32_t t = 0x5f3759df
- (bit_cast<uint32_t>(x) >> 1); const float y = bit_cast<float>(t); return y * (1.5f - (x * 0.5f * y * y)); }

y * (1.5f - (x * 0.5f * y *
y))

y*(1.5f-(x*0.5f*y*y)) yは実はrsqrtの近似値ニュートン法で、精度を上げているニュートン法は、求根問題(f(x)=0)を解く新しい解 = 今の解 - f(今の解)/f‘(今の解) "Newton
iteration" by Yves Daoust is licensed under Public Domain.

y*(1.5f-(x*0.5f*y*y)) 次の求根問題の解が欲しい 1 √𝑥 − 𝑦 = 0 変形すると次のようになる 𝑓
𝑦 = 1 𝑦2 − 𝑥 この𝑓 𝑦 = 0となるyを得たい。微分すると次のようになる 𝑓′ 𝑦 = − 2 𝑦3 "Newton iteration" by Yves Daoust is licensed under Public Domain.

y*(1.5f-(x*0.5f*y*y)) 新しい解 = 今の解 - f(今の解)/f‘(今の解) 𝑦 − 𝑓 𝑦
𝑓′ 𝑦 = 𝑦 − 1 𝑦2 − 𝑥 − 2 𝑦3 = 𝒚( 𝟑 𝟐 − 𝒙𝒚𝟐 𝟐 ) "Newton iteration" by Yves Daoust is licensed under Public Domain.

rsqrt float rsqrt(float x) { const uint32_t t = 0x5f3759df

0x5f3759df-(bit_cast<uint32_t>(x)>>1)

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) FP32形式の復習 Sign, Exponent, fraction(Mantissa)の三つのパートからなる正に限ると次の数字を表現している 𝑥
= 2𝐸−127(1 + 1 223 𝑀) "Float example" by Dave Barber is licensed under CC BY-SA 3.0.

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 唐突にlog2 (𝑥)を取ることを考える log2 (𝑥) = log2 (2𝐸−127
1 + 1 223 𝑀 ) = log2 2𝐸−127 + log2 1 + 1 223 𝑀 = 𝑬 − 𝟏𝟐𝟕 + 𝒍𝒐𝒈𝟐 𝟏 + 𝟏 𝟐𝟐𝟑 𝑴 "Float example" by Dave Barber is licensed under CC BY-SA 3.0.

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 一般に次が成り立つ(マクローリン展開) log2 1 + 𝑥 =𝑥 −
𝑥2 2 + 𝑥3 3 − ⋯ 今回のx = 𝟏 𝟐𝟐𝟑 𝑴は[0,1] の範囲しかとらない 𝑙𝑜𝑔2 1 + 1 223 𝑀 大胆に次のように近似する log2 1 + 𝑥 ≈𝑥 + 𝜇 このとき、 𝜇を0.043にするとよく近似できる ※ 詳しくは後述

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) log2 (𝑥) = 𝐸 − 127 +
𝑙𝑜𝑔2 1 + 1 223 𝑀 ≈ 𝐸 − 127 + 1 223 𝑀 + 𝜇 = 1 223 223𝐸 + 𝑀 + 𝜇 − 127 "Float example" by Dave Barber is licensed under CC BY-SA 3.0. xのビット表現そのもの！

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 𝑥のビット表現の事を𝐵𝑥 と呼ぶことにする 𝐵𝑥 = 223𝐸 + 𝑀
すると次のように近似で表現できる log2 (𝑥) ≈ 1 223 𝐵𝑥 + 𝜇 − 127 文章にすると、「F32の値xをintとしてスケールとバイアスをつけると𝐥𝐨𝐠𝟐 (𝒙) に近似できる」となる。 "Float example" by Dave Barber is licensed under CC BY-SA 3.0.

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) rsqrtの式をBを使って変形すると次のようになる 𝑦 = 1 √𝑥 log2 (𝑦)
= log2 1 𝑥 1 223 𝐵𝑦 + 𝜇 − 127 ≈ − 1 2 log2 𝑥 1 223 𝐵𝑦 + 𝜇 − 127 ≈ − 1 2 1 223 𝐵𝑥 + 𝜇 − 127 𝑩𝒚 ≈ − 𝟏 𝟐 𝑩𝒙 + 𝟑 𝟐 𝟐𝟐𝟑(𝟏𝟐𝟕 − 𝝁) "Float example" by Dave Barber is licensed under CC BY-SA 3.0.

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 0x5f3759df - (bit_cast<uint32_t>(x) >> 1) 𝑩𝒚 ≈
− 𝟏 𝟐 𝑩𝒙 + 𝟑 𝟐 𝟐𝟐𝟑(𝟏𝟐𝟕 − 𝝁)

マジックナンバーの探索

Magic number 具体的に、𝜇を探索してみる全ての𝜇を探索する必要はなく、右の図の青の線を囲う範囲の赤い線と、オレンジの線の間に該当する𝜇さえチェックすればよい

Magic number 青の線は次の式であった 𝑦 = log2 (𝑥 + 1) これをよく近似する次の式の𝜇を探索する
𝑦 = 𝑥 + 𝜇 𝜇の取りうる範囲は次の通り [0, − log2 ln 2 − 1 ln 2 + 1] だいたい0.0から0.086くらいの値まで。先ほどのBでいうところの、 0x5F2F796C から 0x5F400000

Magic number オリジナルのマジックナンバー(0x5F3759DF)は出自不明な値といわれている [2]では、理論的に最適なマジックナンバーが提案されている相対誤差を最小化するマジックナンバー 0x5F375A86 絶対誤差を最小化するマジックナンバーはイテレーション回数別に提案されているイテレーション回数1: 0x5F37E75A イテレーション回数2:
0x5F37ADD5

Magic number 各Magic Numberに対して、 [10−10, 1010]の範囲で1万サンプルして相対誤差を出す x_values = np.logspace(-10, 10,
num=100000, dtype=np.float32) 0x5F375A80 が最小になった。最大誤差: 1.74493e-03 [2]の0x5F375A86 は最大誤差: 1.75125e-03 (全floatの値をみたらこちらが勝つ)

Magic number ニュートン法を行う回数別の相対誤差一回行うごとに精度が2桁弱程度よくなる

Magic number 指標選び「最大エラーが最小になる」指標では計算すると上記の値に近い値が得られる。「平均エラーが最小になる」指標では離れた値に(0x5F336300付近)に落ち着く値域選び「正規化対象は0から1の間だから、その間でエラーが最小のを選ぶ」みたいなことをしても、「10−10 から1010の間でエラーを最小化」としても、結果は変わらないマジックナンバーが多少違っても、ほとんど結果は変わらない一方、イテレーション数を増やすと2桁弱よくなる

他の累乗への応用

pow(x,?) 0x5f3759df - (bit_cast<uint32_t>(x) >> 1) 𝑩𝒚 ≈ − 𝟏
𝟐 𝑩𝒙 + 𝟑 𝟐 𝟐𝟐𝟑(𝟏𝟐𝟕 − 𝝁) ↑「− 𝟏 𝟐 をかけてる」 = 「 − 𝟏 𝟐 乗してる」

pow(x,?) -1/2乗以外に(おおよそ)任意の(ラフな)powが実装できる +(bit_cast<uint32_t>(x) >> 1) + M : ラフな平方根 +(bit_cast<uint32_t>(x)
>> 2) + M : ラフな4乗根 -(bit_cast<uint32_t>(x) >> 2) + M : ラフな1/4乗根 +(bit_cast<uint32_t>(x)/5*11) + M : ラフなガンマ関数 +(bit_cast<uint32_t>(x) << 8) + M : ラフな128乗かなり大雑把な近似で良く、極端な速度が求められるシチュエーションは何か？

まとめ今日は以下の事を話しましたニュートン法 FP32のフォーマット FP32のlogを取ると、近似にFP32のビット表現が出現するマジックナンバーの探索 -1/2乗以外の応用例

おわり float rsqrt(float x) { const uint32_t t = 0x5f3759df

参考文献 [1]Wikipedia contributors. (2024, September 26). Fast inverse square root.
In Wikipedia, The Free Encyclopedia. Retrieved from https://en.wikipedia.org/wiki/Fast_inverse_square_root [2]Leonid V. Moroz, Cezary J. Walczyk, Andriy Hrynchyshyn, Vijay Holimath, and Jan L. Cieslinski. 2016. Fast calculation of inverse square root with the use of magic constant - analytical approach. CoRR abs/1603.04483, (2016). Retrieved from http://arxiv.org/abs/1603.04483 [3]How the Quake engine cracked the inverse square root (2020, February 20). YouTube. Retrieved from https://www.youtube.com/watch?v=p8u_k2LIZyo

0x5F3759DF

0x5F3759DF

ykozw

More Decks by ykozw

Other Decks in Technology

Featured

Transcript

0x5F3759DF ykozw

はじめに

次の関数は何をする？ float doSomething(float x) { const uint32_t t = 0x5f3759df

Fast inverse square root Quake III Arena(1999)にて実装されていたことで注目を浴びるベクトルの正規化で使われるつまりレイトレでも使える

rsqrt float rsqrt(float x) { const uint32_t t = 0x5f3759df

y * (1.5f - (x * 0.5f * y *

y(1.5f-(x0.5fyy)) yは実はrsqrtの近似値ニュートン法で、精度を上げているニュートン法は、求根問題(f(x)=0)を解く新しい解 = 今の解 - f(今の解)/f‘(今の解) "Newton

y(1.5f-(x0.5fyy)) 次の求根問題の解が欲しい 1 √𝑥 − 𝑦 = 0 変形すると次のようになる 𝑓

y(1.5f-(x0.5fyy)) 新しい解 = 今の解 - f(今の解)/f‘(今の解) 𝑦 − 𝑓 𝑦

rsqrt float rsqrt(float x) { const uint32_t t = 0x5f3759df

0x5f3759df-(bit_cast<uint32_t>(x)>>1)

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) FP32形式の復習 Sign, Exponent, fraction(Mantissa)の三つのパートからなる正に限ると次の数字を表現している 𝑥

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 唐突にlog2 (𝑥)を取ることを考える log2 (𝑥) = log2 (2𝐸−127

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 一般に次が成り立つ(マクローリン展開) log2 1 + 𝑥 =𝑥 −

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) log2 (𝑥) = 𝐸 − 127 +

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 𝑥のビット表現の事を𝐵𝑥 と呼ぶことにする 𝐵𝑥 = 223𝐸 + 𝑀

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) rsqrtの式をBを使って変形すると次のようになる 𝑦 = 1 √𝑥 log2 (𝑦)

0x5f3759df-(bit_cast<uint32_t>(x) >> 1) 0x5f3759df - (bit_cast<uint32_t>(x) >> 1) 𝑩𝒚 ≈

マジックナンバーの探索

Magic number 具体的に、𝜇を探索してみる全ての𝜇を探索する必要はなく、右の図の青の線を囲う範囲の赤い線と、オレンジの線の間に該当する𝜇さえチェックすればよい

Magic number 青の線は次の式であった 𝑦 = log2 (𝑥 + 1) これをよく近似する次の式の𝜇を探索する

Magic number 各Magic Numberに対して、 [10−10, 1010]の範囲で1万サンプルして相対誤差を出す x_values = np.logspace(-10, 10,

Magic number ニュートン法を行う回数別の相対誤差一回行うごとに精度が2桁弱程度よくなる

他の累乗への応用

pow(x,?) 0x5f3759df - (bit_cast<uint32_t>(x) >> 1) 𝑩𝒚 ≈ − 𝟏

pow(x,?) -1/2乗以外に(おおよそ)任意の(ラフな)powが実装できる +(bit_cast<uint32_t>(x) >> 1) + M : ラフな平方根 +(bit_cast<uint32_t>(x)

まとめ今日は以下の事を話しましたニュートン法 FP32のフォーマット FP32のlogを取ると、近似にFP32のビット表現が出現するマジックナンバーの探索 -1/2乗以外の応用例

おわり float rsqrt(float x) { const uint32_t t = 0x5f3759df

参考文献 [1]Wikipedia contributors. (2024, September 26). Fast inverse square root.