Ruby and the World Record Pi Calculation

Emma Haruka Iwao

May 17, 2024

Transcript

  1. "Human progress in calculation has traditionally been measured by the number of decimal digits of π..."
    The Art of Computer Programming, Volume 2, Third Edition, Donald E. Knuth, 1997
  2. Pi is a popular PC benchmark
    • Super PI (1995), up to 16.7 million digits
    • PiFast (1997), up to 16 billion digits
    • y-cruncher (2009), up to 108 quadrillion (10^15) digits
    Time to calculate 1 billion digits: https://hwbot.org/benchmark/y-cruncher_-_pi-1b/
  3. Using y-cruncher
    • Developed by Alexander Yee, started when he was in high school
    • Fastest program to calculate pi on a single-node computer
    • Written in C++ with hand optimizations for modern CPUs
    • You need a fast computer with a lot of memory and storage
      ◦ 468 TiB for 100 trillion digits
      ◦ Too big to fit in DRAM
  4. Storage is the bottleneck
    • Storage is orders of magnitude slower than the CPU.
    • CPU speed isn’t very important.
    • The 100 trillion digit calculation took 157 days, moving 62.8 PiB of data.
    • The average CPU utilization was 35%.
    • With an infinitely fast CPU, it’d still take more than 100 days.
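To put the storage figures in perspective, here is a minimal sketch that derives the average throughput implied by 62.8 PiB moved over 157 days. It assumes the data was moved evenly over the whole wall time, which the slide does not claim; the variable names are illustrative.

```ruby
# Average disk throughput implied by the slide's figures.
# Assumption: I/O is spread evenly over the whole run.
PIB_IN_GIB      = 1024 * 1024 # 1 PiB = 2^20 GiB
SECONDS_PER_DAY = 86_400

data_moved_gib = 62.8 * PIB_IN_GIB
wall_time_s    = 157 * SECONDS_PER_DAY

avg_gib_per_s = data_moved_gib / wall_time_s
puts format('%.2f GiB/s averaged over the run', avg_gib_per_s)
```

The result is just under 5 GiB/s sustained for more than five months, which is why storage tuning matters more than CPU speed here.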
  5. Solving the puzzle
    • We’re using Google Compute Engine (GCE)
    • Maximum Persistent Disk per GCE VM: 257 TiB
    • Storage we need: 500 TiB
    • Need network storage: iSCSI
      ◦ Block device over TCP/IP provided by OS
    • Network throughput limit: 100 Gbps
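The constraints above can be turned into two back-of-the-envelope numbers. This sketch assumes 1 Gbps means 10^9 bits per second and ignores TCP/IP and iSCSI protocol overhead, so the real ceiling is somewhat lower. The VM count is only a lower bound from capacity, not the layout actually used.

```ruby
# Rough limits implied by the GCE constraints on the slide.
# Assumptions: 1 Gbps = 10^9 bits/s, no protocol overhead.
link_gbps          = 100
max_pd_per_vm_tib  = 257
storage_needed_tib = 500

network_ceiling_gib_per_s = link_gbps * 1_000_000_000 / 8.0 / 2**30
min_vms_by_capacity = (storage_needed_tib.to_f / max_pd_per_vm_tib).ceil

puts format('Network ceiling: %.2f GiB/s', network_ceiling_gib_per_s)
puts "VMs needed to hold the disks (capacity only): #{min_vms_by_capacity}"
```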
  6. [Diagram: I/O path from y-cruncher to storage across two Linux VMs]
    Compute Node VM (Linux): y-cruncher → Filesystem → I/O Scheduler → iSCSI Initiator → TCP/IP → Virtual NIC
    Cloud Network
    Storage Node VM (Linux): Virtual NIC → TCP/IP → iSCSI Target → I/O Scheduler → Storage
  7. Configurable parameters
    • Filesystem: ext4, xfs, …
    • I/O scheduler: mq-deadline, none
    • TCP/IP parameters: buffer size, congestion algorithm
    • iSCSI parameters: queue depth, outstanding requests
    • Simultaneous multithreading
    • y-cruncher: bytes / seek
    • Cloud specific: instance type, Persistent Disk type
  8. Every tuning goes a long way
    If something is 1% faster, it could save a day in computation time.
    y-cruncher has benchmark mode. Each run takes 30 - 60 minutes.
    Can we automate it?
  9. y-cruncher’s config file

        FarMemoryConfig : {
            Framework : "disk-raid0"
            InterleaveWidth : 262144
            BufferPerLane : 134217728
            Checksums : "true"
            RawIO : "true"
            Lanes : [
                { // Lane 0
                    Path : "/mnt/disk0"
                    BufferAllocator : {
                        Allocator : "interleave-libnuma"
                        LockedPages : "attempt"
                        Nodes : [1]
                    }
                    WorkerThreadCores : [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
                    WorkerThreadPriority : 2
                }
                { // Lane 1
                    Path : "/mnt/disk1"
                    BufferAllocator : {
                        Allocator : "interleave-libnuma"
                        LockedPages : "attempt"
                        Nodes : [1]
                    }
                    WorkerThreadCores : [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
                    WorkerThreadPriority : 2
                }
                { // Lane 2
                    Path : "/mnt/disk2"
                    BufferAllocator : {
                        Allocator : "interleave-libnuma"
  10. ERB to the rescue!
    • ERB is a template engine.
    • Text inside <% %> runs as Ruby code.
    • <%= %> replaces the block with the code output.
    Example:
        <%= 'Hello, World' %> こんにちは、世界!
    Output:
        Hello, World こんにちは、世界!
  11. Two ways to use ERB
    1. Command line erb command
    2. ERB class from code

        <%# hello.txt.erb %>
        Hello, <%= location %>!

        > erb location=Okinawa hello.txt.erb
        Hello, Okinawa!

        require 'erb'
        location = 'Okinawa'
        file = ERB.new(File.read('hello.txt.erb'))
        puts file.result
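As a self-contained variation on the ERB class usage, this sketch renders an inline template string instead of reading `hello.txt.erb`, and shows two ways to hand variables to the template. The `greet` helper is hypothetical and exists only to show that a binding captured inside a method exposes that method's local variables.

```ruby
require 'erb'

template = ERB.new('Hello, <%= location %>!')

# Option 1: pass the variables explicitly as a hash (Ruby 2.5+).
from_hash = template.result_with_hash(location: 'Okinawa')

# Option 2: pass a binding so the template sees the caller's local variables.
# `greet` is a hypothetical helper used only for this illustration.
def greet(template, location)
  template.result(binding)
end
from_binding = greet(template, 'Okinawa')

puts from_hash
puts from_binding
```

Passing the variables explicitly makes it obvious which names the template depends on, which helps once rendering moves out of the top level of a script.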
  12. Runs y-cruncher with 32, 34, 36, …, 72 disks automatically

        require 'erb'

        CONFIG_FILE='y-bench.cfg'
        RESULTS_DIR='./bench-results'

        def test(count:)
          bytes_per_seek = 256 * 1024 * count
          template = ERB.new(File.read("bench-templ.cfg.erb"))
          cfg_file = "#{RESULTS_DIR}/bench-#{count}.cfg"
          # Pass this method's binding so the template can see count and bytes_per_seek.
          File.write(cfg_file, template.result(binding))
          system("cd y-cruncher && ./y-cruncher config ../#{cfg_file} | tee ../#{RESULTS_DIR}/result-#{count}.txt")
        end

        (32..72).step(2) do |n|
          test(count: n)
        end
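A quick estimate shows why automating the sweep pays off. This sketch combines the disk counts from the loop above with the 30 to 60 minutes per benchmark run and the 157-day run time quoted elsewhere in the talk. It assumes the runs execute back to back with no setup time in between.

```ruby
# How long does one full sweep take, and what is 1% of the final run worth?
# Assumption: benchmark runs execute sequentially with no gaps.
counts = (32..72).step(2).to_a
runs   = counts.size

min_hours = runs * 30 / 60.0
max_hours = runs * 60 / 60.0

# One percent of the 157-day calculation, in days.
days_saved_per_percent = 157 * 0.01

puts "#{runs} runs take #{min_hours} to #{max_hours} hours"
puts format('1%% faster saves about %.2f days', days_saved_per_percent)
```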
  13. The config template (bench-templ.cfg.erb)

        FarMemory : {
            Framework : "disk-raid0"
            InterleaveWidth : 262144
            BufferPerLane : 134217728
            Checksums : "true"
            RawIO : "true"
            Lanes : [
            <% count.times do |i| %>
                { // Lane <%= i %>
                    Path : "/mnt/disk<%= i %>"
                    BufferAllocator : {
                        Allocator : "interleave-libnuma"
                        LockedPages : "attempt"
                        Nodes : [1]
                    }
                    WorkerThreadCores : [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
                    WorkerThreadPriority : 2
                }
            <% end %>
            ]
        }
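The loop in the template can be tried without y-cruncher installed. The sketch below uses an abbreviated stand-in for the lanes section (only the `Path` field is kept, which is a simplification, not the real config) and ERB's `trim_mode: '-'` so the loop tags do not leave blank lines behind. `render_lanes` is a hypothetical helper name.

```ruby
require 'erb'

# Abbreviated stand-in for the lanes loop; the real template has more fields.
LANES_TEMPLATE = <<~'TEMPLATE'
  Lanes : [
  <%- count.times do |i| -%>
      { // Lane <%= i %>
          Path : "/mnt/disk<%= i %>"
      }
  <%- end -%>
  ]
TEMPLATE

# Hypothetical helper: renders the lanes block for a given number of disks.
def render_lanes(count)
  ERB.new(LANES_TEMPLATE, trim_mode: '-').result_with_hash(count: count)
end

puts render_lanes(2)
```

Rendering a small count first is a cheap way to check the generated file by eye before starting a benchmark that takes 30 to 60 minutes.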
  14. Why ERB and not <another template engine>?
    • Because I’m a Rubyist!
    • Any tool would’ve worked.
    • Picked something I was already familiar with.
    • The goal is to finish benchmarking as quickly as possible.
      ◦ Less about learning a new tool.
  15. How to extract the numbers

        find . -name 'result-*.txt' -exec sh -c "
          grep -E '(Far Memory)|(Sequential)|(Threshold)|(Computation)|(Disk I/O)' {} |
          grep -Eo '[0-9]+\.[0-9]+ GiB/s' |
          sed -n '2p;4p;6p;8p;9p;10p' |
          grep -Eo '[0-9]+\.[0-9]+' |
          paste -s -d, -
        " \;
  16. Filter the lines
    • Running grep with ‘GiB/s’ as a marker
    • Some lines appear twice because y-cruncher writes the final result after a line break.
  17. Sed to get specific lines
    • -n: suppress the default output, so only explicitly printed lines appear
    • number: line number
    • p: print the current pattern space
  18. Now, we just need the numbers
    We left “GiB/s” as markers, but we don’t need them anymore. Grep again!
  19. Convert to csv with paste
    The paste command reads lines and outputs them with a delimiter of choice.
    -d: Use the specified character as a delimiter
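For readers who think more easily in Ruby than in shell, the grep, sed, and paste stages can be sketched as one small function. This is not what the talk used (the speaker deliberately stayed on the command line), and the sample lines are made-up placeholders, not real y-cruncher output. `extract_csv` and the constant names are illustrative.

```ruby
# Each constant mirrors one stage of the shell pipeline.
LABELS = %r{(Far Memory)|(Sequential)|(Threshold)|(Computation)|(Disk I/O)}
SPEED  = %r{[0-9]+\.[0-9]+ GiB/s}
NUMBER = /[0-9]+\.[0-9]+/
PICKS  = [2, 4, 6, 8, 9, 10].freeze # same 1-based positions as the sed script

def extract_csv(lines)
  # grep -E on the labels, then grep -Eo for every "<number> GiB/s" match
  speeds = lines.grep(LABELS).flat_map { |line| line.scan(SPEED) }
  # sed -n '2p;4p;...': keep only the chosen positions
  picked = PICKS.map { |n| speeds[n - 1] }.compact
  # final grep -Eo for the bare number, then paste -s -d,
  picked.map { |speed| speed[NUMBER] }.join(',')
end

# Placeholder input only; real benchmark output looks different.
sample = (1..10).map { |i| "Sequential placeholder #{i}: #{i}.50 GiB/s" }
sample << 'Unrelated line: 99.90 GiB/s'

puts extract_csv(sample)
```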
  20. Before and after performance tuning

                                  First Config   Best Config   Diff
    Sequential Read (GiB/s)       3.48           11.2          322%
    Sequential Write (GiB/s)      6.79           8.65          127%
    VST Computing (GiB/s)         14.7           25.6          174%
    VST I/O (GiB/s)               3.50           7.52          215%

    Actual run time: 157 days
    It could’ve taken more than 300 days without any optimizations.
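The Diff column reads as the best configuration's throughput expressed as a percentage of the first configuration's. That interpretation is an assumption, but it matches the rows. A short sketch that recomputes it from the table values:

```ruby
# Recompute the Diff column as best / first, in percent.
# Values are taken from the table; the interpretation of Diff is an assumption.
results = {
  'Sequential Read'  => [3.48, 11.2],
  'Sequential Write' => [6.79, 8.65],
  'VST Computing'    => [14.7, 25.6],
  'VST I/O'          => [3.50, 7.52]
}

diffs = results.transform_values { |first, best| (best / first * 100).round }

diffs.each { |name, percent| puts format('%-17s %d%%', name, percent) }
```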
  21. Why not Ruby?
    • I’m more familiar with Linux command-line tools for text processing.
    • Trial and error is faster on command line.
    • It’s one-off. No maintenance needed.
    • Google Sheets is easy to share and collaborate with.
    • Didn’t want to lose more time benchmarking vs improvements.
  22. A few months later…

        Verifying Decimal Output:       Time: 33956.481 seconds ( 9.432 hours )
        Verifying Hexadecimal Output:   Time: 33311.682 seconds ( 9.253 hours )

        Start Time: Thu Oct 14 04:45:44 2021
        End Time:   Mon Mar 21 04:16:52 2022

        Total Computation Time:   11303429.462 seconds ( 130.827 days )
        Start-to-End Wall Time:   13649467.651 seconds ( 157.980 days )

        CPU Utilization:        2185.38 % + 17.43 % kernel overhead
        Multi-core Efficiency:  34.15 % + 0.27 % kernel overhead

        Last Decimal Digits: Pi
        4658718895 1242883556 4671544483 9873493812 1206904813 : 99,999,999,999,950
        2656719174 5255431487 2142102057 7077336434 3095295560 : 100,000,000,000,000

        Spot Check: Good through 50,000,000,000,000

        Version:            0.7.8.9507 (Linux/18-CNL ~ Shinoa)
        Processor(s):       Intel(R) Xeon(R) CPU @ 2.60GHz
        Topology:           64 threads / 64 cores / 2 sockets / 2 NUMA nodes
        Usable Memory:      913,099,632,640 ( 850 GiB)
        CPU Base Frequency: 2,599,987,648 Hz

        Validation File: /mnt/y-cruncher/results/Pi - 20220321-041655.txt
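The figures in this output are easy to cross-check. The sketch below assumes both timestamps are in the same time zone with no clock change in between (they are treated as UTC here), and that Multi-core Efficiency is CPU Utilization divided by the 64 hardware threads. Neither assumption is stated in the output itself.

```ruby
# Cross-check the summary numbers reported by y-cruncher.
# Assumption: start and end timestamps share one time zone (treated as UTC).
start_time = Time.utc(2021, 10, 14, 4, 45, 44)
end_time   = Time.utc(2022, 3, 21, 4, 16, 52)

wall_seconds     = end_time - start_time
wall_days        = (wall_seconds / 86_400).round(3)
computation_days = (11_303_429.462 / 86_400).round(3)

# Assumption: efficiency = utilization / hardware threads.
efficiency_percent = (2185.38 / 64).round(2)

puts "Wall time: #{wall_seconds} s (#{wall_days} days)"
puts "Computation time: #{computation_days} days"
puts "Multi-core efficiency: #{efficiency_percent} %"
```

The gap between wall time and computation time is about 27 days, which is time the run spent on something other than computing.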
  23. My first RubyKaigi
    • My first RubyKaigi was RubyKaigi 2009, when I was in university.
    • My senpai learned there that I was interested in Ruby.
    • Then, he suggested I attend RailsGirls Tokyo 2nd in 2013.
  24. RailsGirls to RubyKaigi speaker
    After attending RailsGirls, I started contributing to RailsGirls as a coach.
    A year later, I spoke about RailsGirls at RubyKaigi 2014, my first time as a speaker.
    https://rubykaigi.org/2014/presentation/S-HarukaIwao/
  25. Ruby gave me the opportunities
    • I got my second job with a referral from someone I met at RubyKaigi.
    • When I interviewed for Google DevRel in 2017, I submitted the video of my talk at RubyKaigi 2014 as my English conference talk example.
      ◦ The hiring manager also valued my RailsGirls contributions.
  26. Me, pi, and DevRel
    • I always wanted to calculate pi, but didn’t have the resources.
    • Google’s Cloud DevRel had a Pi Day tradition of showcasing Cloud technologies with pi calculations.
    • I suddenly had the right idea at the right place.
    • My manager and director supported the pi calculation project.
  27. Ruby made the pi world records possible
    Without Ruby,
    • I would’ve had a different career path.
    • I might not have broken the pi world record twice.
    • I wouldn’t be here today.