Sorting Whatever* in $LANG

Condensed from six 1/2-hour Perl classes I taught to Atlanta PerlMongers, then enhanced for Raku, this talk covers simple sorting for simple and complex needs, humorous history, performance landmines, and best practices.

99% of `sort()` is provided by your language, needing only a tiny bit of your own code to customize it. Come learn to write the missing bit, add this tool to your toolbox, and then I'll show you how deep the 1% rabbit hole goes. Wrong turns all mapped out; clean and fast code on tap.

Bruce Gray

July 11, 2023

Transcript

  1. Bead sort Bogo sort Bubble sort Circle sort Cocktail sort

    Comb sort Counting sort Cycle sort Gnome sort Heap sort Insertion sort Merge sort Pancake sort Patience sort Permutation sort Quick sort Radix sort Selection sort Shell sort Sleep sort Stooge sort Strand sort
  2. Bead sort Bogo sort Bubble sort Circle sort Cocktail sort

    Comb sort Counting sort Cycle sort Gnome sort Heap sort Insertion sort Merge sort Pancake sort Patience sort Permutation sort Quick sort Radix sort Selection sort Shell sort Sleep sort Stooge sort Strand sort
  3. Order of Battle • Intro and Summary • Comparators (This

    is everything) • Pitfalls (Foot-guns) • Perl sorting (Clearer/Faster) • Raku sorting (Turned up to 11) • Q&A (Bonus: when not to sort)
  4. Order of Battle • Intro and Summary • Comparators (This

    is everything) • Pitfalls (Foot-guns) • Perl sorting (Clearer/Faster) • Raku sorting (Turned up to 11) • Q&A (Bonus: when not to sort) Story Story Story
  5. Answer the Q • How will the Q be asked

    • How will we determine the answer • How we will speak (return) the answer
  6. @evens = map { $_ * 2 } 0 .. 9;
    @odds = grep { $_ % 2 } 0 .. 20; # One alias: $_
    @sorted = sort { $a <=> $b } (4,4,8,5,2); # Two aliases: $a and $b
  7. A comes before B   Less   <    lt
    A comes after B    More   >    gt
    A is same as B     Same   ==   eq
  8. A comes before B   Less   <    lt
    A is same as B     Same   ==   eq
    A comes after B    More   >    gt
  9. die unless '7' lt '8';
    die unless '8' lt '9';
    die unless '9' lt '10'; # Died
    7 8 9 10
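
The failed third assertion is why numeric data wants `<=>` rather than the string operators. A minimal sketch, using a made-up list of numbers (the input values are an assumption for illustration):

```perl
use strict;
use warnings;

# Hypothetical input, chosen so that string order and numeric order differ.
my @input = (10, 9, 8, 7, 100);

# String comparison goes character by character, so "10" and "100" land before "7".
my @as_strings = sort { $a cmp $b } @input;

# Numeric comparison orders by value.
my @as_numbers = sort { $a <=> $b } @input;

print "cmp: @as_strings\n";
print "<=>: @as_numbers\n";
```

Picking `cmp` or `<=>` is the whole decision here; the rest of `sort` stays the same.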
  10. @sorted = sort {
        ($a lt $b) ? -1 :
        ($a gt $b) ?  1 : 0
    } @names;
  11. @sorted = sort { # 1 or 2 compares
        ($a lt $b) ? -1 :
        ($a gt $b) ?  1 : 0
    } @names;
  12. @sorted = sort { # 1 or 2 compares
        ($a lt $b) ? -1 :
        ($a gt $b) ?  1 : 0
    } @names;
    @sorted = sort { $a cmp $b } @names;
  13. @sorted = sort { # 1 or 2 compares
        ($a lt $b) ? -1 :
        ($a gt $b) ?  1 : 0
    } @names;
    # only 1 compare!
    @sorted = sort { $a cmp $b } @names;
  14. @sorted = sort { $b <=> $a } @input; @sorted

    = sort { $a <=> $b } @input; @sorted = sort { $b cmp $a } @input; @sorted = sort { $a cmp $b } @input;
  15. @sorted = sort { $b <=> $a } @input; @sorted

    = sort { $a <=> $b } @input; @sorted = sort { $b cmp $a } @input; @sorted = sort { $a cmp $b } @input; @sorted = sort @input;
  16. my %colors = ( green => 1, blue => 2

    ); @sorted = sort { $colors{$a} <=> $colors{$b} or $a cmp $b } @input;
  17. my %colors = ( green => 1, blue => 2

    ); @sorted = sort { ($colors{$a} // 0) <=> ($colors{$b} // 0) or $a cmp $b } @input;
  18. my %colors = ( green => 1, blue => 2

    ); @sorted = sort { ($colors{$a} // 99) <=> ($colors{$b} // 99) or $a cmp $b } @input;
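
A runnable version of the ranked sort with a default, as a sketch. The `@input` list is hypothetical; the `%colors` table and the default of 99 come from the slide:

```perl
use strict;
use warnings;

# Rank table from the slide; any color missing from it gets rank 99 and sorts last.
my %colors = ( green => 1, blue => 2 );

# Hypothetical input for illustration.
my @input = qw( red blue green amber blue );

my @sorted = sort {
    ($colors{$a} // 99) <=> ($colors{$b} // 99)   # primary key: rank
        or $a cmp $b                              # tie-breaker: name
} @input;

print "@sorted\n";
```

With a default of 0 instead of 99, the unranked colors would sort first rather than last, and without any default the undefined ranks would trigger warnings.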
  19. • Python • … 2.3: In-place `sort()` • 2.4+: Streaming

    `sorted()` • Perl and Raku • All streaming, all the time
  20. @n = sort { ($a % 2) <=> ($b %

    2) or $a <=> $b } @n; @n = sort { ($a % 2) <=> ($b % 2) } sort { $a <=> $b } @n; 2 4 6 8 1 3 5 7 9
  21. perl -MO=Concise -e … | grep sort
    One sort:
    7 <@> sort lKS*/INPLACE ->8
    Two sorts:
    9 <@> sort lKS* ->a
    8 <@> sort lKM/NUM ->9
  22. perl -MO=Concise -e 'print 42 - 1;'
    6 <@> leave[1 ref] vKP/REFC ->(end)
    1 <0> enter v ->2
    2 <;> nextstate(main 1 -e:1) v:{ ->3
    5 <@> print vK ->6
    3 <0> pushmark s ->4
    4 <$> const(IV 41) s/FOLD ->5
    -e syntax OK
  23. perl -MO=Concise -e 'print 42 - 1;'
    6 <@> leave[1 ref] vKP/REFC ->(end)
    1 <0> enter v ->2
    2 <;> nextstate(main 1 -e:1) v:{ ->3
    5 <@> print vK ->6
    3 <0> pushmark s ->4
    4 <$> const(IV 41) s/FOLD ->5
    -e syntax OK
    https://perldoc.perl.org/B::Concise
  24. my %h; my @sorted = sort { ( $h{$a} //=

    expensive($a) ) <=> ( $h{$b} //= expensive($b) ) } @stuff;
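
To see what the `//=` cache buys, here is a small sketch that counts how often the key function runs. The `expensive` body, the `$calls` counter, the word list, and the `cmp` tie-breaker are all assumptions added for illustration:

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for a costly key computation; it only counts its calls.
sub expensive {
    my ($s) = @_;
    $calls++;
    return length $s;
}

my @stuff = qw( pear fig banana kiwi apple watermelon );

my %h;   # cache: element => computed key
my @sorted = sort {
    ( $h{$a} //= expensive($a) ) <=> ( $h{$b} //= expensive($b) )
        or $a cmp $b   # tie-breaker added for a deterministic result
} @stuff;

print "@sorted\n";
print "expensive() ran $calls times for ", scalar(@stuff), " elements\n";
```

Without the cache, the key function would run twice per comparison instead of once per distinct element.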
  25. time perl -wE '
        my @L = `head -1000000 find_20230614.txt`;
        chomp @L;
        my %h;
        say( ( sort { ($h{$b} //= (-s($b) // 0)) <=> ($h{$a} //= (-s($a) // 0)) } @L )[0] );
    '
    1m54s ==> 21s
    21M `stat` ==> 1M
  26. time perl -wE ' my @L = `head -1000000 find_20230614.txt`;

    chomp @L; my %h; say( ( sort { -s($b) <=> -s($a) } @L )[0] ); '
  27. time perl -wE ' my @L = `head -1000000 find_20230614.txt`;

    chomp @L; my %h; say( ( sort { (-s($b) // 0) <=> (-s($a) // 0) } @L )[0] ); '
  28. time perl -wE ' my @L = `head -1000000 find_20230614.txt`;

    chomp @L; my %h; say( ( sort { ($h{$b} //= (-s($b) // 0)) <=> ($h{$a} //= (-s($a) // 0)) } @L )[0] ); '
  29. There are only two hard problems in computer science:

    cache expiration, naming things, and off-by-one errors. https://www.martinfowler.com/bliki/TwoHardThings.html
  30. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … )
  31. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … ) Each element in the original list
  32. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … ) Each element in the original list becomes a 2-element array
  33. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … ) Each element in the original list becomes a 2-element array that gets sorted using the key
  34. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … ) Each element in the original list becomes a 2-element array that gets sorted using the key and changed back to a single value
  35. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … ) Each element in the original list becomes a 2-element array that gets sorted using the key and changed back to a single value Map
  36. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … ) Each element in the original list becomes a 2-element array that gets sorted using the key and changed back to a single value Map Sort
  37. ( Value9, Value5, … ) ( [Calc9, Value9], [Calc5, Value5],

    … ) ( [Calc1, Value1], [Calc2, Value2], … ) ( Value1, Value2, … ) Each element in the original list becomes a 2-element array that gets sorted using the key and changed back to a single value Map Sort Map
  38. time perl -wE ' my @L = `head -1000000 find_20230614.txt`;

    chomp @L; @L = map { $_->[1] } sort { $b->[0] <=> $a->[0] } map {[ -s($_) // 0, $_ ]} @L; say $L[0];'
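
The same Map / Sort / Map pipeline on data that needs no filesystem, as a sketch. The `name:number` records and the `split` key extraction are assumptions for illustration; the shape of the pipeline follows the slide:

```perl
use strict;
use warnings;

# Hypothetical records: sort by the number after the colon, largest first.
my @records = ( 'alpha:30', 'beta:4', 'gamma:200', 'delta:15' );

my @sorted =
    map  { $_->[1] }                    # 3. unwrap: keep only the original value
    sort { $b->[0] <=> $a->[0] }        # 2. sort on the precomputed key
    map  { [ (split /:/)[1], $_ ] }     # 1. wrap: [ key, value ], key computed once
    @records;

print "@sorted\n";
```

Read the pipeline bottom to top: each key is computed once per element, and the comparator only does cheap array lookups.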
  39. GRT (Guttman-Rosler Transform)

  40. time perl -wE ' my @L = `head -1000000 find_20230614.txt`;

    chomp @L; @L = map { unpack "x[N] a*" } sort map { pack "N a*",-s($_) // 0, $_ } @L; say $L[0];'
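
A filesystem-free sketch of the packed-key approach. It assumes a big-endian `N` key so that byte-wise string order matches numeric order for values below 2**32; the records and the `split` key extraction are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical records: sort ascending by the number after the colon.
my @records = ( 'alpha:30', 'beta:4', 'gamma:200', 'delta:15' );

my @sorted =
    map  { unpack 'x[N] a*', $_ }               # skip the 4-byte key, keep the payload
    sort                                        # plain string sort, no comparator block
    map  { pack 'N a*', (split /:/)[1], $_ }    # big-endian key, then the original value
    @records;

print "@sorted\n";
```

Because the key is baked into the front of each string, `sort` can run with its default comparison and never calls back into Perl code.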
  41. • The more you know about your data, the more you can optimize. But then, the more you are hard-coding the data patterns into the design!
    • If you know your data is always in order, except what you just pushed on the end, which is just before or after the prior tail, you can sort 2 elements. What happens when your pattern changes?
    4 6 8 8 7
    Technically in-spec! Fails due to the duplicate 8.
  42. • WhateverCode* • &[cmp] generic comparator • $^a placeholder params

    • +,~ context operators • .arity introspection • cmp on lists
  43. $x = 0 + $y; $x = '' . $y; $x = !! $y;
    Raku: $x = +$y; $x = ~$y; $x = ?$y;
  44. sub f1 ($x,$y,$z) { $x + $y * $z };  # Subroutine
    my &f2 = -> $x,$y,$z { $x + $y * $z };  # Arrow block
    my &f3 = { $^x + $^y * $^z };  # Placeholder params
    my &f4 = * + * * * ;  # Whatever Star
    for &f1,&f2,&f3,&f4 -> Code $f {
        say ( $f.arity, $f.(8,9,10) );
    }
    # (3 98)
    # (3 98)
    # (3 98)
    # (3 98)
  45. my &f3 = { $^x + $^y * $^z };

    my &f4 = * + * * * ; Whatever Star Placeholder params
  46. my &f4 = * + * * * ; say

    @nums.grep({ $_ > 9 }); Whatever Star
  47. my &f4 = * + * * * ; say

    @nums.grep({ $_ > 9 }); say @nums.grep( * > 9 ); Whatever Star
  48. my &f3 = { $^x + $^y * $^z };

    my &f4 = * + * * * ; Whatever Star Placeholder params
  49. my &f3 = { $^x + $^y * $^z };

    Placeholder params
  50. my &f3 = { $^x + $^y * $^z };

    { substr $^string, $^endpoint }
  51. my &f3 = { $^x + $^y * $^z };

    { substr $^string, $^endpoint } sub ( $endpoint, $string ) { return substr $string, $endpoint; }
  52. $n = $n + 42; $n += 42; $s =

    $s ~ 'xyz'; $s ~= 'xyz';
  53. $n = $n + 42; $n += 42; $s =

    $s ~ 'xyz'; $s ~= 'xyz'; @L = @L.sort...
  54. $n = $n + 42; $n += 42; $s =

    $s ~ 'xyz'; $s ~= 'xyz'; @L = @L.sort... @L .= sort...
  55. @L .= sort({ $^b.IO.s cmp $^a.IO.s }); @L .= sort({

    - .IO.s }); @L .= sort: -*.IO.s;
  56. Further Reading
    • https://perldoc.perl.org/functions/sort
      • Very detailed, but still mentions the defunct `use sort` pragma.
    • https://perldoc.perl.org/sort
      • The old `use sort` pragma.
    • https://rosettacode.org/wiki/Category:Sorting_Algorithms
      • All the sorts, in all the languages!!!!
    • https://blogs.perl.org/users/bruce_gray/2023/02/twc-205-exclusive-third-or-first.html
      • Yves Orton's comment on Heaps - super-fast for Priority Queues!
  57. • Just Maximum, Minimum? - use an O(N) linear scan
      • Perl: use List::Util qw<max min maxstr minstr>
      • Raku: .max / .min / .maxpairs / .minpairs
    • Repeatedly need max/min after each array update?
      • Use Heap or Priority Queue
    • Always needs to be in-order
      • Really depends on more details of your needs. Could be a module, or may need an external DB.
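
For the first case, a minimal sketch of the linear scans using the core `List::Util` functions named on the slide. The sample data is hypothetical:

```perl
use strict;
use warnings;
use List::Util qw( max min maxstr minstr );

# Hypothetical sample data.
my @nums  = ( 4, 4, 8, 5, 2 );
my @names = qw( pear fig banana );

# Each call is a single O(N) pass; nothing gets sorted.
printf "max=%d min=%d\n", max(@nums), min(@nums);
printf "maxstr=%s minstr=%s\n", maxstr(@names), minstr(@names);
```

`max` and `min` compare numerically, while `maxstr` and `minstr` compare as strings, mirroring the `<=>` versus `cmp` split.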
  58. Copyright Information: Images and Video
    • Camelia
      • © 2009 by Larry Wall
      • http://github.com/perl6/mu/raw/master/misc/camelia.txt
    • Sorts 2018 - Color Circle (The video before the first slide)
      • © 2018 by w0rthy
      • https://www.youtube.com/watch?v=sVYtGyPiGik
      • https://github.com/w0rthy
  59. Copyright Information: This Talk This work is licensed under a

    Creative Commons Attribution 4.0 International License. CC BY https://creativecommons.org/licenses/by/4.0/ (email me for the original Apple Keynote .key file)
  60. History • v 1.00 2023-07-11 Presented final version to

    The Perl and Raku Conference in Toronto, ON, CA

  61. le ge you would core-dump non-identical ties are needed before

    you care about tie-breakers? last-letter.