Ruby 2.1 Walk Thru (title bait)

Zete
May 11, 2013

A talk given at http://segmentfault.com/e/hz-ruby-salon-1

Errata:
- Python strings have been different since 3.3 (a string is stored with 1, 2, or 4 bytes per character)
- For better understanding of versioning: http://semver.org/

Transcript

  1. Organization of Slides
    • How Ruby Evolves
    • What’s New in Syntax
    • What’s New in Core and Stdlib
    • What’s New in VM
    Saturday, 11 May, 13
  2. This talk will cover...
    • Changes 1.9 ⇒ 2.0 ⇒ 2.1
    • How to migrate
    • How not to worry, and... what to worry about
    • A few implementation details
  3. But it will not cover...
    • Vim 7.4 will have better Python support, but not Ruby?
  4. • Development: 1.8 ⇒ 1.9 ⇒ 2.0 ⇒ 2.1
    • Backport: 1.8 ⇐ 1.9 ⇐ 2.0 ⇐ 2.1
  5. RUBY_PATCHLEVEL
    • dev (development): -1
    • rc (release candidate): -1
    • p0: 0
    • p370: 370
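
The patch-level convention on this slide can be read back at runtime. A minimal sketch, where `describe_patchlevel` is a hypothetical helper (not part of Ruby) that maps the integer to the labels used above:

```ruby
# Hypothetical helper: turn a RUBY_PATCHLEVEL integer into a readable label.
# Negative values mean a dev or rc build; 0 and up are released patch levels.
def describe_patchlevel(level)
  level < 0 ? "dev or rc build (#{level})" : "p#{level}"
end

puts "ruby #{RUBY_VERSION} #{describe_patchlevel(RUBY_PATCHLEVEL)}"
```
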
  6. When a New (Big) Version?
    • Incompatible syntax changes
    • Incompatible ABI changes:
      • binary gems can be shared between minor versions, but not between major versions
    • C-API changes
    • Adding or removing a stdlib library is incompatible too
      • e.g. Ripper, Fiddle, Syck ...
  7. When a Backport Patch?
    • Bug fixes
    • Compatible changes that are not likely to break anything
      • new methods, classes, constants
      • new optional method params
  8. Syntax
    • Panini, 6th century BC, Devanagari: the Ashtadhyayi ("Eight Chapters") describes Sanskrit in nearly 4000 word-formation rules
    • Backus Naur Form
  9. Syntax
    • \p{Devanagari}
    • Panini, 6th century BC, Devanagari: the Ashtadhyayi ("Eight Chapters") describes Sanskrit in nearly 4000 word-formation rules
    • Backus Naur Form
  10. def f before: "<", after: ">"
        before + after
      end

      f after: ")"   #=> "<)"
      f befoer: 3    #=> ArgumentError: unknown keyword: befoer
  11. h1 = {a: 1}
      h3 = {c: 3}
      {**h1, b: 2, **h3}   #=> {a: 1, b: 2, c: 3}
  12. Pros
    • Less code, fewer errors
    • Generates better rdoc
    • Encourages explicit code style
    • Hashes can be used like immutable objects
  13. Complex Example
      def f a, b=1, *c, d: 1, **e, &block
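
To see how each kind of parameter in the complex signature gets bound, here is a small sketch. The method body and the sample calls are assumptions added for illustration; only the parameter list comes from the slide.

```ruby
# Same parameter list as the slide, with a body that returns what was bound.
def f(a, b = 1, *c, d: 1, **e, &block)
  [a, b, c, d, e]
end

p f(1)                        # a is required; everything else falls back to its default
p f(1, 2, 3, 4, d: 5, x: 6)   # extra positionals land in c, unknown keywords in e
p method(:f).parameters       # lists every parameter with its kind (:req, :opt, :rest, ...)
```

`Method#parameters` is handy when migrating a library, since it shows whether a method already accepts keywords.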
  14. Collateral Core Changes
    • Ensure something is a hash: to_h
    • For example, nil.to_h #=> {}
    • Just as you can splat anything into an array with to_a, you can double splat anything into a hash with to_hash
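
A short sketch of the two conversions: `to_h` is the explicit one you call yourself, while `to_hash` is the implicit one Ruby calls when an object is double splatted. `Point` and `coords` are hypothetical names used only for this illustration.

```ruby
# Hypothetical class: it opts in to implicit hash conversion by defining to_hash.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  def to_hash
    { x: @x, y: @y }
  end
end

# Hypothetical method that takes keyword arguments.
def coords(x: 0, y: 0)
  "#{x},#{y}"
end

p nil.to_h                      # explicit conversion: nil becomes an empty hash
p coords(**Point.new(1, 2))     # the double splat calls Point#to_hash
```
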
  15. module HelloWorld
        refine String do
          def hello_world
            'hello world'
          end
        end
      end

      module GoodbyeWorld
        using HelloWorld
        puts 'yes'.hello_world
      end
  16. $ ruby hello_world.rb
      warning: Refinements are experimental, and the behavior may change in future versions of Ruby!
      undefined method `using' for GoodbyeWorld:Module (NoMethodError)
  17. • Refinements are still experimental in 2.1; use ruby -W0 to suppress the warning
    • using is a file-local instruction
  18. module HelloWorld
        refine String do
          def hello_world
            'hello world'
          end
        end
      end

      using HelloWorld
      puts 'yes'.hello_world
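  19. Core & Stdlib
    Molten-salt thorium reactors may be able to solve the energy crisis:
    1. Unlike uranium reactors, the reactants flow into the core; if the temperature accidentally rises too high, they drain automatically into a safety container.
    2. Thorium is several times more abundant in the Earth's crust than uranium and has only one isotope. Uranium has U-235 and U-238, and only U-235, which is present in very small amounts, can undergo fission; separating U-235 is troublesome.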
  20. Which encoding?
    • UTF-8 (GTK): space efficient, but slow
    • UTF-32 (Python): wastes space, but fast
    • UTF-16 (Windows / Cocoa / Java): efficient but wrong ("".length) or correct but inefficient ("".codepoints())
  21. Ruby
    • Dynamic typing: data carries metadata. Encoding is just one piece of metadata on a string.
    • Choose ANY encoding that suits your job best.
    • A byte array is just a string with a special encoding. ascii-8bit == binary, but ascii-7bit / us-ascii are not.
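
The "encoding is metadata" idea can be checked directly: the same bytes report different lengths depending on the tag they carry. A minimal sketch using only core `String` and `Encoding` methods; the sample strings are my own choice.

```ruby
s = "\u00e9"                              # "é", written as an escape so the file stays ASCII
p s.encoding                              # the tag on this literal is UTF-8
p [s.length, s.bytesize]                  # 1 character stored in 2 bytes

b = s.b                                   # String#b (2.0+): same bytes, retagged as binary
p b.encoding == Encoding::ASCII_8BIT      # true
p Encoding::BINARY.equal?(Encoding::ASCII_8BIT)  # true: two names for one encoding
p [b.length, b.bytesize]                  # 2 and 2: every byte is its own "character"

burger = "\u{1F354}".encode("UTF-16LE")   # a code point outside the BMP
p [burger.length, burger.bytesize]        # 1 character, 4 bytes (a surrogate pair)
```

Because the length is computed from the encoding tag, the UTF-16 string still counts one character where a code-unit count would report two.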
  22. Ruby
    • It tries not to get in your way if you don’t want to handle encodings (Encoding.default_internal).
    • But you may still run into incompatible-encoding problems.
    • 2.0 CHANGE: the default source encoding is now UTF-8
    • Some people complain that Ruby's encoding support causes these problems, but the real problem is that you wrote an incorrect program that just happened to run; adding encodings only exposed the original error.
  23. Strip invalid chars (2.1)
    • Hack before 2.1
        "yummy\xE2 \xF0\x9F\x8D\x94 \x9F\x8D\x94"
          .encode("UTF-16BE", :undef => :replace, :invalid => :replace, :replace => "")
          .encode("UTF-8")
          .gsub("\0".encode("UTF-8"), "")
    • In 2.1 (will be backported to 1.9)
        "yummy\xE2 \xF0\x9F\x8D\x94 \x9F\x8D\x94".scrub ""   #=> "yummy 🍔 "
  24. App migration
    • Start a branch
    • Use the new way
    • Fix tests and merge it
  25. Lib compatibility
    • Bad: if `ruby -v` !~ /2\.1/ then ...
    • Good: if RUBY_VERSION < '2.1' then ...
    • Better: unless ''.respond_to?(:b) then ...
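
A sketch of why feature detection ranks above version checks, and what it can look like in a library. The fallback body for `String#b` is my own assumption of a reasonable shim, not code from the talk.

```ruby
# Version strings compare lexicographically, which misorders a hypothetical "2.10":
p '2.10' < '2.9'   # true, although 2.10 would be the newer release

# Feature detection: define String#b only where the running Ruby lacks it.
unless String.method_defined?(:b)
  class String
    # Fallback for Rubies older than 2.0: a binary-tagged copy of the string.
    def b
      dup.force_encoding(Encoding::BINARY)
    end
  end
end

p "abc".b.encoding
```

The check asks about the capability the library actually needs, so it keeps working on backports and on other Ruby implementations.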
  26. Fix thread locals
    • Bug: the Thread and Fiber implementations originated from the 1.8 green threads, so thread locals were actually fiber locals...
    • Fixed in 2.0
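
The difference is observable from a Fiber running inside the thread. A minimal sketch, assuming Ruby 2.0 or later, where `Thread#thread_variable_set` and `Thread#thread_variable_get` provide real thread locals while `Thread#[]` keeps its fiber-local behaviour. `locals_seen_from_fiber` is a hypothetical helper name.

```ruby
# Returns what a fresh Fiber sees for a "thread local" and for a thread variable.
def locals_seen_from_fiber
  Thread.new do
    Thread.current[:x] = 1                       # Thread#[]= is fiber-local
    Thread.current.thread_variable_set(:y, 2)    # thread-local (2.0+)
    Fiber.new do
      [Thread.current[:x], Thread.current.thread_variable_get(:y)]
    end.resume
  end.value
end

p locals_seen_from_fiber   # the fiber does not see :x, but it does see :y
```
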
  27. Threads are real
    • Since 1.9
    • But there’s the GVL, so Ruby code can occupy only one core at a time
    • On the C-extension side there’s no such limit
    • Invoke zlib inside Ruby threads and you can make use of any number of cores
  28. Real threads: how to make use?
    • zlib.c: zstream_expand_buffer_without_gvl()
    • Just don’t call into Ruby’s API, then real threads are there for you.
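
From the Ruby side, using this looks like ordinary threading around a zlib call. A minimal sketch; `deflate_in_threads` is a hypothetical helper, and whether the threads really overlap depends on zlib releasing the GVL as described above, which this snippet does not measure.

```ruby
require 'zlib'

# Deflate each chunk in its own Ruby thread and return the compressed strings.
def deflate_in_threads(chunks)
  chunks.map { |data| Thread.new { Zlib::Deflate.deflate(data) } }.map(&:value)
end

chunks = Array.new(4) { |i| "chunk #{i} " * 50_000 }
compressed = deflate_in_threads(chunks)
puts compressed.map(&:bytesize).inspect
```
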
  29. require 'csv'

      # generate
      CSV {|stdout| stdout << %w[hello world] }
      CSV("".encode 'GBK') {|string| string << %w[你好 世界] }

      # parse string
      CSV.parse('hello, world')

      # parse file
      CSV.read('hello.csv')
  30. New Float representation
    • The change is based on a heuristic about the most commonly used range of Float values
    • 50% or more faster on Float benchmarks
    • Rails app? Probably not; it depends on how many Float objects are created
  31. Virtual instructions
    • The most frequently used methods have their own specialized instructions: size, length, *, /, <<, ! ...
    • Ruby method lookup is expensive, so an inline method cache makes it fast for server software.
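
On CRuby the generated instructions can be inspected with `RubyVM::InstructionSequence`. A minimal sketch; `disassemble` is a hypothetical helper, the exact instruction names vary between CRuby versions (specialized ones are typically prefixed `opt_`), and other implementations may not provide `RubyVM` at all.

```ruby
# Disassemble a snippet; returns nil on implementations without RubyVM::InstructionSequence.
def disassemble(source)
  return nil unless defined?(RubyVM::InstructionSequence)
  RubyVM::InstructionSequence.compile(source).disasm
end

# Look for the instruction that handles the call to size in the listing.
puts disassemble("x = [1, 2]; x.size")
```
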
  32. C implementation
    • CRuby and MRuby
    • Pros: portable to many platforms
    • Cons: hard to control what gcc / clang / ... generates
  33. Assembly implementation
    • V8, LuaJIT, ...
    • Pros: can benefit from superscalar CPUs
    • Pros: total control over the generated machine code
    • Cons: requires a lot of work
    • Cons: very few people know what’s happening
  34. Memory management in Ruby
    • GC is the main mechanism
    • malloc / alloca / free (only visible in the C-API)
    • Reference counting (regexp literals)
  35. Ruby GC is...
    • Mark & Sweep - not tracing, not reference counting
    • Conservative - easy & fast extension
    • Incremental - (1.9)
    • Bit marking - COW friendly (2.0, backported to 1.9)
    • Generational - (2.1 in progress)
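
Once the generational collector is in place, its activity can be observed from Ruby. A minimal sketch, assuming CRuby 2.1 or later, where `GC.stat` exposes `:minor_gc_count` and `:major_gc_count` and `GC.start` accepts `full_mark:`. `gc_counts` is a hypothetical helper.

```ruby
# Returns [minor, major] collection counts; these keys are an assumption about CRuby 2.1+.
def gc_counts
  stat = GC.stat
  [stat[:minor_gc_count], stat[:major_gc_count]]
end

before = gc_counts
GC.start(full_mark: false)   # request a minor (young generation) collection
GC.start                     # request a full (major) collection
after = gc_counts

puts "minor: #{before[0]} -> #{after[0]}, major: #{before[1]} -> #{after[1]}"
```
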
  36. Why generational?
    • Without generations, the mark phase has to iterate over all objects
    • Memory is slow and cheap while cache is fast and expensive, which is why modern CPU architectures have a small cache. The machine preloads a segment of memory into the cache, so fragmented memory is slow. Semi-space algorithms are good for compacting young generations.
  37. Why NOT generational?
    • Every optimization has a cost
    • Complex code
    • API myths: in Java, calling GC doesn’t actually start GC.
    • Performance myths: “xxx is faster than C” means “calling C in xxx is slow”
  38. Generational GC in Other VMs
    • Can use different algorithms for different generations
  39. Generational GC in Other VMs
    • Can use different algorithms for different generations
    • Can move objects, but that slows C extensions (v8, rubinius, ...)
  40. Generational GC in Other VMs
    • Can use different algorithms for different generations
    • Can move objects, but that slows C extensions (v8, rubinius, ...)
    • Can have 3 generations (Python), or infinitely many (Java G1GC)
  41. Generational GC in Other VMs
    • Can use different algorithms for different generations
    • Can move objects, but that slows C extensions (v8, rubinius, ...)
    • Can have 3 generations (Python), or infinitely many (Java G1GC)
    • Mostly require read and write barriers in the C ABI
  42. Generational GC in Ruby (just started)
    • 2 generations
    • The young gen holds “shiny” objects only: String, Array, Class. If an array is used by a C extension, it becomes a “shady” object
  43. Generational GC in Ruby (just started)
    • 2 generations
    • The young gen holds “shiny” objects only: String, Array, Class. If an array is used by a C extension, it becomes a “shady” object
    • Does not break the C-API, although it breaks the ABI (hence the version bump to 2.1)
  44. Generational GC in Ruby (just started)
    • 2 generations
    • The young gen holds “shiny” objects only: String, Array, Class. If an array is used by a C extension, it becomes a “shady” object
    • Does not break the C-API, although it breaks the ABI (hence the version bump to 2.1)
    • No read barriers
  45. Challenge
    • There can be a leak if a C extension doesn’t notify the GC that there’s a pointer from the old gen to the young gen
    • This “notification” is called a write barrier (it tells the GC that there’s a reference from the old gen to the new gen)
    • Write barriers should be added without breaking current gems
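
The role of the write barrier can be shown with a toy model. This is an illustrative sketch only: `ToyObj` and `ToyHeap` are hypothetical classes that simulate a two-generation heap in plain Ruby and say nothing about how CRuby implements it.

```ruby
# Toy model only: ToyObj and ToyHeap are hypothetical, not CRuby internals.
class ToyObj
  attr_reader :name, :refs
  attr_accessor :old

  def initialize(name)
    @name = name
    @refs = []
    @old = false
  end
end

class ToyHeap
  def initialize
    @young = []        # objects allocated since the last minor GC
    @remembered = []   # old objects recorded as pointing at young ones
  end

  def alloc(name)
    obj = ToyObj.new(name)
    @young << obj
    obj
  end

  # Store a reference. The write barrier records old -> young pointers.
  def write(from, to, barrier: true)
    from.refs << to
    @remembered << from if barrier && from.old && !to.old
  end

  # Minor GC: mark young objects reachable from young roots and from the
  # remembered set, without scanning other old objects. Returns swept names.
  def minor_gc(roots)
    marked = []
    stack = roots.reject(&:old) + @remembered.flat_map(&:refs)
    until stack.empty?
      obj = stack.pop
      next if obj.old || marked.include?(obj)
      marked << obj
      stack.concat(obj.refs)
    end
    swept = @young - marked
    marked.each { |o| o.old = true }   # survivors are promoted to the old generation
    @young = []
    @remembered = []
    swept.map(&:name)
  end
end

heap = ToyHeap.new
parent = heap.alloc("parent")
heap.minor_gc([parent])                      # parent survives and becomes old

kept = heap.alloc("kept")
heap.write(parent, kept)                     # barrier fires: parent is remembered
p heap.minor_gc([parent])                    # nothing is swept

missed = heap.alloc("missed")
heap.write(parent, missed, barrier: false)   # barrier skipped, as a careless extension might
p heap.minor_gc([parent])                    # "missed" is swept although parent still points at it
```

The minor collection never scans `parent` because it is old, so the remembered set is the only way the collector learns about the new reference.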
  46. Performance
    • Less mark time, but the same sweep time
    • If your app spends ~20% of its time in GC, then you should get 2%~4% faster
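
The 2%~4% estimate is simple proportional arithmetic. In the sketch below, the assumption that the new collector makes GC itself 10% to 20% cheaper is my own inference from the slide's numbers (2% / 20% and 4% / 20%), not a figure stated in the talk; `overall_saving` is a hypothetical helper.

```ruby
# Hypothetical helper: fraction of total run time saved when GC takes
# gc_share of the run and the GC itself becomes gc_time_saved cheaper.
def overall_saving(gc_share, gc_time_saved)
  gc_share * gc_time_saved
end

# Assumed range: GC becomes 10% to 20% cheaper (inferred, not measured).
[0.10, 0.20].each do |saved|
  printf("GC %d%% cheaper -> app about %.0f%% faster\n",
         saved * 100, overall_saving(0.20, saved) * 100)
end
```
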
  47. • The result observed so far is only 1% with a tuned stack: http://meta.discourse.org/t/ruby-may-be-getting-a-generational-gc-what-this-means-to-you/6289
  48. • Panini rules: http://sanskrit.sai.uni-heidelberg.de/Panini/HTML/list_all_rules.html
    • Ruby 2.0 in detail: http://globaldev.co.uk/2013/03/ruby-2-0-0-in-detail/
    • Introducing generational GC: http://bugs.ruby-lang.org/issues/8339