Ruby 2.1 Walk Thru
github.com/luikore
Saturday, 11 May, 13
About Me
Clickbait title writer
Often goes off topic
Organization of Slides
• How Ruby Evolves
• What’s New in Syntax
• What’s New in Core and Stdlib
• What’s New in VM
This talk will cover...
• Changes 1.9 → 2.0 → 2.1
• How to Migrate
• How not to worry, and what to worry about
• A few implementation details
But it will not cover...
• Vim 7.4 will have better Python support but not Ruby?
RUBY_PATCHLEVEL
• dev is development: -1
• rc is release candidate: -1
• p0: 0
• p370: 370
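A quick way to see these values on a given interpreter, as a minimal sketch that uses only built-in constants (the `stability` variable is a hypothetical name for illustration):

```ruby
# RUBY_PATCHLEVEL is an Integer: -1 for dev / rc builds, N for a pN release.
stability = RUBY_PATCHLEVEL < 0 ? "dev or rc build" : "release p#{RUBY_PATCHLEVEL}"
puts "ruby #{RUBY_VERSION} (#{stability})"
```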
When a new version
becomes head,
old versions become
backport branches.
When a New (Big) Version?
• Incompatible syntax changes
• Incompatible ABI changes:
  • binary gems can be shared between minor versions, but not between major versions
• C-API changes
• Adding or removing a stdlib library is incompatible too
  • e.g. Ripper, Fiddle, Syck ...
When a Big Release?
• RubyKaigi?
When a Backport Patch?
• Bug fixes
• Compatible changes that are not likely to break anything
  • new methods, classes, constants
  • new optional method params
What’s new & old-new?
• http://bugs.ruby-lang.org/
  • /projects/ruby-193
• https://github.com/ruby/ruby
  • ChangeLog
Syntax
Panini, 6th century BC: the Sanskrit Ashtadhyayi ("Eight Chapters") describes Sanskrit with nearly 4000 word-formation rules
Backus-Naur Form
Pros
• Less code, fewer errors
• Generates better rdoc
• Encourages explicit code style
• Use hashes like immutable objects
Cons
• More hash objects are created
Complex Example
def f a, b=1, *c, d: 1, **e, &block
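To make the binding rules of that signature concrete, here is a minimal sketch. The body of `f` is hypothetical (the slide shows only the signature); it just reports which argument landed in which parameter.

```ruby
# Same parameter list as the slide: required, optional, splat,
# keyword with default, double splat, and block.
def f(a, b = 1, *c, d: 1, **e, &block)
  { a: a, b: b, c: c, d: d, e: e, block: !block.nil? }
end

# Only the required argument: every other parameter takes its default.
p f(10)

# Extra positionals fill b, then c; d: is matched by name;
# unknown keywords are collected in e; the block is captured.
p f(10, 20, 30, 40, d: 2, x: 3) { }
```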
Collateral Core Changes
• Ensure something is a hash: to_h
  • For example, nil.to_h #=> {}
• Just as you can splat anything into an array with to_a, you can double-splat anything into a hash with to_hash
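A minimal sketch of these conversions. `Config`, `positional` and `options` are hypothetical names used only for illustration; the point is which conversion method each operator ends up calling.

```ruby
# Explicit conversion: nil.to_h gives an empty Hash.
p nil.to_h

# Config is a hypothetical class that knows how to convert itself.
class Config
  def to_a
    [:verbose, true]
  end

  def to_hash
    { verbose: true, level: 2 }
  end
end

def positional(*args)
  args
end

def options(**opts)
  opts
end

p positional(*Config.new)   # the splat converts the object to an array
p options(**Config.new)     # the double splat converts it with to_hash
```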
Refinements
module HelloWorld
  refine String do
    def hello_world
      'hello world'
    end
  end
end

module GoodbyeWorld
  using HelloWorld
  puts 'yes'.hello_world
end
It doesn’t work
$ ruby hello_world.rb
warning: Refinements are experimental, and the behavior may change in future versions of Ruby!
undefined method `using' for GoodbyeWorld:Module (NoMethodError)
• Refinements are still experimental in 2.1; use ruby -W0 to suppress the warning
• using is a file-local instruction: call it at the top level of a file, not inside a module body
module HelloWorld
  refine String do
    def hello_world
      'hello world'
    end
  end
end

using HelloWorld
puts 'yes'.hello_world
Misc
• %i
• %I
• __dir__
• __callee__
• prepend module
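Each of the items above fits in a line or two. The following is a minimal sketch; `Loud`, `Greeter`, `original_name` and `aliased_name` are hypothetical names chosen for illustration.

```ruby
# %i builds an array of symbols; %I does the same with interpolation.
n = 1
p %i[get post]
p %I[item#{n} other]

# __dir__ is the directory of the current source file
# (it can be nil when there is no file behind the code).
p __dir__

# __callee__ reports the name the method was called by, so it sees aliases.
def original_name
  __callee__
end
alias aliased_name original_name
p aliased_name

# prepend puts a module *before* the class in the ancestor chain,
# so its methods can wrap the class's own methods via super.
module Loud
  def greet
    super.upcase
  end
end

class Greeter
  prepend Loud

  def greet
    'hello'
  end
end

p Greeter.new.greet
p Greeter.ancestors.first
```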
String makes Ruby Ruby
• $ ruby -e 'puts "".methods.size'
#=> 162
162!!!
Which encoding?
• UTF-8 (GTK):
  space efficient, but slow
• UTF-32 (Python):
  wastes space, but fast
• UTF-16 (Windows / Cocoa / Java):
  efficient but wrong ("".length)
  or correct but inefficient ("".codepoints())
Ruby
• Dynamic typing: data carries metadata.
  Encoding is just one piece of metadata on a string.
• Choose ANY encoding that suits your job best.
• A byte array is just a string with a special encoding:
  ascii-8bit == binary, but ascii-7bit / us-ascii are not.
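The "encoding is metadata" idea can be seen directly in irb. A minimal sketch, using an arbitrary sample string:

```ruby
s = "caf\u00E9"                       # 4 characters, 5 bytes in UTF-8
p s.encoding                          # the encoding travels with the string
p [s.length, s.bytesize]              # [4, 5]

# Transcoding: same characters, different bytes.
utf16 = s.encode(Encoding::UTF_16LE)
p [utf16.length, utf16.bytesize]      # [4, 8]

# Relabelling: same bytes, now viewed as a plain byte array.
bytes = s.dup.force_encoding(Encoding::BINARY)
p [bytes.length, bytes.bytesize]      # [5, 5]

p Encoding::BINARY == Encoding::ASCII_8BIT   # true: two names, one encoding
p Encoding::US_ASCII == Encoding::BINARY     # false: 7-bit ASCII is separate
```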
Ruby
• It tries not to get in your way if you don’t want to deal with encodings
  (Encoding.default_internal).
• But you may still run into incompatible-encoding problems.
2.0 CHANGE: the default source encoding is now UTF-8
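Some people complain that Ruby's encoding support causes these problems, but the real problem is that you wrote an incorrect program that merely happened to run; adding encoding just exposed the original error.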
Get a binary copy
• str.b is a shortcut for
  str.dup.force_encoding('binary')
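A short sketch of what a binary copy means in practice, using an arbitrary sample string. Note that relabelling the bytes is a different operation from transcoding with `encode`, which has to map characters and therefore fails on non-ASCII text.

```ruby
s = "caf\u00E9"
b = s.b                              # String#b needs Ruby 2.0 or later

p b.encoding == Encoding::BINARY     # true
p b.bytes == s.bytes                 # true: bytes untouched, only the label changes
p s.encoding == Encoding::UTF_8      # true: the receiver is not modified

# Transcoding to binary raises for characters outside ASCII.
raised = false
begin
  s.encode('binary')
rescue Encoding::UndefinedConversionError => e
  raised = true
  puts "encode('binary') raised #{e.class}"
end
```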
Strip invalid chars (2.1)
• Hack before 2.1
  "yummy\xE2 \xF0\x9F\x8D\x94 \x9F\x8D\x94"
    .encode("UTF-16BE", :undef => :replace, :invalid => :replace, :replace => "")
    .encode("UTF-8")
    .gsub("\0".encode("UTF-8"), "")
• In 2.1 (will be backported to 1.9)
  "yummy\xE2 \xF0\x9F\x8D\x94 \x9F\x8D\x94".scrub ""
  #=> "yummy 🍔 "
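The same input can be checked step by step. A minimal sketch, assuming Ruby 2.1 or later for `String#scrub`:

```ruby
dirty = "yummy\xE2 \xF0\x9F\x8D\x94 \x9F\x8D\x94"
p dirty.valid_encoding?            # false: stray bytes around one valid emoji

clean = dirty.scrub('')            # drop every invalid byte sequence
p clean.valid_encoding?            # true
p clean == "yummy \u{1F354} "      # true: only the valid 4-byte character survives

# With no argument, scrub substitutes U+FFFD instead of deleting.
p dirty.scrub.include?("\uFFFD")   # true
```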
App migration
• Start a branch
• Use the new way
• Fix tests and merge it
Lib compatibility
• Bad: if `ruby -v` !~ /2\.1/ then ...
• Good: if RUBY_VERSION < '2.1' then ...
• Better: unless ''.respond_to?(:b) then ...
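A minimal sketch of the "better" option: probe for the feature and backfill it only when it is missing. The fallback body is an assumption of this sketch, not code from the talk. If a version check is unavoidable, comparing `Gem::Version` objects avoids the trap that plain string comparison sorts '2.10' before '2.9'.

```ruby
require 'rubygems'   # usually loaded already; provides Gem::Version

# Feature detection: define String#b only where it does not exist yet.
unless ''.respond_to?(:b)
  class String
    # Fallback for older interpreters (an assumption of this sketch).
    def b
      dup.force_encoding(Encoding::BINARY)
    end
  end
end

p 'abc'.b.encoding == Encoding::BINARY                  # true on any version

# Version checks: compare versions, not strings.
p '2.10' < '2.9'                                        # true, which is misleading
p Gem::Version.new('2.10') < Gem::Version.new('2.9')    # false, which is correct
old_ruby = Gem::Version.new(RUBY_VERSION) < Gem::Version.new('2.1')
p old_ruby
```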
Thread
Rails 3 is already thread-safe
Puma is becoming popular
Fix thread locals
• Bug: the Thread and Fiber implementations originated from 1.8 green threads, and thread locals were actually fiber locals...
• Fixed in 2.0
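The difference shows up as soon as a fiber runs inside a thread. A minimal sketch using `Thread#thread_variable_set` / `Thread#thread_variable_get`, which were added in 2.0; `:request_id` is a hypothetical key:

```ruby
result = Thread.new do
  Thread.current[:request_id] = 1                      # Thread#[]= is fiber-local
  Thread.current.thread_variable_set(:request_id, 2)   # a true thread-local

  # A new fiber in the same thread sees the thread variable,
  # but not the value stored with Thread#[]=.
  Fiber.new do
    [Thread.current[:request_id],
     Thread.current.thread_variable_get(:request_id)]
  end.resume
end.value

p result   # [nil, 2]
```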
New thread methods
• Thread.handle_interrupt
• Thread#backtrace_locations
  Though the non-thread way is:
  caller_locations
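A minimal sketch of both additions. `where_am_i`, `steps` and `ready` are hypothetical names; the second half shows an asynchronous exception being held back until the protected block has finished.

```ruby
require 'thread'   # Queue on older Rubies; harmless on recent ones

# caller_locations returns structured frames instead of strings.
def where_am_i
  caller_locations(1, 1).first   # the frame that called this method
end

loc = where_am_i
puts "#{File.basename(loc.path)}:#{loc.lineno} in #{loc.label}"

# Thread.handle_interrupt defers asynchronous exceptions (such as
# Thread#raise) until the block is left.
steps = []
ready = Queue.new

worker = Thread.new do
  begin
    Thread.handle_interrupt(RuntimeError => :never) do
      ready << true
      sleep 0.1
      steps << :critical_section_finished
    end
  rescue RuntimeError => e
    steps << "then interrupted: #{e.message}"
  end
end

ready.pop              # wait until the worker is inside the protected block
worker.raise('stop')   # queued, and delivered only after the block ends
worker.join

p steps   # the critical section completes before the exception is seen
```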
Threads are real
• Since 1.9
• But there’s the GVL, so Ruby code can occupy only one core at a time
• On the C-extension side there’s no such limit
• Invoke zlib inside Ruby threads, and you can make use of any number of cores
Real threads: how to make use of them?
• zlib.c:
  zstream_expand_buffer_without_gvl()
• Just don’t call into Ruby’s API, and real threads are there for you.
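From the Ruby side, taking advantage of this looks like ordinary threading. A minimal sketch: the input sizes are arbitrary, and how much real parallelism you get depends on the Ruby version and on how long the zlib binding holds the GVL released.

```ruby
require 'zlib'

# Four independent inputs, large enough that compression dominates the work.
inputs = Array.new(4) { |i| "chunk #{i} " * 200_000 }

# One thread per input. While a thread is inside zlib's C code with the GVL
# released, other Ruby threads are free to run.
threads = inputs.map { |data| Thread.new { Zlib::Deflate.deflate(data) } }
compressed = threads.map(&:value)

compressed.each_with_index do |blob, i|
  puts "input #{i}: #{inputs[i].bytesize} -> #{blob.bytesize} bytes"
end
```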
New Float representation
• The change is based on a heuristic about the most commonly used range of Float values
• 50% or more faster on Float benchmarks
• Rails app? Probably not; it depends on how many Float objects are created
Virtual instructions
• The most frequently used methods have their own specialized instructions: size, length, *, /, <<, ! ...
• Ruby method lookup is expensive, so an inline method cache makes it fast for server software.
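The specialized instructions can be seen by disassembling a snippet. A minimal sketch, assuming CRuby (`RubyVM` is implementation-specific, and instruction names may differ between versions):

```ruby
# Compile a tiny program and print the VM instructions it turns into.
source = 'x = 6; y = 7; x * y'
disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm

# List the specialized (opt_*) instructions that appear in the listing.
p disasm.scan(/\bopt_\w+/).uniq
```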
Faster Instructions?
C implementation
• CRuby and mruby
• Pros: portable to many platforms
• Cons: hard to control what gcc / clang / ... generate
Assembly implementation
• V8, LuaJIT, ...
• Pros: can benefit from superscalar CPUs
• Pros: total control over the generated machine code
• Cons: requires a lot of work
• Cons: very few people know what’s happening
GC
Memory management in Ruby
• GC is the major way
• malloc / alloca / free (only visible in C-API)
• Reference counting (regexp literals)
Ruby GC is...
• Mark & Sweep - a tracing collector, not reference counting
• Conservative - easy & fast extensions
• Incremental - (1.9)
• Bit marking - COW friendly (2.0, backported to 1.9)
• Generational - (2.1 in progress)
Why generational?
• Without generations, marking has to visit every live object
• Memory is slow and cheap, cache is fast and expensive; that is why modern CPUs pair a large memory with a small cache. The machine preloads a segment of memory into the cache, so fragmented memory is slow. Semi-space algorithms are good for compacting young generations.
Why NOT generational?
• Every optimization has a cost
• Complex code
• API myths:
  In Java, calling GC doesn’t actually start GC.
• Performance myths:
  “xxx is faster than C” means “calling C in xxx is slow”
Generational GC in Other VMs
• Can use different algorithms for different generations
• Can move objects, but that slows down C extensions (V8, Rubinius, ...)
• Can have 3 generations (Python), or an unbounded number (Java G1GC)
• Most require read and write barriers in the C ABI
Generational GC in Ruby (just started)
• 2 generations
• Young gen holds “shiny” objects only: String, Array, Class. If an array is used by a C extension, it becomes a “shady” object
• Doesn’t break the C-API, although it breaks the ABI (hence the version bump to 2.1)
• No read barriers
Challenge
• A live young object can be collected by mistake if a C ext doesn’t notify the GC that there’s a pointer from old gen to young gen
• The “notification” is called a write barrier
• Write barriers should be added without breaking current gems
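Why a missed write barrier is dangerous can be shown with a toy model. This is only an illustrative simulation in plain Ruby: `Obj`, `ToyHeap`, `write` and `minor_gc` are hypothetical names, and nothing here reflects CRuby's actual data structures.

```ruby
# Toy model only: a "heap" of plain Ruby objects, not real VM internals.
class Obj
  attr_reader :name, :old, :refs

  def initialize(name, old)
    @name = name
    @old = old
    @refs = []
  end
end

class ToyHeap
  def initialize
    @objects = []
    @remembered = []   # old objects known to point at young ones
  end

  def alloc(name, old = false)
    obj = Obj.new(name, old)
    @objects << obj
    obj
  end

  # Write barrier: every reference store is supposed to go through here,
  # so that old-to-young pointers are recorded in the remembered set.
  def write(from, to)
    from.refs << to
    @remembered << from if from.old && !to.old && !@remembered.include?(from)
  end

  # Minor GC: old objects are assumed live and are not traversed; young
  # objects survive only if reachable from the roots or the remembered set.
  def minor_gc(roots)
    stack = roots + @remembered.flat_map(&:refs)
    marked = []
    until stack.empty?
      obj = stack.pop
      next if obj.old || marked.include?(obj)
      marked << obj
      stack.concat(obj.refs)
    end
    @objects.select! { |o| o.old || marked.include?(o) }
    @objects.map(&:name)
  end
end

heap = ToyHeap.new
cache    = heap.alloc('cache', true)
registry = heap.alloc('registry', true)
entry    = heap.alloc('entry')
orphan   = heap.alloc('orphan')

heap.write(cache, entry)   # goes through the barrier
registry.refs << orphan    # bypasses it, like a raw pointer store in C

survivors = heap.minor_gc([])
p survivors                    # ["cache", "registry", "entry"]
p registry.refs.map(&:name)    # ["orphan"]: now a dangling reference
```

In the model, `entry` survives because the barrier recorded `cache` in the remembered set, while `orphan` is swept even though `registry` still points at it.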
Performance
• Less mark time, but the same sweep time
• If your app spends ~20% of its time in GC, you should get 2% to 4% faster
• The result observed so far is only 1% with a tuned stack:
  http://meta.discourse.org/t/ruby-may-be-getting-a-generational-gc-what-this-means-to-you/6289
New tuning param (2.1)
• RUBY_HEAP_SLOTS_GROWTH_FACTOR
(default value is 1.8)
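GC tuning parameters of this kind are read from the environment when the interpreter starts, so they have to be set before launching Ruby. A minimal sketch that starts a child interpreter with the variable set: the value 1.5 and the allocation script are arbitrary examples, and an interpreter that does not recognise this variable name is expected to ignore it.

```ruby
require 'rbconfig'

# Environment for the child process only; the current process is unaffected.
env = { 'RUBY_HEAP_SLOTS_GROWTH_FACTOR' => '1.5' }

# Run a small allocation-heavy script and report how many GC runs it took.
script = '100_000.times { Object.new }; puts GC.count'
output = IO.popen([env, RbConfig.ruby, '-e', script], &:read)

puts "GC runs with growth factor 1.5: #{output.strip}"
```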