
In Search of the Perfect Global Interpreter Lock

Conference presentation. RuPy 2011. Poznan, Poland. Conference video at https://www.youtube.com/watch?v=5jbG7UKT1l4

David Beazley

October 15, 2011

Transcript

  1. In Search of the Perfect Global Interpreter Lock
    David Beazley
    http://www.dabeaz.com
    @dabeaz
    Presented at RuPy 2011
    Poznan, Poland
    October 15, 2011
    Copyright (C) 2010, David Beazley, http://www.dabeaz.com

  2. Introduction
    • As many programmers know, Python and Ruby feature a Global Interpreter Lock (GIL)
    • More precisely: CPython and MRI
    • It limits thread performance on multicore
    • Theoretically restricts code to a single CPU

  3. An Experiment
    • Consider a trivial CPU-bound function
        def countdown(n):
            while n > 0:
                n -= 1
    • Run it once with a lot of work
        COUNT = 100000000 # 100 million
        countdown(COUNT)
    • Now, divide the work across two threads
        from threading import Thread
        t1 = Thread(target=countdown,args=(COUNT//2,))
        t2 = Thread(target=countdown,args=(COUNT//2,))
        t1.start(); t2.start()
        t1.join(); t2.join()
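The experiment on this slide can be reproduced with a small timing harness. The following is a minimal sketch, assuming Python 3.3 or later (for `time.perf_counter`); the helper names `time_sequential` and `time_threaded` are hypothetical and not part of the talk, and the count is reduced from the slide's 100 million so the run stays short. Absolute timings will differ by interpreter version, OS, and core count.

```python
import time
from threading import Thread

def countdown(n):
    # Trivial CPU-bound loop: no I/O, so the thread never blocks voluntarily.
    while n > 0:
        n -= 1

def time_sequential(count):
    # Hypothetical helper: run all of the work in the calling thread.
    start = time.perf_counter()
    countdown(count)
    return time.perf_counter() - start

def time_threaded(count, nthreads=2):
    # Hypothetical helper: split the same amount of work across threads.
    threads = [Thread(target=countdown, args=(count // nthreads,))
               for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    COUNT = 5_000_000  # assumption: far smaller than the slide's 100 million
    print("sequential: %.2fs" % time_sequential(COUNT))
    print("threaded  : %.2fs" % time_threaded(COUNT))
```

Because both versions perform the same number of loop iterations, any difference between the two printed times is threading overhead rather than extra computation.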

  4. An Experiment
    • Some Ruby
        def countdown(n)
          while n > 0
            n -= 1
          end
        end
    • Sequential
        COUNT = 100000000 # 100 million
        countdown(COUNT)
    • Subdivided across threads
        t1 = Thread.new { countdown(COUNT/2) }
        t2 = Thread.new { countdown(COUNT/2) }
        t1.join
        t2.join

  5. Expectations
    • Sequential and threaded versions perform the same amount of work (same # calculations)
    • There is the GIL... so no parallelism
    • Performance should be about the same

  6. Results
    • Ruby 1.9 on OS-X (4 cores)
        Sequential           : 2.46s
        Threaded (2 threads) : 2.55s (~ same)

  7. Results
    • Ruby 1.9 on OS-X (4 cores)
        Sequential           : 2.46s
        Threaded (2 threads) : 2.55s (~ same)
    • Python 2.7
        Sequential           : 6.12s
        Threaded (2 threads) : 9.28s (1.5x slower!)

  8. Results
    • Ruby 1.9 on OS-X (4 cores)
        Sequential           : 2.46s
        Threaded (2 threads) : 2.55s (~ same)
    • Python 2.7
        Sequential           : 6.12s
        Threaded (2 threads) : 9.28s (1.5x slower!)
    • Question: Why does it get slower in Python?

  9. Results
    • Ruby 1.9 on Windows Server 2008 (2 cores)
        Sequential           : 3.32s
        Threaded (2 threads) : 3.45s (~ same)

  10. Results
    • Ruby 1.9 on Windows Server 2008 (2 cores)
        Sequential           : 3.32s
        Threaded (2 threads) : 3.45s (~ same)
    • Python 2.7
        Sequential           : 6.9s
        Threaded (2 threads) : 63.0s (9.1x slower!)

  11. Results
    • Ruby 1.9 on Windows Server 2008 (2 cores)
        Sequential           : 3.32s
        Threaded (2 threads) : 3.45s (~ same)
    • Python 2.7
        Sequential           : 6.9s
        Threaded (2 threads) : 63.0s (9.1x slower!)
    • Why does it get that much slower on Windows?

  12. Experiment: Messaging
    • A request/reply server for size-prefixed messages
        [Diagram: Client <-> Server]
    • Each message: a size header + payload
    • Similar: ZeroMQ

  13. An Experiment: Messaging
    • A simple test - message echo (pseudocode)
        def client(nummsg,msg):
            while nummsg > 0:
                send(msg)
                resp = recv()
                sleep(0.001)
                nummsg -= 1
        def server():
            while True:
                msg = recv()
                send(msg)

  14. An Experiment: Messaging
    • A simple test - message echo (pseudocode)
        def client(nummsg,msg):
            while nummsg > 0:
                send(msg)
                resp = recv()
                sleep(0.001)
                nummsg -= 1
        def server():
            while True:
                msg = recv()
                send(msg)
    • To be less evil, it's throttled (<1000 msg/sec)
    • Not a messaging stress test

  15. An Experiment: Messaging
    • A test: send/receive 1000 8K messages
    • Scenario 1: Unloaded server
        [Diagram: Client <-> Server]
    • Scenario 2: Server competing with one CPU-thread
        [Diagram: Client <-> Server, with a CPU-Thread running alongside the server]

  16. Results
    • Messaging with no threads (OS-X, 4 cores)
        C          : 1.26s
        Python 2.7 : 1.29s
        Ruby 1.9   : 1.29s

  17. Results
    • Messaging with no threads (OS-X, 4 cores)
        C          : 1.26s
        Python 2.7 : 1.29s
        Ruby 1.9   : 1.29s
    • Messaging with one CPU-bound thread*
        C          : 1.16s (~8% faster!?)
        Python 2.7 : 12.3s (10x slower)
        Ruby 1.9   : 42.0s (33x slower)
    • Hmmm. Curious.
    * On Ruby, the CPU-bound thread was also given lower priority

  18. Results
    • Messaging with no threads (Linux, 8 CPUs)
        C          : 1.13s
        Python 2.7 : 1.18s
        Ruby 1.9   : 1.18s

  19. Results
    • Messaging with no threads (Linux, 8 CPUs)
        C          : 1.13s
        Python 2.7 : 1.18s
        Ruby 1.9   : 1.18s
    • Messaging with one CPU-bound thread
        C          : 1.11s (same)
        Python 2.7 : 1.60s (1.4x slower) - better
        Ruby 1.9   : 5839.4s (~5000x slower) - worse!

  20. Results
    • Messaging with no threads (Linux, 8 CPUs)
        C          : 1.13s
        Python 2.7 : 1.18s
        Ruby 1.9   : 1.18s
    • Messaging with one CPU-bound thread
        C          : 1.11s (same)
        Python 2.7 : 1.60s (1.4x slower) - better
        Ruby 1.9   : 5839.4s (~5000x slower) - worse!
    • 5000x slower? Really? Why?

  21. The Mystery Deepens
    • Disable all but one CPU core
    • CPU-bound threads (OS-X)
        Python 2.7 (4 cores+hyperthreading) : 9.28s
        Python 2.7 (1 core)                 : 7.9s (faster!)
    • Messaging with one CPU-bound thread
        Ruby 1.9 (4 cores+hyperthreading)   : 42.0s
        Ruby 1.9 (1 core)                   : 10.5s (much faster!)
    • ?!?!?!?!?!?

  22. Better is Worse
    • Change software versions
    • Let's upgrade to Python 3 (Linux)
        Python 2.7 (Messaging) : 12.3s
        Python 3.2 (Messaging) : 20.1s (1.6x slower)
    • Let's downgrade to Ruby 1.8 (Linux)
        Ruby 1.9 (Messaging)   : 42.0s
        Ruby 1.8.7 (Messaging) : 10.0s (4x faster)
    • So much for progress (sigh)

  23. What's Happening?
    • The GIL does far more than limit cores
    • It can make performance much worse
    • Better performance by turning off cores?
    • 5000x performance hit on Linux?
    • Why?

  24. Why You Might Care
    • Must you abandon Python/Ruby for concurrency?
    • Having threads restricted to one CPU core might be okay if it were sane
    • Analogy: A multitasking operating system (e.g., Linux) runs fine on a single CPU
    • Plus, threads get used a lot behind the scenes (even in thread alternatives, e.g., async)

  25. Why I Care
    • It's an interesting little systems problem
    • How do you make a better GIL?
    • It's fun.

  26. Some Background
    • I have been discussing some of these issues in the Python community since 2009
        http://www.dabeaz.com/GIL
    • I'm less familiar with Ruby, but I've looked at its GIL implementation and experimented
    • Very interested in commonalities/differences

  27. A Tale of Two GILs

  28. Thread Implementation
    Python:
    • System threads (e.g., pthreads)
    • Managed by OS
    • Concurrent execution of the Python interpreter (written in C)
    Ruby:
    • System threads (e.g., pthreads)
    • Managed by OS
    • Concurrent execution of the Ruby VM (written in C)

  29. Alas, the GIL
    • Parallel execution is forbidden
    • There is a "global interpreter lock"
    • The GIL ensures that only one thread runs in the interpreter at once
    • Simplifies many low-level details (memory management, callouts to C extensions, etc.)

  30. GIL Implementation
    Simple mutex lock:
        mutex_t gil;
        void gil_acquire() {
            mutex_lock(gil);
        }
        void gil_release() {
            mutex_unlock(gil);
        }
    Condition variable:
        int gil_locked = 0;
        mutex_t gil_mutex;
        cond_t gil_cond;
        void gil_acquire() {
            mutex_lock(gil_mutex);
            while (gil_locked)
                cond_wait(gil_cond, gil_mutex);  /* releases gil_mutex while waiting */
            gil_locked = 1;
            mutex_unlock(gil_mutex);
        }
        void gil_release() {
            mutex_lock(gil_mutex);
            gil_locked = 0;
            cond_notify(gil_cond);  /* wake one waiting thread */
            mutex_unlock(gil_mutex);
        }
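The condition-variable variant on this slide can be modeled in a few lines of Python. This is only an illustrative sketch, not the interpreter's actual C code: the class name `CondVarLock` and the `worker` function are hypothetical, and `threading.Condition` stands in for the slide's `gil_mutex` and `gil_cond` pair.

```python
import threading

class CondVarLock:
    """Toy model of the slide's condition-variable lock (hypothetical name)."""

    def __init__(self):
        # A Condition bundles a mutex with a condition variable,
        # playing the role of gil_mutex + gil_cond.
        self._cond = threading.Condition()
        self._locked = False  # plays the role of gil_locked

    def acquire(self):
        with self._cond:
            # Re-check the flag after every wakeup, as in the while loop on the slide.
            while self._locked:
                self._cond.wait()
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify()  # wake one waiting thread

gil = CondVarLock()
counter = 0

def worker(n):
    global counter
    for _ in range(n):
        gil.acquire()
        counter += 1  # only one thread at a time executes this line
        gil.release()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000: every increment happened while holding the lock
```

Note that nothing in `release` decides which waiter runs next; that choice is left to the OS scheduler, which is the crux of the problems discussed later in the talk.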

  31. Thread Execution Model
    • The GIL results in cooperative multitasking
        [Diagram: Threads 1, 2, and 3 alternate between "run" and "block"; a thread releases the GIL when it blocks and the next thread acquires the GIL to run]
    • When a thread is running, it holds the GIL
    • GIL released on blocking (e.g., I/O operations)

  32. Threads for I/O
    • For I/O it works great
    • GIL is never held very long
    • Most threads just sit around sleeping
    • Life is good

  33. Threads for Computation
    • You may actually want to compute something!
        • Fibonacci numbers
        • Image/audio processing
        • Parsing
    • The CPU will be busy
    • And it won't give up the GIL on its own

  34. CPU-Bound Switching
    Python:
    • Releases and reacquires the GIL every 100 "ticks"
    • 1 tick ~= 1 interpreter instruction
    Ruby:
    • Background thread generates a timer interrupt every 10ms
    • GIL released and reacquired by current thread on interrupt

  35. Python Thread Switching
        [Diagram: a CPU-bound thread runs 100 ticks, releases and reacquires the GIL, runs another 100 ticks, and so on]
    • Every 100 VM instructions, GIL is dropped, allowing other threads to run if they want
    • Not time based--switching interval depends on kind of instructions executed

  36. Ruby Thread Switching
        [Diagram: a timer thread fires every 10ms; on each timer event the CPU-bound thread releases and reacquires the GIL, then continues to run]
    • Loosely mimics the time-slice of the OS
    • Every 10ms, GIL is released/acquired

  37. A Common Theme
    • Both Python and Ruby have C code like this:
        void execute() {
            while (inst = next_instruction()) {
                // Run the VM instruction
                ...
                if (must_release_gil) {
                    GIL_release();
                    /* Other threads may run now */
                    GIL_acquire();
                }
            }
        }
    • Exact details vary, but concept is the same
    • Each thread has periodic release/acquire in the VM to allow other threads to run

  38. Question
    • What can go wrong with this bit of code?
        if (must_release_gil) {
            GIL_release();
            /* Other threads may run now */
            GIL_acquire();
        }
    • Short answer: Everything!

  39. Pathology

  40. Thread Switching
    • Suppose you have two threads
    • Thread 1 : Running
    • Thread 2 : Ready (Waiting for GIL)
        [Diagram: Thread 1 is running; Thread 2 is READY]

  41. Thread Switching
    • Easy case : Thread 1 performs I/O (read/write)
    • Thread 1 : Releases GIL and blocks for I/O
    • Thread 2 : Gets scheduled, starts running
        [Diagram: Thread 1 is running, starts I/O, releases the GIL, and becomes BLOCKED; pthreads/OS schedules Thread 2, which goes from READY to running once it acquires the GIL]

  42. Thread Switching
    • Tricky case : Thread 1 runs until preempted
        [Diagram: Thread 1 is running, is preempted, and releases the GIL while Thread 2 is READY; control passes to pthreads/OS. Which thread runs? ???]

  43. Thread Switching
    • You might expect that Thread 2 will run
    • But you assume the GIL plays nice...
        [Diagram: Thread 1 is preempted, releases the GIL, and becomes READY; pthreads/OS schedules Thread 2, which acquires the GIL and runs]

  44. Thread Switching
    • What might actually happen on multicore
        [Diagram: Thread 1 is preempted, releases the GIL, then immediately reacquires it and keeps running; pthreads/OS schedules Thread 2, but its acquire fails (GIL locked) and it goes back to READY]
    • Both threads attempt to run simultaneously
    • ... but only one will succeed (depends on timing)

  45. Fallacy
        if (must_release_gil) {
            GIL_release();
            /* Other threads may run now */
            GIL_acquire();
        }
    • This code doesn't actually switch threads
    • It might switch threads, but it depends
        • What operating system
        • # cores
        • Lock scheduling policy (if any)

  46. Fallacy
        if (must_release_gil) {
            GIL_release();
            sleep(0);
            /* Other threads may run now */
            GIL_acquire();
        }
    • This doesn't force switching (sleeping)
    • It might switch threads, but it depends
        • What operating system
        • # cores
        • Lock scheduling policy (if any)

  47. Fallacy
        if (must_release_gil) {
            GIL_release();
            sched_yield();
            /* Other threads may run now */
            GIL_acquire();
        }
    • Neither does this (calling the scheduler)
    • It might switch threads, but it depends
        • What operating system
        • # cores
        • Lock scheduling policy (if any)

  48. A Conflict
    • There are conflicting goals
    • Python/Ruby - wants to run on a single CPU, but doesn't want to do thread scheduling (i.e., let the OS do it).
    • OS - "Oooh. Multiple cores." Schedules as many runnable tasks as possible at any instant
    • Result: Threads fight with each other

  49. Multicore GIL Battle
    • Python 2.7 on OS-X (4 cores)
        Sequential           : 6.12s
        Threaded (2 threads) : 9.28s (1.5x slower!)
        [Diagram: Thread 1 runs 100 ticks, is preempted, releases the GIL, and reacquires it, over and over; each time pthreads/OS schedules Thread 2, its acquire fails and it returns to READY. Eventually Thread 2 acquires the GIL and runs]
    • Millions of failed GIL acquisitions

  50. Multicore GIL Battle
    • You can see it! (2 CPU-bound threads)
        [Image omitted; annotated "Why >100%?"]
    • Comment: In Python, it's very rapid
    • GIL is released every few microseconds!

  51. I/O Handling
    • If there is a CPU-bound thread, I/O bound threads have a hard time getting the GIL
        [Diagram: Thread 1 (CPU 1) runs and is preempted repeatedly; Thread 2 (CPU 2) sleeps until a network packet arrives, then its GIL acquire fails several times before it finally succeeds and runs. Might repeat 100s-1000s of times]

  52. Messaging Pathology
    • Messaging on Linux (8 Cores)
        Ruby 1.9 (no threads)   : 1.18s
        Ruby 1.9 (1 CPU thread) : 5839.4s
    • Locks in Linux have no fairness
    • Consequence: Really hard to steal the GIL
    • And Ruby only retries every 10ms

  53. Let's Talk Fairness
    • Fair-locking means that locks have some notion of priorities, arrival order, queuing, etc.
        running: t0    waiting: t1 t2 t3 t4 t5
          (t0 releases the lock)
        running: t1    waiting: t2 t3 t4 t5 t0
    • Releasing means you go to end of line
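The queueing behavior described on this slide can be sketched as a simple FIFO rotation. This is a toy simulation under stated assumptions, not how any OS implements fair locks: the function name `fair_lock_schedule` is hypothetical, and every thread is assumed to want the lock again immediately after releasing it.

```python
from collections import deque

def fair_lock_schedule(threads, steps):
    """Return the order in which threads hold a FIFO (fair) lock.

    The thread at the head of the line runs; when it releases the lock
    it rejoins at the end of the line.
    """
    waiting = deque(threads)
    order = []
    for _ in range(steps):
        running = waiting.popleft()  # the longest waiter gets the lock
        order.append(running)
        waiting.append(running)      # releasing sends it to the end of the line
    return order

print(fair_lock_schedule(["t0", "t1", "t2", "t3", "t4", "t5"], 8))
# ['t0', 't1', 't2', 't3', 't4', 't5', 't0', 't1']
```

Every release forces a handoff to a different thread, which gives waiting threads their turn but also means a context switch on every release.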

  54. Effect of Fair-Locking
    • Ruby 1.9 (multiple cores)
        Messages + 1 CPU Thread (OS-X)  : 42.0s
        Messages + 1 CPU Thread (Linux) : 5839.4s
    • Question: Which one uses fair locking?

  55. Effect of Fair-Locking
    • Ruby 1.9 (multiple cores)
        Messages + 1 CPU Thread (OS-X)  : 42.0s (Fair)
        Messages + 1 CPU Thread (Linux) : 5839.4s
    • Benefit : I/O threads get their turn (yay!)

  56. Effect of Fair-Locking
    • Ruby 1.9 (multiple cores)
        Messages + 1 CPU Thread (OS-X)  : 42.0s (Fair)
        Messages + 1 CPU Thread (Linux) : 5839.4s
    • Benefit : I/O threads get their turn (yay!)
    • Python 2.7 (multiple cores)
        2 CPU-Bound Threads (OS-X)      : 9.28s
        2 CPU-Bound Threads (Windows)   : 63.0s
    • Question: Which one uses fair-locking?

  57. Effect of Fair-Locking
    • Ruby 1.9 (multiple cores)
        Messages + 1 CPU Thread (OS-X)  : 42.0s (Fair)
        Messages + 1 CPU Thread (Linux) : 5839.4s
    • Benefit : I/O threads get their turn (yay!)
    • Python 2.7 (multiple cores)
        2 CPU-Bound Threads (OS-X)      : 9.28s
        2 CPU-Bound Threads (Windows)   : 63.0s (Fair)
    • Problem: Too much context switching

  58. Fair-Locking - Bah!
    • In reality, you don't want fairness
    • Messaging Revisited (OS X, 4 Cores)
        Ruby 1.9 (No Threads)         : 1.29s
        Ruby 1.9 (1 CPU-Bound thread) : 42.0s (33x slower)
    • Why is it still 33x slower?
    • Answer: Fair locking! (and convoying)

  59. Messaging Revisited
    • Go back to the messaging server
        def server():
            while True:
                msg = recv()
                send(msg)

  60. Messaging Revisited
    • The actual implementation (size-prefixed messages)
        def server():
            while True:
                size = recv(4)
                msg = recv(size)
                send(size)
                send(msg)

  61. Performance Explained
    • What actually happens under the covers
        def server():
            while True:
                size = recv(4)      # GIL release
                msg = recv(size)    # GIL release
                send(size)          # GIL release
                send(msg)           # GIL release
    • Why? Each operation might block
    • Catch: Passes control back to CPU-bound thread
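For readers who want to try the size-prefixed echo server, here is a minimal runnable sketch. Several details are assumptions rather than the talk's actual test code: the 4-byte header is taken to be a big-endian unsigned integer, the helper names (`recv_exactly`, `recv_message`, `send_message`, `echo_server`, `run_client`) are hypothetical, and `socket.socketpair()` replaces a real network connection so the example stays self-contained.

```python
import socket
import struct
import threading

def recv_exactly(sock, nbytes):
    """Read exactly nbytes; a single recv() may return fewer bytes."""
    chunks = []
    while nbytes > 0:
        chunk = sock.recv(nbytes)
        if not chunk:
            raise EOFError("connection closed")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    # Two blocking reads per message: the size header, then the payload.
    (size,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, size)

def send_message(sock, payload):
    # Two blocking writes per message: the size header, then the payload.
    sock.sendall(struct.pack("!I", len(payload)))
    sock.sendall(payload)

def echo_server(sock):
    try:
        while True:
            send_message(sock, recv_message(sock))
    except EOFError:
        pass  # client closed the connection
    finally:
        sock.close()

def run_client(nummsg, payload):
    """Send nummsg messages and count how many come back unchanged."""
    client, server = socket.socketpair()
    t = threading.Thread(target=echo_server, args=(server,))
    t.start()
    replies = 0
    for _ in range(nummsg):
        send_message(client, payload)
        if recv_message(client) == payload:
            replies += 1
    client.close()
    t.join()
    return replies

print(run_client(10, b"x" * 8192))  # 10 echoed 8K messages
```

The server performs four potentially blocking socket calls per message, which matches the four GIL releases marked on the slide.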

  62. Performance Illustrated
        [Diagram: the timer thread fires every 10ms. After data arrives, the I/O thread gets a separate turn for each of recv, recv, send, send, and the CPU-bound thread runs for a 10ms slice before each turn]
    • Each message has 40ms response cycle
    • 1000 messages x 40ms = 40s (42.0s measured)

  63. Despair

  64. A Solution?
    Don't use threads!
    • Yes, yes, everyone hates threads
    • However, that's only because they're useful!
    • Threads are used for all sorts of things
    • Even if they're hidden behind the scenes

  65. A Better Solution
    Make the GIL better
    • It's probably not going away (very difficult)
    • However, does it have to thrash wildly?
    • Question: Can you do anything?

  66. GIL Efforts in Python 3
    • Python 3.2 has a new GIL implementation
    • It's imperfect--in fact, it has a lot of problems
    • However, people are experimenting with it

  67. Python 3 GIL
    • GIL acquisition now based on timeouts
        [Diagram: Thread 1 is running; Thread 2 moves from IOWAIT to READY when data arrives and calls wait(gil, TIMEOUT). After the 5ms timeout it issues a drop_request; Thread 1 releases the GIL and itself calls wait(gil, TIMEOUT) while Thread 2 runs]
    • Involves waiting on a condition variable
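In CPython 3.2 and later, the timeout shown here is exposed as the interpreter's "switch interval". The short sketch below reads and temporarily changes it using `sys.getswitchinterval()` and `sys.setswitchinterval()`; the 1ms value is an arbitrary choice for illustration, not a recommendation from the talk.

```python
import sys

# Read the current switch interval in seconds.
default_interval = sys.getswitchinterval()
print(default_interval)

# Ask waiting threads to request the GIL after 1ms instead (arbitrary value).
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())

# Restore the original setting.
sys.setswitchinterval(default_interval)
```

A shorter interval lets a waiting thread request the GIL sooner, at the cost of more frequent switching between CPU-bound threads.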

  68. Problem: Convoying
    • CPU-bound threads significantly degrade I/O
        [Diagram: each time data arrives, Thread 2 becomes READY and waits 5ms before Thread 1 releases the GIL; Thread 2 runs briefly, then Thread 1 resumes running. The 5ms delay repeats for every arrival]
    • This is the same problem as in Ruby
    • Just a shorter time delay (5ms)

  69. Problem: Convoying
    • You can directly observe the delays (messaging)
        Python/Ruby (No threads) : 1.29s (no delays)
        Python 3.2 (1 Thread)    : 20.1s (5ms delays)
        Ruby 1.9 (1 Thread)      : 42.0s (10ms delays)
    • Still not great, but problem is understood
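The measured times line up with a back-of-the-envelope estimate. The sketch below assumes, as in the size-prefixed server shown earlier, four GIL handoffs per message (two receives and two sends), and that each handoff costs one full delay; the function name `estimated_total_seconds` is hypothetical.

```python
def estimated_total_seconds(messages, gil_releases_per_message, delay_ms):
    # Each blocking call hands the GIL to the CPU-bound thread,
    # which keeps it for roughly one full delay period.
    return messages * gil_releases_per_message * delay_ms / 1000.0

print(estimated_total_seconds(1000, 4, 10))  # 40.0 (Ruby 1.9 measured 42.0s)
print(estimated_total_seconds(1000, 4, 5))   # 20.0 (Python 3.2 measured 20.1s)
```

The close match suggests the slowdown is almost entirely waiting time, not extra computation.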

  70. Promise

  71. Priorities
    • Best promise : Priority scheduling
    • Earlier versions of Ruby had it
    • It works (OS-X, 4 cores)
        Ruby 1.9 (1 Thread)                   : 42.0s
        Ruby 1.8.7 (1 Thread)                 : 40.2s
        Ruby 1.8.7 (1 Thread, lower priority) : 10.0s
    • Comment: Ruby-1.9 allows thread priorities to be set in pthreads, but it doesn't seem to have much (if any) effect

  72. Priorities
    • Experimental Python-3.2 with priority scheduler
    • Also features immediate preemption
    • Messages (OS X, 4 Cores)
        Python 3.2 (No threads)          : 1.29s
        Python 3.2 (1 Thread)            : 20.2s
        Python 3.2+priorities (1 Thread) : 1.21s (faster?)
    • That's a lot more promising!
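The general idea of priority scheduling can be illustrated with a small run queue. This is a conceptual sketch only and does not reflect how the experimental Python 3.2 scheduler was implemented: the class `PriorityRunQueue`, its method names, and the convention that a smaller number means higher priority are all assumptions made for illustration.

```python
import heapq
import itertools

class PriorityRunQueue:
    """Toy ready-queue that always hands the lock to the highest-priority thread."""

    def __init__(self):
        self._heap = []
        # The sequence number breaks ties so equal priorities stay in FIFO order.
        self._seq = itertools.count()

    def make_ready(self, name, priority):
        # Assumed convention: a smaller number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def next_thread(self):
        _, _, name = heapq.heappop(self._heap)
        return name

rq = PriorityRunQueue()
rq.make_ready("cpu-thread", priority=10)  # became ready first, but low priority
rq.make_ready("io-thread", priority=1)    # became ready later, high priority
print(rq.next_thread())  # io-thread
print(rq.next_thread())  # cpu-thread
```

If high-priority threads keep arriving, the low-priority entry is never popped, which is the starvation problem any such scheduler has to address.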

  73. New Problems
    • Priorities bring new challenges
        • Starvation
        • Priority inversion
        • Implementation complexity
    • Do you have to write a full OS scheduler?
    • Hopefully not, but it's an open question

  74. Final Words
    • Implementing a GIL is a lot trickier than it looks
    • Even work with priorities has problems
    • Good example of how multicore is diabolical

  75. Thanks for Listening!
    • I hope you learned at least one new thing
    • I'm always interested in feedback
    • Follow me on Twitter (@dabeaz)