Go @ Skimlinks

A quick run through of how we came to use Go at Skimlinks, the success we've had with it, and the challenges we faced.

Richard Johnson

May 31, 2013

Transcript

  1. 3.

    Skimlinks
    • I joined almost 2 years ago
    • We had poor server utilisation!
      ◦ Memory-constrained PHP under Apache prefork
    • Wanted to move onto something else
      ◦ Cue language war
      ◦ Cue bake-off: http://blog.skimlinks.com/2012/03/15/api-programming-language-bake-off/
  2. 4.

    Results....
    • Go sucked.
      ◦ Why?
        ▪ Early revision?
        ▪ I coded it wrong?
    • Scala rocked
      ◦ Why?
        ▪ JVM?
        ▪ I probably coded it wrong
    • We chose Python & Tornado.
    • ... and I never looked at Go again.
  3. 5.

    Enter Skimwords!
    • Acquired NLP tech from a startup in the US
    • Mostly written in C (!)
    • During review I noticed a potential algorithm improvement!
      ◦ Mocked it up in my own time using Go
        ▪ Needed to be fast & low-level (replacing C)
        ▪ No concurrency requirements (unlike APIs)
      ◦ Results were so compelling I presented them to management
    • We halved our Skimwords server costs.
  4. 6.

    But still challenges...
    • The C code was complex, brittle and difficult
    • Business requirements:
      ◦ Better phrase generation with different "engines"
        ▪ Hired a new NLP & AI wizard (Ian)
      ◦ Formalise assessment of phrases & matching
    • Everything was pointing to us needing to split our system into smaller components
  5. 7.

    ... so we did
    • Re-implemented logical components in Go
    • ZMQ over IPC as the communication layer
      ◦ Decent Go bindings (for ZMQ == 2.2)
    • JSON & Go types as the protocol
      ◦ Lovely native marshalling (see the sketch after this list)
    • Still needed some external C libraries
      ◦ C? Go? CGo... well... go!
      ◦ Snowball stemmer
      ◦ Judy arrays
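    A minimal sketch of the "JSON & Go types as the protocol" point, using only the standard library. The message shapes are hypothetical and the ZMQ socket calls are omitted; the marshalled bytes are what would travel over the ipc:// socket.

      package main

      import (
          "encoding/json"
          "fmt"
      )

      // PhraseRequest / PhraseResponse are hypothetical message types; the
      // real Skimwords protocol is not shown in the talk.
      type PhraseRequest struct {
          DocID string `json:"doc_id"`
          Text  string `json:"text"`
      }

      type PhraseResponse struct {
          DocID   string   `json:"doc_id"`
          Phrases []string `json:"phrases"`
      }

      func main() {
          req := PhraseRequest{DocID: "42", Text: "cheap running shoes"}

          // Marshal a plain Go struct straight to JSON bytes -- these are the
          // bytes that would be written to the ZMQ IPC socket.
          payload, err := json.Marshal(req)
          if err != nil {
              panic(err)
          }
          fmt.Printf("request: %s\n", payload)

          // ...send payload, receive reply bytes back over ipc://...
          reply := []byte(`{"doc_id":"42","phrases":["running shoes"]}`)

          var resp PhraseResponse
          if err := json.Unmarshal(reply, &resp); err != nil {
              panic(err)
          }
          fmt.Println("phrases:", resp.Phrases)
      }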
  6. 8.

    also ... • We run everything under supervisord ◦ Just

    really convenient • RPM packaged binaries ◦ Single, portable binaries FTW! • Puppet based deploys • EC2 cloud infrastructure • It just works!
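    For illustration, a minimal supervisord stanza for one of the Go binaries. This is not Skimlinks' actual config; the program name, paths and user are made up.

      ; one stanza per Go binary installed from the RPM
      [program:skimwords-matcher]
      command=/usr/local/bin/skimwords-matcher
      directory=/var/lib/skimwords
      user=skimwords
      autostart=true
      autorestart=true
      stdout_logfile=/var/log/skimwords/matcher.log
      stderr_logfile=/var/log/skimwords/matcher.err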
  7. 10.

    A couple of months later...
    • 30% speed increase
    • Fixed a lot of difficult C heisenbugs
    • Better precision
    • QA gave it the all clear
    • Turned it on for 10% of traffic.
      ◦ Disaster!!!!
      ◦ After about 3 mins, the stack would freeze for about 3 seconds, causing requests to time out and upstream services to become overloaded.
  8. 11.

    WTF??!!!!????
    • Our product index contained about 50 million objects & used about 14GB of RAM
    • Go 1.0 garbage collection
      ◦ Marks & sweeps all objects
      ◦ STOPS THE WORLD!
    • Moved objects into C with CGo (see the sketch after this list)
      ◦ No more sweeps
      ◦ Reduced memory usage by about 3GB
    • Stopped using runtime.SetFinalizer
      ◦ Freeing memory as we went made things less jerky
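    A minimal sketch of the off-heap trick described above: keep bulk data in C-allocated memory via CGo so the Go 1.0 collector never has to mark or sweep it, and free it explicitly rather than relying on runtime.SetFinalizer. The helper and sample string are illustrative only, not the actual index code.

      package main

      /*
      #include <stdlib.h>
      */
      import "C"

      import (
          "fmt"
          "unsafe"
      )

      // storeOffHeap copies a Go string into C-allocated memory. The bytes
      // live outside the Go heap, so the garbage collector never scans them.
      // (Hypothetical helper name.)
      func storeOffHeap(s string) *C.char {
          return C.CString(s) // malloc + copy, handled by the cgo runtime
      }

      func main() {
          // Allocate off-heap; this object is invisible to the Go GC.
          title := storeOffHeap("cheap running shoes")

          // Copy it back into Go-managed memory only when needed.
          fmt.Println(C.GoString(title))

          // Free explicitly as we go instead of using runtime.SetFinalizer.
          C.free(unsafe.Pointer(title))
      }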
  9. 12.

    Disaster! (part 2)
    • We increased our index size by a third.
    • Started getting errors: "out of memory"
    • Oh, Go 1.0 programs can't use more than 16GB of heap
      ◦ Oh, right, fine.... hang on, wat???
    • Started running patched Go in production!
      ◦ Really, really, really bad idea.
      ◦ Got away with it (most memory was unmanaged)
    • Thankfully, 1.1 was in beta, with no 16GB heap limit!
      ◦ Started running the beta, which rapidly evolved into the 1.1 release
      ◦ Faster as well!
  10. 13.

    Disaster! (part 3)
    • Code stopped compiling!!
    • Worked fine locally, but not on Jenkins!
    • Libraries changed under our feet!
      ◦ "go get" is cute and all, but:
      ◦ Go builds libraries once & keeps them locally!
    • Easy fixes (thankfully)
    • Fork your libraries!!
  11. 14.

    The story at present...
    • We've been using Go in production for almost a year (alongside the C stack)
    • Complete Go (with some Python) stack live for 3 months.
    • IMHO, is Go a production-ready language?
      ◦ Yes! Well... mostly
        ▪ GC still warrants some improvement
        ▪ Most libraries are "hobby" projects; expect to pick up some maintenance effort
  12. 15.

    What's next for us?
    • Error tracking and reporting
      ◦ Log4go -> Raven Go -> Sentry
        ▪ Still some issues to resolve with high volumes
      ◦ http://riemann.io/ for stats!
        ▪ Seems to work well, live over UDP on all boxes
    • More improvements!
      ◦ Incorporation of user intent values
      ◦ Pluggable & A/B-tested product relevance algorithms