Clojure at Netflix

Dave Ray
October 02, 2013

A talk for Craftsman Guild on my team's use of Clojure at Netflix. Describes good, bad, and ugly lessons learned from going from a pure-Java codebase to Clojure in production.

Transcript

  1. A Little Clojure at Netflix
    Dave Ray
    @darevay
    Software Engineer, Netflix
    October 2013

  2. Agenda
    ● Netflix Culture
    ● Why Clojure?
    ● Our Path
    ● The Bad
    ● The Ugly
    ● The Good

  3. Freedom and Responsibility ...
    “We hire smart people, give them hard problems and get out of their way. We strive to increase the freedom of our employees as we grow, enabling them to move quickly as the industry evolves. With that freedom comes increased responsibility. High performers thrive in that environment and make great choices for Netflix.”
    http://jobs.netflix.com/who-we-are.html

  4. … Best Tool For The Job

  5. Our Team
    ● Netflix Social Infrastructure
    ○ Social data storage/analysis, stateless web services
    ○ Support Social APIs used by many Netflix UIs/devices
    ● 3 engineers
    ● 1 supportive manager
    ● 1 medium-size-ish existing Java codebase

  6. (about :clojure)
    ● Modern LISP
    ● JVM
    ● Functional
    ● Dynamic
    ● Opinionated
    You’re doing it wrong

  7. Data
    1, 3.14, 1/2 ; Numbers
    "A string"
    this ; A symbol, used to name things
    :a-keyword ; Used for enumerations and map keys
    {:key1 "value1" :key2 "value2"} ; HashMap
    [:a :vector 1 2 3] ; Like java.util.ArrayList
    '(this is :a :list 1 2 3)
    ● All data objects are immutable by default
    ● Data is composable.

  8. Naming things
    ; def names a global thing
    (def pi 3.14)
    ; Use let to bind values to local names
    (let [r 3.0
          c (* 2.0 Math/PI r)
          a (* Math/PI (* r r))]
      {:radius r
       :circumference c
       :area a})
    ;=> {:radius 3.0,
    ;    :circumference 18.84955592153876,
    ;    :area 28.274333882308138}

  9. Functions
    ; Use fn to create an
    ; anonymous function
    (fn [x] (* x x))
    ; Use defn to define a named function
    (defn square [x] (* x x))
    ; Call a function
    (square 3)
    ;=> 9
    ; Pass a function as an argument
    (map square [1 2 3])
    ;=> (1 4 9)

  10. Macros
    ● Clojure code is data
    ● A macro is a function that takes code and returns new code
    ● Invoked by the compiler
    ; A macro can control evaluation,
    ; e.g. a short-circuiting and expression
    (and (even? a) (odd? b))
    -> (if (even? a)
         (if (odd? b)
           true
           false)
         false)
    ; A macro can also reduce boilerplate
    (rx/fn [a] (* 2 a))
    -> (reify rx.Func1
         (call [this a]
           (* 2 a)))
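As a minimal sketch of "a macro is a function that takes code and returns new code": the `unless` macro below is a hypothetical name made up for illustration, not something from the talk or from clojure.core.

```clojure
;; Hypothetical macro for illustration; `unless` is not part of clojure.core.
(defmacro unless
  "Evaluates body only when test is falsey; returns nil otherwise."
  [test & body]
  `(if ~test nil (do ~@body)))

;; The macro receives its arguments as unevaluated code (data)
;; and returns a new form for the compiler to evaluate.
(macroexpand-1 '(unless ready? (println "waiting")))
;=> (if ready? nil (do (println "waiting")))

(unless false :ran)
;=> :ran
```

Because the body is only spliced into the `do` branch, it is never evaluated when the test is truthy, which is the same evaluation control that `and` relies on.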

  11. Choosing Clojure
    Factors
    ● Opinionated
    ● Abstraction
    ● Data transformation
    ● Interactive development
    ● Java interop
    Non-factors
    ● Concurrency, STM, agents, etc.

  12. Our Path to Clojure
    ● > 1 year of toe dipping
    ● External Tools
    ● Diagnostic services
    ● Greenfield production service
    ● Real production code

  13. Joyspring - “Netflix REPL”
    More sane/powerful than sh+curl
    (def s (subscriber/subscriber+ 12345))
    ;=> { ... user data from subscriber service ...}
    ; Check if they're in A/B test 4567
    (ab/allocation+ s 4567)
    ;=> nil
    ; Forcibly allocate them to test 4567 cell 2
    (ab/allocate+ s 4567 2)
    ; Get social info
    (social/profile+ s :netflix)
    ;=> { ... social connection status ...}
    Separate tool, free from constraints of Netflix platform

  14. Diagnostic Services
    ● Non-critical, "WTF is going on?!?" services.
    ● One-off maintenance jobs like database migrations
    ● Dip toes into Netflix platform from Clojure
    ● Dip toes into Netflix build infrastructure (Ant, Ivy, and Jenkins, oh my!)

  15. Non-Critical Greenfield Service
    ● What’s a greenfield?!
    ● Pure Clojure implementation of small, low-risk service, in production
    ● Learn about: app structure, DI issues, testing, build, deploy

  16. Production - Bite the Bullet
    ● New production features in Clojure
    ● This is where it gets scary
    ● More people, more code

  17. Java, meet Clojure
    ● It’s easy!
    ● Add clojure.jar
    ● Put .clj files on classpath (like in src/main/resources)
    ● Done
    ● Yeah, but...

  18. Java, meet Clojure
    What to keep, what to rewrite?
    ● Take it easy
    ● No need to throw out working code
    ● Clojure has good Java interop
    ● When you need to write new Java, write Clojure instead
    ● Tastefully add abstractions as you go

  19. Java, meet Clojure
    ● Escape Java as fast as you can!
    ● Collect args, call a single entry point
    ● Be careful with caching; it breaks the interactive model
    ; Some Clojure code
    (ns com.netflix.mine)
    (defn func-to-call [x] (* 2 x))
    // Invoke it from Java
    import clojure.lang.RT;
    import clojure.lang.Symbol;
    import clojure.lang.Var;
    final Var require = RT.var("clojure.core", "require");
    require.invoke(Symbol.intern("com.netflix.mine"));
    final Var funcToCall = RT.var("com.netflix.mine", "func-to-call");
    assertEquals(198L, funcToCall.invoke(99L));

  20. Java, meet Clojure
    … but
    ● Where do tests go?
    ○ JUnit?
    ○ clojure.test?
    ● Data structures?
    ○ Keep existing objects?
    ○ Map from Clojure maps and back?
    ● Classes? What about classes?
    ● Dependency Injection

  21. Where do tests go?
    ● We write tests in clojure.test
    ○ Still a lot of Java, so we have some helper macros for Mockito
    ● Custom JUnit4 test runner that finds and runs Clojure tests
    ○ Run tests from Eclipse or wherever
    ○ Tests magically appear in Jenkins
    ● There are many options here. Our approach is pragmatic: play nicely within the Netflix build infrastructure

  22. Data Structures?
    ● But I typed in all these dumb POJOs already!
    ● Again, we’ve been pragmatic
    ● For existing code, for the most part, stick with existing Java objects
    ● For new code, use plain maps and simple Clojure types
    ● Occasional conversion functions where “old meets new”

  23. Dependency Injection?
    ● AKA Passing parameters around
    ● IMHO DI doesn’t magically go away in a dynamic language
    ○ I still need to get the Cassandra Keyspace object to functions that use it
    ○ It’s still DI even if the dependency is a simple function instead of an object implementing an interface
    ● We currently take the “big context map” approach. We have other ideas we’d like to try
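A rough sketch of what a "big context map" can look like. The talk does not show the team's actual code, so every name here (`make-context`, `user-row-key`, the map keys) is a hypothetical stand-in.

```clojure
;; Hypothetical names throughout; this only illustrates the idea.
(defn make-context
  "Builds the map of dependencies once, e.g. at startup."
  [config]
  {:keyspace   (:keyspace config)      ; stand-in for a Cassandra Keyspace object
   :fetch-user (fn [id] {:id id})})    ; a dependency can be a plain function

(defn user-row-key
  "Functions receive the context explicitly as their first argument."
  [ctx id]
  (let [user ((:fetch-user ctx) id)]
    (str (:keyspace ctx) "/user/" (:id user))))

(user-row-key (make-context {:keyspace "social"}) 42)
;=> "social/user/42"
```

A test can pass in a different map with stubbed entries instead of redefining vars, which is one reason the approach stays simple.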

  24. The Bad

  25. (Lack of) type system
    EOM

  26. Refactoring
    ● One of the few things Java (tooling) is good at, but it still isn’t perfect
    ● … I have broken prod by moving a Java class referenced by property files
    ● 1/10th the code that does 10x as much. Maybe it's not so bad?
    ● Still a pain point, especially if test coverage is low...

  27. Working with Others
    ● Clojure code is dense, especially someone else’s Clojure code
    ● Given an undocumented function with arbitrary args, what does it accept/produce?
    ● Need more discipline about documentation, pre/post-conditions, schemas
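One way to apply the pre/post-condition bullet is Clojure's built-in `:pre`/`:post` condition map on a function. This is only a sketch: `percent-watched` and the shape of its argument are assumptions made up for illustration.

```clojure
;; Hypothetical function; the record shape is assumed for illustration.
(defn percent-watched
  "Fraction of a video watched, given a viewing record map."
  [{:keys [start end duration]}]
  {:pre  [(number? start) (number? end) (pos? duration)]
   :post [(<= 0 % 1)]}
  (/ (- end start) duration))

(percent-watched {:start 0 :end 45 :duration 60})
;=> 3/4

;; A caller passing a differently shaped map fails fast with an AssertionError
;; instead of silently producing garbage:
;; (percent-watched {:video 1234 :percent-watched 0.5})
```

The conditions double as documentation of what the function accepts and produces, at the cost of runtime checks (they are skipped when `*assert*` is false).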

  28. Existing Java Libraries
    ● Annotation fetish makes life difficult
    ● Singleton fetish makes life difficult

  29. http://perevodik.net/en/posts/39/

  30. OMG, the Stacktraces!?!?!?!?
    java.lang.ClassCastException: java.lang.Long cannot be cast to clojure.lang.IFn
    at joyspring.main$eval2203.invoke(NO_SOURCE_FILE:1)
    at clojure.lang.Compiler.eval(Compiler.java:6619)
    at clojure.lang.Compiler.eval(Compiler.java:6582)
    at clojure.core$eval.invoke(core.clj:2852)
    at clojure.main$repl$read_eval_print__6588$fn__6591.invoke(main.clj:259)
    at clojure.main$repl$read_eval_print__6588.invoke(main.clj:259)
    at clojure.main$repl$fn__6597.invoke(main.clj:277)
    at clojure.main$repl.doInvoke(main.clj:277)
    at clojure.lang.RestFn.invoke(RestFn.java:1096)
    at clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__1610.invoke(interruptible_eval.clj:56)
    at clojure.lang.AFn.applyToHelper(AFn.java:159)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:617)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1788)
    at clojure.lang.RestFn.invoke(RestFn.java:425)
    at clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke(interruptible_eva
    41)
    at clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__1651$fn__1
    invoke(interruptible_eval.clj:171)
    at clojure.core$comp$fn__4154.invoke(core.clj:2330)
    at clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__1644.invoke(interruptible_eval.clj:138)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)
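For context, a minimal sketch of the kind of mistake that raises the ClassCastException in this trace: putting a number in function position. The talk does not say what expression was actually typed at the REPL, so `call-a-long` is a made-up reproduction.

```clojure
;; Hypothetical reproduction: a Long in function position.
(defn call-a-long []
  (let [n 42]
    (n)))   ; tries to invoke 42 as a function

;; Calling it throws java.lang.ClassCastException, because a Long
;; does not implement clojure.lang.IFn:
;; (call-a-long)
```

The one-line cause is buried under two dozen frames of compiler, REPL, and nREPL plumbing, which is the complaint the slide is making.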

  31. (no transcript text for this slide)

  32. The Ugly

  33. Mocking
    (defn viewing-history
      [user]
      ; ... returns a seq of viewing record maps ...
      [{:video 1234 :start 0 :end 60 :duration 60} ...])
    (defn videos-watched
      [user]
      (->> user
           viewing-history
           (filter (fn [{:keys [start end duration]}]
                     (> (/ (- end start) duration) 0.75)))))
    (deftest test-videos-watched
      (let [mock-viewing-history [{:video ...} {:video ...} ...]]
        (with-redefs [viewing-history (constantly mock-viewing-history)]
          ; ... call videos-watched and test some stuff ...
          )))
    ; ... now we make a change to viewing-history …
    (defn viewing-history
      [user]
      ; ... returns a seq of maps ...
      [{:video 1234 :percent-watched 0.5} ...])
    ; re-run videos-watched test. Success! WAT?!?
    Mocking "freezes" data and interactions. brrrr….

  34. Thawing Mocks
    clojure.core.typed
    A la carte static type checking for Clojure
    Powerful, but invasive
    https://github.com/clojure/core.typed
    (require '[clojure.core.typed :refer [ann]])
    (ann pi Double) ; annotate a constant
    (def pi "Pi, more or less" 3.14)
    ; annotate a function
    (ann area [Double -> Double])
    (defn area [r]
      (* pi (* r r)))

  35. Thawing Mocks
    Structured, composable runtime assertions
    clojure.core.contracts
    https://github.com/clojure/core.contracts
    (use 'clojure.core.contracts)
    (def area
      (with-constraints
        (fn [r] (* pi (* r r)))
        (contract area-contract
          [r] [number? => (number? %)])))
    Prismatic Schema
    https://github.com/prismatic/schema
    (require '[schema.core :as s])
    (s/defn area :- Double
      [r :- Double]
      (* pi (* r r)))

  36. So, Integration Tests
    ● Exercise module boundaries
    ○ This is where stuff breaks
    ● Requires
    ○ Dedicated test/staging environment
    ○ Occasional diagnostic endpoints for setup
    ● Write them in Clojure!
    ○ clojure.test + Joyspring

  37. So, Integration Tests
    ● Slower?
    ○ (but interactive development)
    ● Brittle?
    ○ False negatives due to state
    ○ False negatives due to normal failures
    ● Occasional Hacks
    ○ Services are tuned for production use, which differs from integration test patterns
    ○ Caches!

  38. Integration Test Failure Fail
    (deftest test-link-visitor-to-facebook-id-failure
      (testing "A failure while linking fb id to customer leaves user in not_connected state"
        (fixture/ensure-test-user-disconnected)
        (sabot/inject
          (at hystrix.linkVisitorToFacebookId
              eval
              (throw (RejectedExecutionException. "test-link-visitor-to-facebook-id-failure")))
          =>
          (try+
            (fixture/ensure-test-user-facebook-connected)
            (throw (Exception. "Request unexpectedly succeeded."))
            (catch (comp #{503} :status) e
              (let [s (subscriber/subscriber+ (:customer-id fixture/netflix-user))]
                ; make sure status is restored in subscriber and fb id is removed
                (is (= "not_connected" (get-in s [:social :connection-status])))))))))
    Sabot is a Clojure library for injecting specific, fine-grained failures into a request
    Magic here!

  39. The Good
    Not Pictured

  40. Interactive Workflow
    ● Write integration test
    ● Start server
    ● Connect REPL
    ● Edit, reload, test
    ○ You can unit test in here too!

  41. Web REPL
    http://blog.jayfields.com/2012/06/clojure-production-web-repl.html
    ● Explore instance state
    ● Quickly sanity check function behavior
    ● Easy back-of-envelope performance characteristics in real world conditions (region!)

  42. Abstraction
    ● Clojure is extremely expressive
    ● Build a language for your problem

  43. Abstraction
    ● Not about lines of code. About clearly expressing intent
    ● Take it easy
    ○ Programmers love to wrap, especially Java
    ● You'll get it wrong at least once
    ● Wrapping or abstracting is language design, i.e. hard

  44. Abstracting Hystrix
    ● https://github.com/Netflix/Hystrix
    ● Resilience via thread pools, circuit breakers, fallbacks
    // Define a command in Java
    public class GetUserCommand extends HystrixCommand<User> {
        private final HttpClient client;
        private final long id;
        public GetUserCommand(HttpClient client, long id) {
            // HystrixCommand has no no-arg constructor; the group key name is illustrative
            super(HystrixCommandGroupKey.Factory.asKey("GetUser"));
            this.client = client;
            this.id = id;
        }
        @Override
        protected User run() {
            return client.get("/user/" + id, User.class);
        }
        @Override
        protected User getFallback() {
            return User.missing(id);
        }
    }
    // ... and use it ....
    new GetUserCommand(client, id).execute();

  45. Abstracting Hystrix
    ; This is what we’re doing, just defining a function
    (defn get-user
      [client id]
      (.get client id User))
    ; So why does the Hystrix command have to look so different? No boilerplate here...
    (require '[com.netflix.hystrix.core :as hystrix])
    (hystrix/defcommand get-user
      {:hystrix/fallback-fn (fn [client id] (User/missing id))}
      [client id]
      (.get client id User))
    ; ... and use it. OMG, just a fn call!!
    (get-user client 12345)

  46. Abstracting Pig
    ● http://pig.apache.org/
    ● Map-reduce platform/language
    ● Already a (bad) abstraction!
    -- Pig script
    REGISTER 'pp-example.py' USING jython AS func;
    users = LOAD 'users.tsv'
        as (user_id:long,
            image50x50:chararray,
            image145x145:chararray,
            image400x400:chararray);
    user_images = FOREACH users GENERATE
        TOBAG(image50x50, image145x145, image400x400) AS image_bag:bag{t:tuple(image:chararray)};
    user_images_flat = FOREACH user_images GENERATE
        FLATTEN(image_bag);
    domains = FOREACH user_images_flat GENERATE
        func.extractDomain(image_bag::image);
    distinct_domains = DISTINCT domains;
    store distinct_domains into 'domains';
    # pp-example.py
    import re
    @outputSchema('image:chararray')
    def extractDomain(image):
        match = re.search('(https?://.*?)/', image)
        return match.group(0);

  47. Abstracting Pig - Pigpen
    (require '[pigpen.pig :as pig]
             '[pigpen.exec :as exec])
    (defn users []
      (pig/load-clj "users.tsv"))
    (defn domains []
      (->> (users)
           (pig/mapcat (juxt :image50x50 :image145x145 :image400x400))
           (pig/map (fn [url]
                      (if url (second (re-find #"(https?://.*?)/" url)))))
           (pig/distinct)))
    (defn domains-script [f]
      (->> (domains)
           (pig/store-pig "domains")
           (exec/write-script f)))
    ; What's this? A test?
    (deftest test-domains
      (with-redefs [users (constantly [ ... mock data ... ])]
        (is (= (exec/debug (domains))
               ["http://facebook.com" "https://facebook.com" nil]))))
    ● Clojure is pretty awesome at data manipulation.
    ● Also a real language.
    ● Also, we already know it
    ● Use it!
    Compile this Clojure code to Pig!

  48. Functions over Macros
    Consider combining 3 RxJava Observables
    An Observable is an asynchronous sequence
    ; Raw RxJava
    (Observable/zip
      (reify rx.util.functions.Func3
        (call [this a b c]
          (+ a b c)))
      stream-1
      stream-2
      stream-3)
    https://github.com/Netflix/RxJava
    Initially, we use raw Java interop to implement the “Func3” interface. This is tedious.

  49. Functions over Macros
    Consider combining 3 RxJava Observables
    An Observable is an asynchronous sequence
    ; With a macro
    (Observable/zip
      (rx/fn [a b c] (+ a b c))
      stream-1
      stream-2
      stream-3)
    https://github.com/Netflix/RxJava
    I know! Macros are great for eliminating boilerplate!
    Enter rx/fn macro

  50. Functions over Macros
    Consider combining 3 RxJava Observables
    An Observable is an asynchronous sequence
    ; With a function
    (Observable/zip
      (rx/fn* +)
      stream-1
      stream-2
      stream-3)
    https://github.com/Netflix/RxJava
    But a function can do better, allowing composition with existing Clojure functions
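The talk does not show how `rx/fn*` is implemented. As a self-contained sketch of the same idea (a plain function that adapts any Clojure fn to a Java functional interface), the example below uses `java.util.function.BiFunction` from Java 8+ as a stand-in for the Rx `Func` interfaces. `bi-fn*` and `call2` are hypothetical names.

```clojure
;; Stand-in sketch: BiFunction plays the role of an Rx Func interface here.
(defn bi-fn*
  "Wraps any two-argument Clojure function f as a java.util.function.BiFunction."
  [f]
  (reify java.util.function.BiFunction
    (apply [_ a b] (f a b))))

(defn call2
  "Helper that invokes a BiFunction the way a Java library would."
  [^java.util.function.BiFunction bf a b]
  (.apply bf a b))

;; Because bi-fn* is an ordinary function, it accepts existing or composed fns:
(call2 (bi-fn* +) 1 2)
;=> 3
(call2 (bi-fn* (comp str +)) 1 2)
;=> "3"
```

A macro version would need a literal argument vector and body at every call site; the function version takes whatever function value you already have.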

  51. Separate representation/behavior
    // Consider a typical method call
    putColumn(keyspace, 1234, "name", "dave", 90);
    ● No representation of the operation
    // This can be expressed more fluently/builder-y
    keyspace.put()
        .withRow(1234)
        .withColumn("name")
        .withValue("dave")
        .withTtl(90)
        .execute();
    ● Closer, but without a lot of work still isn’t manipulable, introspectable, reusable, etc.

  52. Separate representation/behavior
    ● Clojure supports “super”-builder pattern out of the box, aka DATA
    ; Define ops as map and execute
    (let [op {:keyspace ks
              :type :put
              :row 12345
              :column "name"
              :value "dave"
              :ttl 90}]
      (execute op))
    ; Helpers to make it “fluent”
    (-> (keyspace ks)
        put
        (row 12345)
        (column "name" "dave" 90)
        execute)
    ; Anyone can define new helpers
    ; Here’s a partial op
    (defn user-op [id]
      (-> (keyspace default-ks)
          (row id)))
    (-> (user-op 12345)
        put
        (column "name" "dave" 90)
        execute)
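The helpers used above are not defined in the talk. A minimal sketch of how they might look, assuming each one just adds keys to the op map; all definitions here are hypothetical, and `execute` is a stub that returns the op instead of talking to a data store.

```clojure
;; Hypothetical helper definitions; the talk shows only their use.
(defn keyspace [ks] {:keyspace ks})
(defn put [op] (assoc op :type :put))
(defn row [op id] (assoc op :row id))
(defn column [op col value ttl]
  (assoc op :column col :value value :ttl ttl))
(defn execute [op] op) ; stub: a real version would perform the operation

(def default-ks "users") ; assumed default keyspace name

;; A partial op is just a map, so it can be inspected or reused before executing.
(defn user-op [id]
  (-> (keyspace default-ks)
      (row id)))

(-> (user-op 12345)
    put
    (column "name" "dave" 90)
    execute)
;=> {:keyspace "users", :row 12345, :type :put, :column "name", :value "dave", :ttl 90}
```

Since every step is a pure function from map to map, the "fluent" form and the literal map form are interchangeable.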

  53. Emulate Existing Idioms
    ● Abstraction is language design
    ● Abstractions that emulate existing idioms in the host language (Clojure) will
    ○ Look better
    ○ Be easier to understand without research
    ○ Play better with existing features
    ● In Hystrix, defcommand is structurally identical to defn
    ○ Easy to switch. Easy to understand.
    ● Pigpen (mostly) has semantics identical to Clojure data pipelines

  54. Emulate Existing Idioms - rx/let-o
    ● clojure.core/let makes wiring together data transforms easy
    ● Not so in Rx, especially for expressions with “forks”
    ● Enter rx/let-o to take care of the details
    (rx/let-o [?user    (get-user-o 123)
               ?friends (rx/mapcat (fn [u]
                                     (map get-friends-o (:friends u)))
                                   ?user)
               ?ab      (rx/mapcat get-ab ?user)]
      (rx/merge ?user ?friends ?ab))

  55. Conclusion
    ● Take your time
    ● Clojure can produce big gains in productivity and satisfaction
    ● Yes, it's scary
    ● Mental shift required
    ○ Clojure examples to Clojure “in the large”
    ○ People are still figuring this out
