
Introduction to Reactive Programming

An introduction to reactive programming, approaches to understanding dataflow programming, some historical notes about Rx.NET and RxJava from my perspective, and some future directions I see.

Based on a talk I gave in-house at my institute.

Dávid Karnok

June 08, 2015

Transcript

  1. About me
    • Graduated as a mechanical engineer in 2005
      o from the Budapest University of Technology and Economics.
    • Picked up by MTA SZTAKI almost immediately,
      o where I switched to computer science and
      o started my PhD studies and research into transparency in manufacturing IT systems.
    • Still working there today:
      o 2/3 as an end-to-end software developer guru and
      o 1/3 as a researcher on Cyber-Physical Production Systems and Industry 4.0 (essentially transparency in manufacturing IT systems).
    • If all goes well, I’ll submit my dissertation in Q4 this year,
      o of which only about 15% covers reactive programming-related novelties.
  2. Outline
    • Definition
    • Why do we need it?
    • Thinking in event streams
    • History: the forgotten Reactive4Java
    • Netflix: RxJava, „our” library
    • Motivational examples
    • Cheat sheet (if this becomes a course…)
    • Reactive-Streams & the Reactive Manifesto
    • Conclusion, „scientific” outlook
  3. What is (functional) reactive programming?
    • It’s a programming paradigm built around data flows:
      o awaiting data/events without blocking a thread…
      o … while user-defined functions are invoked as the data arrives.
    • The Observer and Iterator patterns on steroids.
    • Mostly declarative, like SQL.
    • Everyday examples:
      o Excel recalculating cells when their source/dependent cells change,
      o attaching event handlers to GUI elements such as buttons and text boxes,
      o compilers tracking the dataflow network for optimization, processors streamlining computation on data.
  4. „Do we need this? Why?”
    • The Cloud™ (= somebody else’s machine) is upon us.
    • It costs money, so we’d better run efficiently:
      o Blocked threads don’t do computation or business logic; they just hold on to OS resources.
      o High latency, low throughput → users get bored of waiting.
      o If something errors out, what do we do?
      o Time-to-market: should developers battle concurrency instead of business logic?
    • Cyber-Physical Production Systems: industry has caught up with IT.
      o Autonomous, loosely coupled Internet-of-Things, ubiquitous computing…
      o Okay, but with what tools?
      o Come to think of it, network communication is already reactive (see TCP).
      o Time to make the rest of the system(s) reactive!
  5. Classical dataflows
    Aka the iterator pattern:

      Iterable<User> iter = getUsers();
      List<User> recentlySeen = new ArrayList<>();
      for (User u : iter) {
        // compareTo() returns an int: keep users seen after midnight yesterday
        if (u.lastLogin.compareTo(DateMidnight.now().minusDays(1)) > 0) {
          recentlySeen.add(u);
        }
      }
      List<UserFavorites> userFavs = new ArrayList<>();
      // first 5 users at most; subList(0, 5) would throw with fewer matches
      for (User u : recentlySeen.subList(0, Math.min(5, recentlySeen.size()))) {
        userFavs.add(getUserFavorites(u.id));
      }
      listView.setInput(userFavs);

    How do we make this reactive?
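    To make the slide’s loop concrete, here is a runnable sketch in plain Java. The User and UserFavorites records and the getUserFavorites lookup are hypothetical stand-ins, and java.time with a rolling 24-hour window replaces Joda-Time’s DateMidnight. Every step runs on, and blocks, the calling thread.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Runnable sketch of the slide's pull-based pipeline. User, UserFavorites and the
// getUserFavorites lookup are hypothetical stand-ins; java.time replaces Joda-Time.
final class IteratorPipeline {
    record User(int id, String name, LocalDateTime lastLogin) { }
    record UserFavorites(int userId, List<String> categories) { }

    static List<UserFavorites> recentFavorites(Iterable<User> users, LocalDateTime now,
                                               IntFunction<UserFavorites> getUserFavorites) {
        LocalDateTime cutoff = now.minusDays(1);
        List<User> recentlySeen = new ArrayList<>();
        for (User u : users) {                        // the caller's thread pulls every element
            if (u.lastLogin().isAfter(cutoff)) {      // seen within the last 24 hours
                recentlySeen.add(u);
            }
        }
        List<UserFavorites> userFavs = new ArrayList<>();
        // at most 5 users; Math.min avoids subList throwing when fewer matched
        for (User u : recentlySeen.subList(0, Math.min(5, recentlySeen.size()))) {
            userFavs.add(getUserFavorites.apply(u.id())); // synchronous lookup on the caller's thread
        }
        return userFavs;
    }
}
```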
  6. Analogies
    • SQL:
        SELECT name FROM users WHERE userId = 1
      vs.
        Observable.from(users)
          .filter(u -> u.id == 1)
          .map(u -> u.name)
          .subscribe(System.out::println);
    • Dualizing (return values become parameters):
        Iterator<T> Iterable.iterator()   ↔  void Observable.subscribe(Subscriber<T>)
        T Iterator.next()                 ↔  void Subscriber.onNext(T)
        boolean Iterator.hasNext()        ↔  void Subscriber.onCompleted(boolean)
    • Future and callback hell*:
        Subscription s = Observable.just(1)
          .subscribeOn(Schedulers.io())
          .flatMap(v -> getUsers())
          .flatMap(u -> getSettings(u))
          .take(20)
          .retry(2)
          .observeOn(guiScheduler)
          .subscribe(us -> gui.add(us));
        // later, to cancel the whole chain:
        s.unsubscribe();
    • Dataflow viewpoint:
        Observable.range(1, 100)
          .flatMap(r -> ...)
          .publish()...
      [Diagram: dataflow graph with f(x), Q and C blocks]
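    The Observable side of the SQL analogy can be approximated with a toy push-based type. MiniObservable is a hypothetical name, not RxJava’s API; it has no error, completion, cancellation or backpressure signals, and only shows how filter and map compose by wrapping the downstream consumer.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// A toy push-based Observable: values are pushed into a Consumer.
// Deliberately omits errors, completion, cancellation and backpressure.
@FunctionalInterface
interface MiniObservable<T> {
    void subscribe(Consumer<? super T> observer);

    // Source: pushes every element of the Iterable to the observer.
    static <T> MiniObservable<T> from(Iterable<? extends T> items) {
        return observer -> {
            for (T item : items) observer.accept(item);
        };
    }

    // Operator: subscribes upstream with a consumer that forwards only matching values.
    default MiniObservable<T> filter(Predicate<? super T> predicate) {
        return observer -> subscribe(v -> {
            if (predicate.test(v)) observer.accept(v);
        });
    }

    // Operator: subscribes upstream with a consumer that transforms each value.
    default <R> MiniObservable<R> map(Function<? super T, ? extends R> mapper) {
        return observer -> subscribe(v -> observer.accept(mapper.apply(v)));
    }
}
```

    With it, the slide’s query reads `MiniObservable.from(users).filter(u -> u.id == 1).map(u -> u.name).subscribe(System.out::println)`, assuming a User type with id and name fields.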
  7. SQL analogy
    • Classic case: we have multiple tables, and we combine, filter, group and limit them, etc.:
        SELECT u.name, COUNT(DISTINCT f.category)
        FROM Users u, UserFavorites f
        WHERE u.id = f.userId AND u.age >= 20
        GROUP BY u.name
        LIMIT 10
  8. Declarative approach
    • We declare what we want;
    • it doesn’t matter how it happens,
    • it doesn’t matter whether the tables load in parallel, or
    • whether they are on different servers.
    • The server can cache and optimize queries as it sees fit;
    • we are only interested in receiving 10 records in sequence.
    • Today, most database drivers do this via a blocking API.
      o There’s hope: some NoSQL databases already have asynchronous/reactive drivers.
  9. Almost like writing SQL
      // take the users table
      from(getUsers())
      // filter for users with age >= 20
      .where(u -> u.age >= 20)
      // join with the favorites table
      .join(u -> from(getUserFavorites())
          // where the user is the same
          .where(f -> u.id == f.userId)
          // get just the category
          .select(f -> f.category),
        // build a pair of user and its favorites
        (u, fs) -> [u, fs])
      // we only need the user name and the count of unique favorite categories
      .select([u, fs] -> [u.name, fs.distinct().count()])
      // maximum 10 records
      .limit(10)
      // do something with them
      .forEach(System.out::println)
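    For comparison, the same query can be written today with plain Java streams, which are pull-based and synchronous rather than reactive. The User and Favorite records are hypothetical. As in the SQL, users without favorites drop out (inner join), and without an ORDER BY the 10 rows are simply the first groups in encounter order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// The slide-7 SQL query expressed with java.util.stream (hypothetical record types).
final class SqlLikeQuery {
    record User(int id, String name, int age) { }
    record Favorite(int userId, String category) { }

    static Map<String, Long> distinctCategoryCounts(List<User> users, List<Favorite> favorites) {
        Map<String, Set<String>> categoriesByName = users.stream()
            .filter(u -> u.age() >= 20)                              // WHERE u.age >= 20
            .flatMap(u -> favorites.stream()
                .filter(f -> f.userId() == u.id())                   // ... AND u.id = f.userId
                .map(f -> Map.entry(u.name(), f.category())))        // (u.name, f.category) rows
            .collect(Collectors.groupingBy(Map.Entry::getKey,        // GROUP BY u.name
                LinkedHashMap::new,                                  // keep encounter order
                Collectors.mapping(Map.Entry::getValue, Collectors.toSet()))); // DISTINCT categories
        Map<String, Long> result = new LinkedHashMap<>();
        categoriesByName.entrySet().stream()
            .limit(10)                                               // LIMIT 10
            .forEach(e -> result.put(e.getKey(), (long) e.getValue().size())); // COUNT(...)
        return result;
    }
}
```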
  10. If asynchrony could be a parameter…
      getUsersAsync()
        .subscribeOn(Schedulers.io())
        .filter(u -> u.age >= 20)
        .flatMap(u -> getUserFavoritesAsync()
          .subscribeOn(Schedulers.io())
          .filter(f -> u.id == f.userId)
          .map(f -> f.category)
          .distinct()
          .count()
          .map(fc -> Arrays.asList(u.name, fc)))
        .take(10)
        .observeOn(uiThreadScheduler)
        .forEach(a -> listView.add(a.get(0) + ":" + a.get(1)));
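    Plain Java can illustrate the „asynchrony as a parameter” idea with an Executor argument, which plays a role loosely similar to an Rx Scheduler. In this sketch getUsersAsync is a hypothetical data source: with the same-thread executor Runnable::run the pipeline completes synchronously, with a thread pool it runs asynchronously, and the pipeline code stays the same.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParametricAsync {
    // Hypothetical data source; the caller decides *where* it runs via the Executor.
    static CompletableFuture<List<String>> getUsersAsync(Executor executor) {
        return CompletableFuture.supplyAsync(() -> List.of("ann", "bob"), executor);
    }

    // Same pipeline for every execution context: the business logic does not change.
    static CompletableFuture<Integer> countUsers(Executor executor) {
        return getUsersAsync(executor).thenApply(List::size);
    }

    public static void main(String[] args) {
        Executor sameThread = Runnable::run;                 // synchronous: runs on the caller's thread
        System.out.println(countUsers(sameThread).join());

        ExecutorService io = Executors.newFixedThreadPool(2); // asynchronous: runs on a pool thread
        try {
            System.out.println(countUsers(io).join());
        } finally {
            io.shutdown();
        }
    }
}
```

    Both calls in main print 2; only the thread that does the work differs.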
  11. Dualizing
    • This is the scientific™ way of deriving the concept, with the help of monads, arrows and flips…
    • …but it’s quite easy to get confused and lost;
    • however, some intuition and mechanical thinking can help…
    • …because you don’t usually derive your queries from relational algebra theorems either.
    • Let’s look at the well-known Iterable/Iterator interfaces (in .NET: IEnumerable/IEnumerator):
        interface Iterable<T> {
          Iterator<T> iterator();
        }
        interface Iterator<T> {
          boolean hasNext();
          T next();
          void remove();
        }
  12. Almost…
    There are 4 aspects the Iterator interface doesn’t cover, because it originates from implicit use (for-each loops):
      1. Handling exceptions: hasNext() and next() may throw an unchecked exception, but when was the last time you wrapped a for-each loop in try-catch just for this?
      2. Processing an Iterator can be abandoned at any time and the GC will free it eventually, but if it holds onto a resource, how can a simple break statement notify the Iterator that it won’t be used any further and can release its resources now?
      3. next() blocks and returns only when data is available, but why can’t I ask how many items can be read without blocking, just like with InputStream.available()?
      4. The co- and contravariance intent can’t be specified in the interface declaration: given a mixed Integer/Double sequence, why can’t I process it as a Number sequence?
    The first three can be modelled in the interface itself; the fourth we’ll get through dualization…
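    Two of these gaps have partial workarounds in today’s Java: java.util.stream.Stream can carry an onClose action that try-with-resources runs even after an early exit (aspect 2), and wildcards express variance at the use site rather than in the interface declaration (aspect 4). The class and method names below are illustrative.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

final class IteratorGaps {
    // Aspect 2: the close action runs when the try block exits, even though
    // findFirst() stops consuming the stream after the first match.
    static Optional<Integer> firstAboveWithCleanup(int threshold, AtomicBoolean released) {
        try (Stream<Integer> s = Stream.of(1, 5, 10, 20).onClose(() -> released.set(true))) {
            return s.filter(x -> x > threshold).findFirst();
        }
    }

    // Aspect 4: Java only expresses variance at the use site; "? extends Number"
    // accepts an Iterable<Integer>, Iterable<Double> or a mixed Iterable<Number>.
    static double sum(Iterable<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) total += n.doubleValue();
        return total;
    }
}
```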
  13. Swapping inputs and outputs
      interface Iterable<T> {
        Iterator<T> iterator();
      }
      interface Iterator<T> {
        boolean | Exception hasNext();
        T | Exception next();
        void remove();
        Controls controls();
      }
      interface Controls {
        long available();
        void cancel();
      }
    becomes
      interface IObservable<T> {
        void subscribe(ISubscriber<? super T> s);
      }
      interface ISubscriber<T> {
        void isCompleted(boolean b);
        void error(Exception e);
        void next(T value);
        void remove();
        void controls(IControls s);
      }
      interface IControls {
        void request(long n);
        void cancel();
      }
  14. In practice…
    • Yes, this is the publisher/subscriber paradigm, although the naming still differs…
    • …but we can apply some refinements and rationalizations:
        interface ISubscriber<T> {
          void isCompleted(boolean b);
          void error(Exception e);
          void next(T value);
          void remove();
          void controls(IControls s);
        }
      becomes
        interface Subscriber<T> {
          void onComplete();
          void onError(Throwable e);
          void onNext(T value);
          void onSubscribe(Subscription s);
        }
    • The dataflow ends only once; it doesn’t make sense to report that it hasn’t ended yet, so isCompleted(boolean) becomes onComplete().
    • There is no point in removing elements: we can simply ignore them, so remove() is dropped.
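    Java 9 later added this shape to the JDK as java.util.concurrent.Flow (Publisher, Subscriber, Subscription). Below is a minimal, single-threaded sketch of a range source that emits only what the subscriber has requested via request(n). RangePublisher is a hypothetical name; the code is not thread-safe and does not cover every Reactive-Streams rule.

```java
import java.util.concurrent.Flow;

// Single-threaded sketch of a backpressure-aware range source (Java 9+ Flow types).
// Not thread-safe, and not a complete Reactive-Streams implementation.
final class RangePublisher implements Flow.Publisher<Integer> {
    private final int start;
    private final int count;

    RangePublisher(int start, int count) {
        this.start = start;
        this.count = count;
    }

    @Override
    public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
        subscriber.onSubscribe(new RangeSubscription(subscriber, start, start + count));
    }

    private static final class RangeSubscription implements Flow.Subscription {
        private final Flow.Subscriber<? super Integer> child;
        private final int end;
        private int next;
        private long requested;     // outstanding demand
        private boolean emitting;   // guards against re-entrant request() from onNext
        private boolean done;       // set on completion, error or cancellation

        RangeSubscription(Flow.Subscriber<? super Integer> child, int start, int end) {
            this.child = child;
            this.next = start;
            this.end = end;
        }

        @Override
        public void request(long n) {
            if (done) return;
            if (n <= 0) {
                done = true;
                child.onError(new IllegalArgumentException("request must be positive"));
                return;
            }
            requested += n;
            if (requested < 0) requested = Long.MAX_VALUE;  // overflow: treat as unbounded
            if (emitting) return;                           // the running loop picks up the new demand
            emitting = true;
            while (requested > 0 && next < end && !done) {
                requested--;
                child.onNext(next++);
            }
            if (next == end && !done) {
                done = true;
                child.onComplete();
            }
            emitting = false;
        }

        @Override
        public void cancel() {
            done = true;
        }
    }
}
```

    The emitting flag keeps a request() call made from inside onNext from recursing: the new demand is added to the counter and consumed by the loop that is already running.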
  15. … everybody calls it differently
    Element               | Rx.NET                           | RxJava 1.x                          | Reactive-Streams
    Source                | IObservable (i)                  | Observable (c)                      | Publisher (i)
    Observer              | IObserver (i)                    | Subscriber (c), Observer (i)        | Subscriber (i)
    Completion            | onCompleted()                    | onCompleted()                       | onComplete()
    Resource mgmt.        | IDisposable (i)                  | Subscription (i)                    | -
    Controller            | -*                               | Subscription (i) + Producer (i)     | Subscription (i)
    Multicast             | ISubject (i)                     | Subject (ac)                        | Processor (i)
    Fluent API            | Observables w/ extension methods | Observable (c) w/ instance methods  | -
    Parametric asynchrony | IScheduler (i)                   | Scheduler (ac), Worker (ac)         | -
    (i) = interface, (c) = class, (ac) = abstract class
  16. Future-world…
    • First day in concurrency school: Java Futures,
    • representing the single result of a future computation.
    • But when does this result become available?
      o Do we check isDone() repeatedly? → wasteful busy loop
          Future<String> f = executor.submit(() -> calculateString());
          while (!f.isDone()) { }        // spins a CPU core until done
          System.out.println(f.get());   // + try-catch
      o Call get() and block? → wastes an entire thread just on blocking
          Future<String> f = executor.submit(() -> calculateString());
          String s = f.get();            // + try-catch
    • Most programming languages have better alternatives:
      o Promise, CompletableFuture, ListenableFuture, etc.
      o You can register a callback that gets called asynchronously once the result is available.
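    With CompletableFuture (Java 8+), the polling and blocking variants above become a registered callback. In this sketch calculateString is a hypothetical placeholder, and the join() in main only keeps the demo alive until the callback has run.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

final class CallbackInsteadOfPolling {
    // Hypothetical stand-in for the slide's calculateString().
    static String calculateString() {
        return "result";
    }

    // Registers a callback instead of polling isDone() or blocking in get().
    static CompletableFuture<Void> calculateThen(ExecutorService executor, Consumer<String> callback) {
        return CompletableFuture.supplyAsync(CallbackInsteadOfPolling::calculateString, executor)
            .thenAccept(callback)              // runs once the value is available
            .exceptionally(ex -> {             // failures arrive here instead of a try-catch around get()
                System.err.println("failed: " + ex);
                return null;
            });
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            calculateThen(executor, System.out::println).join(); // join() only so the demo doesn't exit early
        } finally {
            executor.shutdown();
        }
    }
}
```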
  17. … with multiple values?
    • Asynchrony reached; we could stop here:
        CompletableFuture<String> f = ...
        f.thenAccept(System.out::println);
    • but what if we need a list of values,
        CompletableFuture<List<String>> f = ...
        f.thenAccept(list -> list.forEach(System.out::println));
    • which are futures themselves?
        CompletableFuture<List<CompletableFuture<String>>> f = ...
        f.thenAccept(list -> list.forEach(g -> g.thenAccept(System.out::println)));
    • This is unsustainable: the „waiting” graph gets complicated and incurs overhead.
    • Not to mention, many „async” APIs only offer Future-based access.
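    Flattening the last case by hand shows how quickly the „waiting” graph grows. A sketch using only the JDK: wait for the outer list, then for all inner futures with CompletableFuture.allOf, and collect the values in order. If any inner future fails, the combined future fails as well. FlattenFutures is an illustrative name.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class FlattenFutures {
    // CompletableFuture<List<CompletableFuture<T>>> -> CompletableFuture<List<T>>,
    // preserving the order of the inner futures.
    static <T> CompletableFuture<List<T>> flatten(CompletableFuture<List<CompletableFuture<T>>> outer) {
        return outer.thenCompose(inner ->
            CompletableFuture.allOf(inner.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> inner.stream()
                    .map(CompletableFuture::join)   // all inner futures are complete here, so join() doesn't block
                    .collect(Collectors.toList())));
    }
}
```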
  18. Callback-world
    • Common in the web world, mostly because the client and server are already asynchronous with respect to each other.
    • XMLHttpRequest, AsyncCallback, etc., depending on the framework:
      api.getUsers(new AsyncCallback<List<User>>() {
        public void onSuccess(List<User> list) {
          for (User u : list) {
            api.getFavorites(u.id, new AsyncCallback<List<UserFavorites>>() {
              public void onSuccess(List<UserFavorites> list) {
                api.getSuggestions(u.id, list.size(), new AsyncCallback<...>() {
                  public void onSuccess(...) { }
                  public void onFailure(...) { }
                });
              }
              public void onFailure(Throwable ex) {
                Window.alert(ex.toString());
              }
            });
          }
        }
        public void onFailure(Throwable ex) {
          Window.alert(ex.toString());
        }
      });
    → spaghetti code.
  19. … better together?
    • Why is this a problem?
    • We’re livin’ in a service-oriented world: the data is on different servers, with replication and fault tolerance…
    • Trivial solution: one API call collecting everything,
      o aka „the Netflix method”.
    • But we’ve just shifted the problem over to the server side, and
    • for the sake of user experience, multiple calls may still be required.
    http://techblog.netflix.com/2013/01/optimizing-netflix-api.html
  20. … I’ve changed my mind!
    • Even if one accepts callback hell, there is one thing most async frameworks forgot:
    • what if the user wants to cancel an activity?
    • Most of these APIs return void and there is no Future.cancel() either,
    • but even if there were, how could the developer stop all those async calls,
    • which may run at any time and in any number?
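    One way to tackle this, and roughly the idea behind Rx’s composite subscriptions, is to collect a cancel action for every call in flight, so that a single cancel() reaches all of them. CompositeCancellation is a hypothetical helper, not a library class; actions added after cancellation run immediately, so late-starting calls are stopped too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one cancel() stops every registered async activity.
final class CompositeCancellation {
    private final List<Runnable> cancelActions = new ArrayList<>();
    private boolean cancelled;

    // Registers a cancel action; if already cancelled, the action runs right away.
    void add(Runnable cancelAction) {
        boolean runNow;
        synchronized (this) {
            runNow = cancelled;
            if (!runNow) cancelActions.add(cancelAction);
        }
        if (runNow) cancelAction.run();
    }

    // Runs every registered action exactly once; later calls are no-ops.
    void cancel() {
        List<Runnable> toRun;
        synchronized (this) {
            if (cancelled) return;
            cancelled = true;
            toRun = new ArrayList<>(cancelActions);
            cancelActions.clear();
        }
        toRun.forEach(Runnable::run);   // run outside the lock to avoid calling foreign code while holding it
    }

    synchronized boolean isCancelled() {
        return cancelled;
    }
}
```

    A caller would register, for example, `composite.add(() -> future.cancel(true))` for each Future it starts.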
  21. Dataflow viewpoint
    • It is possible to model these with dataflow graphs and Petri nets,
      o but that requires extra learning and comprehension, which takes time…
    • National Instruments’ LabVIEW programming environment: http://91-527-f2009.wiki.uml.edu/LabVIEW+Review
    • Blocks,
    • wires,
    • groups,
    • but internally it runs almost synchronously and gets compiled into an imperative program.
    • Designed for continuous operation.
  22. We’ve tried this in ADVANCE…
    • named the ADVANCE Flow Engine + visual editor:
      o a fully asynchronous engine for
        - execution,
        - debugging,
        - administration and
        - cooperation,
      o all of it transparently over the network.
    http://www.advance-logistics.eu
  23. ... the World’s first …
    • with scientific intensity, thinking about my dissertation:
      o Thesis #5: a structure-based type system and its relational algorithms,
      o Thesis #6: a pluggable, type-system-based, parametric and variable-typed type-inference algorithm, and
      o Thesis #7: the first Java-based, reactive and parametric-asynchronous, block-oriented execution environment*.
    * with tons of algorithms in and out…
  24. …, but only the concept survived.
    • Unsolved problems:
      o constants vs. async data clash (when do we need to re-emit?),
      o connecting parametric types with arity mismatch (Map<K, V> = Collection<Pair<K, V>>) and
      o limited error handling (we needed to observe every block’s error output and restart the realm).
    • Why leave it in this state?
      o The project (funding) ended,
      o there were no real use cases and no real users, and
      o a project with higher prestige came in…
    • All is not lost:
      o the reactive subsystem lived on,
      o the experience, pitfalls, decision points and design concepts stayed, and
      o so I could, with high confidence, fix the mistakes of another, increasingly popular project…
  25. Timeline: long-long time ago
    • June 2009: First video about Rx.NET on Microsoft’s Channel 9 developer video portal.
    • Feb 2010: Project ADVANCE wins.
    • Nov 2010: First meetings in ADVANCE; buzzwords get written down; I start porting Rx.NET.
    • Feb 2011: First public version, but nobody cares about the reactive paradigm for a year.
  26. Monads, co-monads and category theory
    • It’s complicated; see the video: http://channel9.msdn.com/shows/Going+Deep/Expert-to-Expert-Brian-Beckman-and-Erik-Meijer-Inside-the-NET-Reactive-Framework-Rx/
  27. Once upon a time there was ADVANCE, which gave birth to Reactive4Java
    • The ADVANCE EU FP7 project started at just the right time, when Microsoft was working on something it needed…
      o but there was no source code available,
      o the documentation was virtually non-existent,
      o plus, the only way to grasp the concepts and features was a dozen Channel 9 videocasts.
      o In addition, it was written in C#, and we work in industries where Java is a better fit due to higher standards and better platform independence.
    • Therefore, in line with the aim of making ADVANCE open source from the start, I started re-implementing Rx.NET as Reactive4Java.
  28. …, growing and growing…
    • At the beginning of 2011, within 2 months, the library reached 80% of the functionality of the original.
      o The second 80% was added over a period of 1.5 years.
    However, shadows were lurking nearby:
    • Lambdas and Java 8 were only a dream back then;
      o everything had to be inner classes, which is tedious and boring.
    • Extension methods in C# made it super easy to build dataflows fluently;
      o there was nothing like that in Java, nor will there ever be, IMO.
      o What was left was static methods on utility classes, composed outwards:
          timeout(concat(merge(map(…)), source2), 5, TimeUnit.SECONDS)
      o Only about a year later did I conclude that the library needed a builder pattern that allows fluent query building.
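    The difference between the two styles is easy to see on a toy example over lists. Both helpers below are illustrative, not Reactive4Java’s API: static utilities nest „inside out” and read right-to-left, while a builder-style wrapper exposes the same operators as instance methods that read left-to-right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class FluentVsStatic {
    // Static-utility style: calls compose inside out, e.g. map(filter(src, p), f).
    static <T, R> List<R> map(List<T> src, Function<? super T, ? extends R> f) {
        List<R> out = new ArrayList<>();
        for (T t : src) out.add(f.apply(t));
        return out;
    }

    static <T> List<T> filter(List<T> src, Predicate<? super T> p) {
        List<T> out = new ArrayList<>();
        for (T t : src) if (p.test(t)) out.add(t);
        return out;
    }

    // Builder style: each instance method returns a new wrapper, so calls chain left to right.
    static final class Builder<T> {
        private final List<T> items;

        private Builder(List<T> items) { this.items = items; }

        static <E> Builder<E> from(List<E> items) { return new Builder<>(items); }

        <R> Builder<R> map(Function<? super T, ? extends R> f) {
            return new Builder<>(FluentVsStatic.map(items, f));   // qualified: the inner map() shadows it
        }

        Builder<T> filter(Predicate<? super T> p) {
            return new Builder<>(FluentVsStatic.filter(items, p));
        }

        List<T> toList() { return items; }
    }
}
```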
  29. …, but then disbelieved.
    I didn’t advertise or market the library, but it was still visible, and the Java community’s responses were grim:
    • „Bleh, invented by Microsoft…”
    • „Is this a Lisp/Haskell/functional programming thing?”
    • „I don’t want to write inner classes.”
    Instead, most stayed with conventional and limited technologies:
    • thread pools and Futures,
    • Scala, Groovy,
    • agent-based messaging systems (e.g., Akka).
    Without real feedback, I was practically developing it for myself. But then suddenly…
  30. Timeline: some time ago
    • June 2009: First video about Rx.NET on Microsoft’s Channel 9 developer video portal.
    • Feb 2010: Project ADVANCE wins.
    • Nov 2010: First meetings in ADVANCE; buzzwords get written down; I start porting Rx.NET.
    • Feb 2011: First public version, but nobody cares about the reactive paradigm for a year.
    • Apr 2012: An engineer from Netflix contacts me about reactive programming, but only a few emails are exchanged.
    • Feb 2013: Netflix goes public with an independent Rx port: RxJava.
    • Nov 2013: RxJava is ~40% feature complete; I join the project.
  31. The Netflix story
    • Who are they? The West’s largest streaming video provider, with 10M subscribers…
      … whose traffic volume exceeds even that of BitTorrent.
    https://speakerdeck.com/benjchristensen/functional-reactive-programming-in-the-netflix-api-qcon-london-2013
  32. A modern company
    • Looks like they have the best IT engineers and managers,
      o almost like Google, but Netflix people appear to be friendlier and more open…
    • Freedom and responsibility
      o Trying out new paradigms and technologies „in work hours”.
      o Many other companies are not willing to fund such endeavors, or only every 5 years when the personnel gets replaced…
    • Heterogeneous development environment
      o Engineers can choose from a diverse set of programming languages and platforms (mainly JVM-based ones, but JavaScript is also dominant).
    • No own servers; everything runs in the Amazon AWS cloud
      o It isn’t cheap, but it is cheaper than owning a private cloud, and there is still room for cost reduction.
    • Using open-source technologies, releasing open-source technologies
      o Netflix has open-sourced many cloud-related technologies: Hystrix, Turbine, Eureka, Servo, etc.
  33. It began as…
    • Developer mobility in the US is higher than in Hungary;
      o the average tenure is 5 years (and, interestingly, so is the technological renewal period).
    • An engineer who was already using Rx moved from Microsoft over to Netflix,
      o bringing the knowledge and showing the benefits; an internal education series started,
      o and it took them months to „grok” it,
        - or even years…
        - which seems rather odd; perhaps the topic was approached from the wrong angle (monads, ahem)…
    • and the great inventor, Erik Meijer himself, was available too…
      o as an independent consultant,
      o so they didn’t have to reverse- and forward-engineer the whole thing like I had to.
    • They considered forking Reactive4Java at one point, but decided to start from scratch:
      o the requirements were more concrete, and the focus was different and broader.
  34. …, then onto fame and victory.
    • In February 2013, the big announcement came: they had started porting Rx.NET under the name RxJava.
    • An ambitious plan:
      o complete functional parity, minus idiomatic language differences;
      o polyglot environment: it had to run on any JVM language, in desktop and server environments, and even on Android;
      o forward-looking design: Java 8 was coming and it had to integrate with it nicely, while Android was still stuck at the Java 6/7 API level…
    • However, I wasn’t aware of it until September of that year.
      o It came as a surprise question: what’s the difference between Reactive4Java and Netflix’s RxJava?
      o I always thought Google would be the one to „subvert” my library by building something like it into Guava,
      o because they had already started working on a fluent Iterable API, just like Ix.NET and the IterableBuilder inside Reactive4Java.
      o By the way, Interactive Extensions for Java: com.github.akarnokd:ixjava, https://github.com/akarnokd/ixjava, check it out!
  35. Timeline: today
    • June 2009: First video about Rx.NET on Microsoft’s Channel 9 developer video portal.
    • Feb 2010: Project ADVANCE wins.
    • Nov 2010: First meetings in ADVANCE; buzzwords get written down; I start porting Rx.NET.
    • Feb 2011: First public version, but nobody cares about the reactive paradigm for a year.
    • Apr 2012: An engineer from Netflix contacts me about reactive programming, but only a few emails are exchanged.
    • Feb 2013: Netflix goes public with an independent Rx port: RxJava.
    • Nov 2013: RxJava is ~40% feature complete; I join the project.
    • Jan 2015: I become a collaborator with extended rights…
  36. What’s in it for me?
    • Fame, of course.
    • The challenge of writing efficient concurrent code.
    • The feeling that I made something the right way.
    • Confirmation that many of my original thoughts about reactive programming hold up.
    • A set of industrial use cases and applications I can back my PhD dissertation with.
    • An excellent key enabling technology for the upcoming Cyber-Physical Systems world,
      o which hopefully won’t share the fate of Web 3.0.
  37. Introduction to…
    • It’s time to leave the concepts and history behind and see some concrete names...
    • Luckily, the reactive programming paradigm gains popularity every day
    • and is accessible from almost all mainstream programming languages (through libraries),
    • but unfortunately, they aren’t at the same „tech level”.
    • I distinguish 5 tech levels of reactive programming:
      L0   Classic Observer pattern (java.util.Observable)
      L1   Composable Observable/Observer
      L2   Transparently (a)synchronous (i.e., synchronously unsubscribable)
      L2.5 Lifting and composing
      L3   Adaptive push-pull (backpressure)
      L4   Self-optimizing and semi-automatically parallelizing
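    Level L2’s synchronous unsubscription can be sketched in a few lines: an infinite, synchronous source must notice that a downstream take(n) is done, otherwise the subscribe call never returns. The Source and Token types are hypothetical simplifications; a real library gives each stage its own subscription rather than one shared flag.

```java
import java.util.function.Consumer;

// Sketch of synchronous cancellation: take(n) stops an infinite synchronous source.
final class SyncCancellation {
    // Minimal cancellation flag shared along the chain (a simplification).
    static final class Token {
        volatile boolean cancelled;
    }

    interface Source<T> {
        void subscribe(Consumer<? super T> onNext, Token token);
    }

    // Infinite counter: without checking the token, this loop would never return.
    static Source<Long> naturals() {
        return (onNext, token) -> {
            for (long i = 0; !token.cancelled; i++) onNext.accept(i);
        };
    }

    // take(n): forwards n items, then "unsubscribes" the upstream through the token.
    static <T> Source<T> take(Source<T> upstream, long n) {
        return (onNext, token) -> {
            if (n <= 0) {                  // nothing wanted: never start the upstream
                token.cancelled = true;
                return;
            }
            long[] remaining = {n};
            upstream.subscribe(v -> {
                if (remaining[0] > 0) {
                    onNext.accept(v);
                    if (--remaining[0] == 0) token.cancelled = true;
                }
            }, token);
        };
    }
}
```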
  38. Tech-level comparison
    [Table: support for tech levels L0, L1, L2, L2.5, L3 and L4 in Java, Java 9+, Reactive4Java, RxJava 0.x, RxJava 1.x, RxJava 2.x, reactive-streams-jvm, Akka-streams, RxJS, RxCpp, Rx.NET and Reactor; legend: supported, bogus, not sure (?)]
  39. RxJava 1.x
    • De-facto standard fluent reactive API library
    • with language adapters (Rx~Groovy, ~Android, ~Clojure, ~Scala, ~Kotlin, ~Rust, etc.),
    • helper libraries (RxJava~Strings, ~Joins, ~Async, ~Math, ~Swing, ~Guava, RxNetty, etc.),
    • and wrappers (RxJavaReactiveStreams, Quasar, RoboVM).
    • Tech levels: L1, L2, L2.5, L3
    • 1500+ JUnit tests
    • Very active user base
    • But unfortunately, a very small developer base (< 10)
  40. To read up-front
    [Book covers: Author…, „Insert title here…”, „… Java …”, 2008 - …]
    • Unnecessary.
    • Maybe, but doesn’t contain everything we needed.
    • Anything about Java, because it will be the language of your business logic.
  41. Books about RxJava
    [Book covers: „RxJava”, „Observables, observers and reactive-style programming”, 2016]
    • Beginner level.
    • Maybe, feels incomplete.
    • If someone writes it…
  42. How can I learn it?
    • Unfortunately, the knowledge is scattered:
      o in various blogs,
      o in the RxJava wiki,
      o on the ReactiveX website,
      o in GitHub issues,
      o on StackOverflow under the [rx-java] tag,
      o in the JavaDoc and
      o in a few dozen YouTube and Vimeo videos.
    • Most information targets beginner and intermediate developers;
      o there isn’t a comprehensive, single-location information source yet.
    • Advanced- and master-level sources are completely missing,
      o except my blog: https://akarnokd.blogspot.com: Advanced RxJava
    • Many invitations to write a book about it: Manning, O’Reilly*, Packt**
      * indirectly
      ** common practice to turn reviewers into authors eventually.
    • I simply don’t have the time for it. A quality book takes a year to write and iterate.
  43. RxJava Hello World!
    The first example:
        import rx.*;

        Observable.just("Hello World!").subscribe(System.out::println);
    also commonly shown:
        Observable.create(subscriber -> {
          subscriber.onNext("Hello World!");
          subscriber.onCompleted();
        }).subscribe(System.out::println);
    … which is incorrect (it ignores backpressure and unsubscription), and writing operators is intermediate level+; instead*:
        Observable.create(subscriber -> {
          SingleProducer<String> sp = new SingleProducer<>(subscriber, "Hello World!");
          subscriber.setProducer(sp);
        }).subscribe(System.out::println);
    * Only available in nightly RxJava builds or starting from v1.0.12
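    The SingleProducer idea can be sketched with the JDK’s java.util.concurrent.Flow types (Java 9+): the value is emitted only once the subscriber requests it, at most once, and no completion follows a cancellation. JustPublisher is a hypothetical, single-threaded sketch, not RxJava’s implementation.

```java
import java.util.concurrent.Flow;

// Backpressure-aware single-value source: nothing is emitted until request(n > 0).
// Single-threaded sketch; not RxJava's SingleProducer.
final class JustPublisher<T> implements Flow.Publisher<T> {
    private final T value;

    JustPublisher(T value) {
        this.value = value;
    }

    @Override
    public void subscribe(Flow.Subscriber<? super T> subscriber) {
        subscriber.onSubscribe(new Flow.Subscription() {
            private boolean requested;   // the value is emitted at most once
            private boolean cancelled;

            @Override
            public void request(long n) {
                if (cancelled || requested) return;
                requested = true;        // set before emitting, so re-entrant requests are ignored
                if (n <= 0) {
                    subscriber.onError(new IllegalArgumentException("request must be positive"));
                    return;
                }
                subscriber.onNext(value);
                if (!cancelled) subscriber.onComplete();   // the subscriber may have cancelled in onNext
            }

            @Override
            public void cancel() {
                cancelled = true;
            }
        });
    }
}
```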
  44. RxJava toolbox
    Operators in great numbers, worth a separate and complete course…
      1. map
      2. flatMap
      3. zip
      4. combineLatest
      5. concatMap
      6. from
      7. observeOn
      8. subscribeOn
      9. defer
      10. just
      11. error
      12. take
      13. groupBy
      14. concat
      15. merge
      16. mergeWith, concatWith
      17. takeUntil
      18. onBackpressureBuffer
      19. onBackpressureDrop
      20. skip
      21. retry
      22. retryWhen
      23. buffer
      24. scan
      25. publish
      26. cache
      27. lift
      28. PublishSubject
      29. BehaviorSubject
      30. create
    (Page 1 of 15)
  45. „But what if …”
    • „There are things not worth doing the reactive way;
    • for the rest, there is RxJava.”
    • A library, not a framework:
      o it helps us but doesn’t limit us;
      o frameworks tend to box you in, and escaping the box to do something custom can be quite difficult…
    • Think through the problem sequentially first, then convert it to reactive:
      o most business logic is just plain old imperative Java code;
      o work in batches, for better locality and Fork-Join parallelism (buffer() + list.stream().parallel()).
    • It’s almost certainly flatMap()!
    • It takes a little experience to recognize the conversion options; don’t be shy to ask us (me) a question on GitHub, Google Groups or StackOverflow.
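    The „work in batches” advice can be sketched with the JDK alone: split the items into fixed-size batches (roughly what buffer(n) hands downstream) and process each batch with a parallel stream, which runs on the common Fork-Join pool. BatchParallel and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class BatchParallel {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    // Processes one batch with Fork-Join parallelism; map + toList keeps the input order.
    static <T, R> List<R> processBatch(List<T> batch, Function<? super T, ? extends R> work) {
        return batch.parallelStream().map(work).collect(Collectors.toList());
    }
}
```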
  46. Something like an industry standard
    • Reactive Manifesto (http://www.reactivemanifesto.org)
      o a conceptual frame for the whole reactive paradigm:
      o message-driven, elastic, resilient and responsive systems.
    • Reactive-Streams (http://www.reactive-streams.org)
      o the best base reactive API out there (tech levels L2 + L3),
      o but aimed more at library writers than at typical consumers/developers;
      o a common language and semantics supporting interoperability among solutions.
      o Java 9+ gets something like this too, but unnecessarily in my opinion…
  47. „Where is the science?”
    • Science = concept + model + algorithm
    • I wish I’d come up with the whole base concept…
      o somehow, I’m always busy with something else whenever great discoveries happen…
    • No modeling happened that I know of, just straight implementations.
    • I wrote around 75 distinct algorithms for the operators and components,
      o which would be great if this were a recognized scientific area, at least among mechanical engineers in Hungary.
    • Therefore, my dissertation will contain small portions of these accomplishments and more about how I built upon them in ADVANCE,
      o before the thing was even invented… :)
    • However, there is still potential in working out the tech-level 4 features…
  48. Tech-level 4
    I see two R&D possibilities:
      1. Operator micro-op fusion (concept)
        o Instead of monolithic operators, build them from smaller standard elements (model),
        o so they can be rearranged, optimized and unified; maybe even through code generation (algorithm).
      2. Semi-automatic parallelization (concept)
        o Know the operator internals and analyze f(x) through its bytecode (model).
        o It’s either a matter of operator organization or of code generation + case (1) (algorithm).
    [Diagram: operator chains built from f(x), Q and C blocks, in several rearranged variants]
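    As an illustration of the general fusion idea, here is the simplest well-known case, map/map fusion by function composition. It is not the micro-op design proposed above, only a sketch: MapFusion is a hypothetical type that stores consecutive map(f).map(g) calls as a single f.andThen(g) stage, so no queue or intermediate collection sits between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy pipeline where consecutive map stages are fused into one composed function.
final class MapFusion<T, R> {
    private final Function<T, R> fused;   // all map stages composed into one function
    private final int mapCount;           // how many map() calls were fused

    private MapFusion(Function<T, R> fused, int mapCount) {
        this.fused = fused;
        this.mapCount = mapCount;
    }

    static <T> MapFusion<T, T> source() {
        Function<T, T> identity = Function.identity();
        return new MapFusion<>(identity, 0);
    }

    // map(f).map(g) is stored as f.andThen(g): one stage, no intermediate queue between them.
    <V> MapFusion<T, V> map(Function<? super R, ? extends V> f) {
        return new MapFusion<>(fused.andThen(f), mapCount + 1);
    }

    // Single pass over the input; no intermediate lists are built between map stages.
    List<R> run(List<? extends T> input) {
        List<R> out = new ArrayList<>(input.size());
        for (T t : input) out.add(fused.apply(t));
        return out;
    }

    int mapCount() {
        return mapCount;
    }
}
```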
  49. Summary
    • Cloud and Cyber-Physical Production Systems are where the future currently lies.
    • Inherently asynchronous and distributed systems have to work together.
    • Traditional blocking, callback- and thread-pool-based approaches don’t scale well and don’t compose (at all).
    • With reactive programming, i.e., an enhanced declarative pub-sub approach, one can shift the problem into a new viewpoint and domain where asynchronous data flows are easier to deal with.
    • The ReactiveX family is there to help developers do this in many mainstream programming languages, through libraries and frameworks.
  50. Thank you for your attention!
    Twitter: @akarnokd
    GitHub: https://github.com/ReactiveX/RxJava
            https://github.com/akarnokd