An introduction to reactive programming, approaches to understand dataflow-programming, some historical notes about Rx.NET and RxJava from my perspective and some future directions I see.
at the Budapest University of Technology and Economics.
• Picked up by MTA SZTAKI almost immediately
  o where I converted to computer science and
  o started my PhD studies and research into transparency regarding manufacturing IT systems.
• Still working there to this day
  o 2/3 as an end-to-end software/developer guru and
  o 1/3 as a researcher on topics concerning Cyber-Physical Production Systems and Industry 4.0 (essentially transparency regarding manufacturing IT systems).
• If all goes well, I’ll submit my dissertation in Q4 this year
  o which only contains about 15% reactive programming-related novelties.
paradigm around data flows
• awaiting data/events without blocking a thread…
• … while user-defined functions are invoked.
• Observer- and Iterator-pattern on steroids
• Mostly declarative, like SQL
• Everyday examples
  o Excel recalculating cells based on changes in source/dependent cells
  o Attaching event handlers to GUI elements such as buttons and textboxes
  o Compilers tracking the dataflow network for optimization, processors streamlining computation on data
else’s machine) is upon us
• It costs money, so we’d better run efficiently:
  o Blocked threads don’t do computation or business logic; they just tie up OS resources.
  o High latency and low throughput: users get bored of waiting.
  o If something errors out, what do we do?
  o Time-to-market: should developers battle concurrency instead of business logic?
• Cyber-Physical Production Systems: industry caught up with IT
  o Autonomous, loosely coupled, Internet-of-Things, ubiquitous computing…
  o Okay, but with what tools?
  o Thinking about it, network communication is already reactive (see TCP).
  o Time to get the rest of the system(s) reactive!
List<User> recentlySeen = new ArrayList<>();
for (User u : iter) {
    // keep users whose last login is later than yesterday midnight
    if (u.lastLogin.compareTo(DateMidnight.now().minusDays(1)) > 0) {
        recentlySeen.add(u);
    }
}
List<UserFavorites> userFavs = new ArrayList<>();
// take at most 5 users; a plain subList(0, 5) throws on a shorter list
for (User u : recentlySeen.subList(0, Math.min(5, recentlySeen.size()))) {
    userFavs.add(getUserFavorites(u.id));
}
listView.setInput(userFavs);

How do we make this reactive?
combine them, filter them, group them, limit them, etc.:
SELECT u.name, COUNT(DISTINCT f.category)
FROM Users u, UserFavorites f
WHERE u.id = f.userId AND u.age >= 20
GROUP BY u.name
LIMIT 10
how it happens,
• doesn’t matter if tables load in parallel or
• doesn’t matter if they are on different servers.
• The server can cache and optimize queries as it sees fit,
• we are only interested in receiving 10 records in sequence.
• Today, most database drivers can accomplish this only via a blocking API.
  o There’s hope: some NoSQL databases already have asynchronous/reactive drivers.
// filter for users with age >= 20
.where(u -> u.age >= 20)
// join with the favorites table
.join(u -> from(getUserFavorites())
        // where the user is the same
        .where(f -> u.id == f.userId)
        // get just the category
        .select(f -> f.category),
    // build a pair of user and its favorites
    (u, fs) -> [u, fs])
// we only need the user name and the count of unique favorite categories
.select([u, fs] -> [u.name, fs.distinct().count()])
// maximum 10 records
.limit(10)
// do something with them
.forEach(System.out::println)
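The fluent pseudo-query above uses tuple syntax that Java does not have. As a rough, blocking approximation with the standard Java 8 Stream API, the same shape could look like the sketch below. The `User` and `UserFavorite` classes, their fields and the `query` method are hypothetical stand-ins, not types from the talk, and unlike the SQL inner join this version also lists users without favorites (with a count of 0).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FluentQuerySketch {
    // Hypothetical data holders, only for illustration.
    static final class User {
        final int id; final String name; final int age;
        User(int id, String name, int age) { this.id = id; this.name = name; this.age = age; }
    }
    static final class UserFavorite {
        final int userId; final String category;
        UserFavorite(int userId, String category) { this.userId = userId; this.category = category; }
    }

    /** Returns "name: distinctCategoryCount" rows for users aged 20+, at most `limit` rows. */
    static List<String> query(List<User> users, List<UserFavorite> favorites, int limit) {
        return users.stream()
            // filter for users with age >= 20
            .filter(u -> u.age >= 20)
            // "join" by scanning the favorites of each user and counting distinct categories
            .map(u -> u.name + ": " + favorites.stream()
                .filter(f -> f.userId == u.id)
                .map(f -> f.category)
                .distinct()
                .count())
            // maximum `limit` records
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
            new User(1, "Ann", 25), new User(2, "Bob", 17), new User(3, "Cid", 40));
        List<UserFavorite> favs = Arrays.asList(
            new UserFavorite(1, "drama"), new UserFavorite(1, "drama"),
            new UserFavorite(1, "sci-fi"), new UserFavorite(2, "comedy"));
        query(users, favs, 10).forEach(System.out::println);
    }
}
```

This is still pull-based and synchronous: the calling thread does all the work, which is exactly the limitation the reactive variant is meant to remove.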
concept, with the help of monads, arrows and flips…
• …, but it’s quite easy to get confused and lost,
• however, some intuition and mechanical thinking could help…
• … because you don’t usually derive your query from relational algebraic theorems.
• Let’s look at the well-known Iterable/Iterator interfaces (for .NET: IEnumerable/IEnumerator).

interface Iterable<T> {
    Iterator<T> iterator();
}
interface Iterator<T> {
    boolean hasNext();
    T next();
    void remove();
}
due to originating from implicit use:
1. Handling exceptions: hasNext() and next() may throw an unchecked exception, but when was the last time you wrapped a for-each loop in try-catch just for this?
2. Processing an Iterator can be terminated at any time and the GC will free it eventually, but if it holds onto a resource, how can a simple break instruction notify the Iterator that it won’t be used any further and can release its resources now?
3. next() blocks and returns only when there is data available, but why can’t I ask for the number of items that can be read without blocking, just like with InputStream.available()?
4. The co- and contravariance intent can’t be specified in the interface declaration. Given a mixed Integer/Double sequence, why can’t I process it as a Number sequence?
The first three points can be modelled in the interface itself, but for the fourth one, we’ll get it through dualization…
the naming still differs…
• but we can apply some refinements and rationalizations:

interface ISubscriber<T> {
    void isCompleted(boolean b);
    void error(Exception e);
    void next(T value);
    void remove();
    void controls(IControls s);
}

interface Subscriber<T> {
    void onComplete();
    void onError(Throwable e);
    void onNext(T value);
    void onSubscribe(Subscription s);
}

• The dataflow ends only once; it doesn’t make sense to report it being not-ended-yet.
• There is no sense in removing elements: we can simply ignore them.
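To make the dualized interface more tangible, here is a minimal, synchronous sketch of how an Iterable could be pushed into such a Subscriber. The `Subscriber` shape follows the slide; the `Subscription` with a single `cancel()` method, the `subscribe` helper and the `run` demo method are assumptions made for this illustration only. Cancelling plays the role of `break`, errors arrive through `onError` instead of a try-catch around the loop, and the wildcard bounds let a mixed Integer/Double sequence be consumed as a Number sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PushDualSketch {
    /** Hypothetical control handle; only cancellation is modelled here. */
    interface Subscription { void cancel(); }

    interface Subscriber<T> {
        void onSubscribe(Subscription s);
        void onNext(T value);
        void onError(Throwable e);
        void onComplete();
    }

    /** Pushes every element of the Iterable to the subscriber on the calling thread. */
    static <T> void subscribe(Iterable<? extends T> source, Subscriber<? super T> subscriber) {
        boolean[] cancelled = { false };
        subscriber.onSubscribe(() -> cancelled[0] = true);
        try {
            for (T item : source) {
                if (cancelled[0]) {
                    return; // the push-side counterpart of `break`
                }
                subscriber.onNext(item);
            }
        } catch (Throwable e) {
            if (!cancelled[0]) {
                subscriber.onError(e);
            }
            return;
        }
        if (!cancelled[0]) {
            subscriber.onComplete();
        }
    }

    /** Demo: logs the signals and cancels at the first value above `stopAbove`. */
    static List<String> run(Iterable<? extends Number> source, double stopAbove) {
        List<String> log = new ArrayList<>();
        PushDualSketch.<Number>subscribe(source, new Subscriber<Number>() {
            private Subscription upstream;
            @Override public void onSubscribe(Subscription s) { upstream = s; }
            @Override public void onNext(Number value) {
                if (value.doubleValue() > stopAbove) {
                    upstream.cancel();
                    log.add("cancelled");
                } else {
                    log.add("next " + value);
                }
            }
            @Override public void onError(Throwable e) { log.add("error " + e); }
            @Override public void onComplete() { log.add("complete"); }
        });
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.<Number>asList(1, 2.5, 3), 10));
        System.out.println(run(Arrays.<Number>asList(1, 99, 3), 10));
    }
}
```

A real implementation would also have to deal with asynchronous delivery and thread safety, which this sketch deliberately leaves out.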
representing the single result of a future computation.
• But when does this result become available?
  o Do we check isDone() repeatedly? A wasteful busy-loop:
    Future<String> f = executor.submit(() -> calculateString());
    while (!f.isDone());
    System.out.println(f.get()); // + try-catch
  o Calling get() and blocking… wastes an entire thread just for blocking:
    Future<String> f = executor.submit(() -> calculateString());
    f.get(); // + try-catch
• Most programming languages have better alternatives:
  o Promise, CompletableFuture, ListenableFuture, etc.
  o You can register a callback which gets called asynchronously once the result becomes available.
here,
    CompletableFuture<String> f = ...
    f.thenAccept(System.out::println);
• but what if we need a list of values,
    CompletableFuture<List<String>> f = ...
    f.thenAccept(list -> list.stream().forEach(System.out::println));
• which are futures themselves?
    CompletableFuture<List<CompletableFuture<String>>> f = ...
    f.thenAccept(list -> list.stream().forEach(g -> g.thenAccept(System.out::println)));
• This is unsustainable: the "waiting" graph gets complicated and incurs overhead.
• Not to mention, many "async" APIs feature Future-only access.
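For reference, the nested case can be flattened with the standard `thenCompose` and `CompletableFuture.allOf` calls, as in the sketch below. The `flatten` helper and the class name are hypothetical. Note what it costs: the result only appears once every inner future has finished, so nothing is delivered item by item.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class FutureFlattenSketch {
    /** Turns a future list of futures into a single future list of plain values. */
    static <T> CompletableFuture<List<T>> flatten(
            CompletableFuture<List<CompletableFuture<T>>> nested) {
        return nested.thenCompose(list ->
            // allOf completes when every inner future has completed
            CompletableFuture.allOf(list.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> list.stream()
                    // join() returns immediately here: all futures are already done
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList())));
    }

    public static void main(String[] args) {
        CompletableFuture<List<CompletableFuture<String>>> nested =
            CompletableFuture.supplyAsync(() -> Arrays.asList(
                CompletableFuture.supplyAsync(() -> "a"),
                CompletableFuture.supplyAsync(() -> "b")));
        // join() blocks the main thread only to keep the demo short
        System.out.println(flatten(nested).join());
    }
}
```

The helper hides the nesting, but every additional level of asynchrony needs another such helper, which is the sustainability problem described above.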
server are already asynchronous with respect to each other.
• XMLHttpRequest, AsyncCallback, etc., depending on the framework:

api.getUsers(new AsyncCallback<List<User>>() {
    public void onSuccess(List<User> list) {
        for (User u : list) {
            api.getFavorites(u.id, new AsyncCallback<List<UserFavorites>>() {
                public void onSuccess(List<UserFavorites> list) {
                    api.getSuggestions(u.id, list.size(), new AsyncCallback<...>() {
                        public void onSuccess(...) { }
                        public void onFailure(...) { }
                    });
                }
                public void onFailure(Exception ex) {
                    Window.alert(ex.toString());
                }
            });
        }
    }
    public void onFailure(Exception ex) {
        Window.alert(ex.toString());
    }
});

The result: spaghetti code.
Livin’ in a service-oriented world: the data is on different servers with replication and fault-tolerance…
• Trivial solution: one API call collecting everything
  o aka "the Netflix method"
• But we’ve just shifted the problem over to the server side and
• for the sake of user experience, multiple calls may still be required.
http://techblog.netflix.com/2013/01/optimizing-netflix-api.html
the callback-hell, one thing most async frameworks forgot:
• what if the user wants to cancel an activity?
• Most of these APIs return void and there is no Future.cancel() either,
• but even if there were, how can the developer stop all those async calls
• which may run at any time and in any number?
and Petri-nets
  o but requires extra learning and comprehension, which takes time…
• National Instruments LabView programming environment: http://91-527-f2009.wiki.uml.edu/LabVIEW+Review
• Blocks,
• wires,
• groups,
• but it runs almost synchronously internally and gets compiled into an imperative program.
• Designed for continuous operations.
+ visual editor: Fully asynchronous engine - execution, - debugging, - administration and - cooperation. All this transparently over the network. http://www.advance-logistics.eu
about my dissertation:
Thesis #5: structure-based type-system and its relational algorithms,
Thesis #6: pluggable type-system based, parametric and variable-typed type-inference algorithm and
Thesis #7: the first Java-based reactive and parametric-asynchronous block-oriented execution environment*.
* with tons of algorithms in and out…
Constants vs. async data clash (when do we need to re-emit?),
  o connecting parametric types with arity-mismatch (Map<K, V> = Collection<Pair<K, V>>) and
  o limited error handling (need to observe all blocks’ error output and restart the realm).
• Why leave it in this state?
  o The project (funding) ended,
  o no real use cases and no real users and
  o a project with a higher prestige came in…
• All is not lost:
  o The reactive subsystem lived on,
  o the experience, pitfalls, decision points and design concept stayed and
  o so I could, with high confidence, fix the mistakes of another, increasingly popular project…
Microsoft’s Channel 9 developer video portal. (June 2009)
• Project ADVANCE wins (Feb 2010)
• First meetings in ADVANCE, buzzwords get written down, I start porting Rx.NET (Nov 2010)
• First public version, but nobody cares for the reactive paradigm for a year. (Feb 2011)
Reactive4Java
• The ADVANCE EU FP7 project started at the right time, when Microsoft was working on something it needed…
  o but there was no source code available,
  o the documentation was virtually non-existent,
  o plus, the only way to grasp the concepts and features was a dozen Channel 9 videocasts.
  o In addition, it was written in C# and we work in industries where Java is a better fit due to higher standards and better platform independence.
• Therefore, in concert with the aim of making ADVANCE open-source from the start, I started re-implementing Rx.NET as Reactive4Java.
within 2 months, the library reached 80% of the functionality of the original.
  o The second 80% was added over a period of 1.5 years.
However, shadows were lurking nearby
• Lambdas and Java 8 were only a dream back then;
  o everything had to be inner classes, which is tedious and boring.
• Extension methods in C# made it super-easy to build dataflows fluently,
  o there was nothing like it in Java, nor will there ever be, IMO.
  o What was left was using static methods on utility classes and composing them outwards:
    timeout(concat(merge(map(…)), source2), 5, TimeUnit.SECONDS)
  o Only about a year later did I come to the conclusion that the library needs a builder pattern that allows fluent query-building.
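The difference between the two composition styles can be sketched on plain lists. Everything below is a toy written for this comparison: `map`, `filter`, `take` and the `Query` builder are hypothetical names, not the Reactive4Java API, and lambdas are used for brevity even though the original had to use inner classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class FluentBuilderSketch {
    // Static-utility style: the composition has to be read inside-out.
    static <T, R> List<R> map(List<T> source, Function<? super T, ? extends R> f) {
        List<R> out = new ArrayList<>();
        for (T t : source) { out.add(f.apply(t)); }
        return out;
    }
    static <T> List<T> filter(List<T> source, Predicate<? super T> p) {
        List<T> out = new ArrayList<>();
        for (T t : source) { if (p.test(t)) { out.add(t); } }
        return out;
    }
    static <T> List<T> take(List<T> source, int n) {
        return new ArrayList<>(source.subList(0, Math.min(n, source.size())));
    }

    // Builder style: a thin wrapper that delegates to the same static methods,
    // so the steps read top-down in the order they are applied.
    static final class Query<T> {
        private final List<T> items;
        private Query(List<T> items) { this.items = items; }
        static <T> Query<T> from(List<T> items) { return new Query<>(items); }
        <R> Query<R> map(Function<? super T, ? extends R> f) {
            List<R> mapped = FluentBuilderSketch.map(items, f);
            return new Query<>(mapped);
        }
        Query<T> filter(Predicate<? super T> p) {
            return new Query<>(FluentBuilderSketch.filter(items, p));
        }
        Query<T> take(int n) {
            return new Query<>(FluentBuilderSketch.take(items, n));
        }
        List<T> toList() { return items; }
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> nested = take(map(filter(source, x -> x % 2 == 0), x -> x * 10), 2);
        List<Integer> fluent = Query.from(source)
            .filter(x -> x % 2 == 0)
            .map(x -> x * 10)
            .take(2)
            .toList();
        System.out.println(nested + " " + fluent);
    }
}
```

Both forms compute the same result; the builder only changes where the operator names appear, which is what C# extension methods provide for free.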
library, but despite its visibility, the Java community’s responses were grim:
• "Bleh, invented by Microsoft…"
• "Is this a Lisp/Haskell/functional programming thing?"
• "I don’t want to write inner classes."
Instead, most stayed with conventional and limited technologies:
• Thread pools and Futures,
• Scala, Groovy
• agent-based messaging systems (e.g., Akka).
Without real feedback, I was practically developing it for myself. But then suddenly…
Microsoft’s Channel 9 developer video portal. (June 2009)
• Project ADVANCE wins (Feb 2010)
• First meetings in ADVANCE, buzzwords get written down, I start porting Rx.NET (Nov 2010)
• First public version, but nobody cares for the reactive paradigm for a year. (Feb 2011)
• An engineer from Netflix contacts me about reactive programming, but only a few emails are exchanged (Apr 2012)
• Netflix goes public with an independent Rx port: RxJava. (Feb 2013)
• RxJava ~40% feature complete, I join the project. (Nov 2013)
streaming video provided, with 10M subscribers… … whose data amount wins over Torrent.
https://speakerdeck.com/benjchristensen/functional-reactive-programming-in-the-netflix-api-qcon-london-2013
of IT engineers and managers
  o almost like Google, but Netflix people appear to be more friendly and open…
• Freedom and responsibility
  o Trying out new paradigms and technologies "in work hours".
  o Many other companies are not willing to fund such endeavors, except in 5-year periods when the personnel gets replaced…
• Heterogeneous development environment
  o Engineers can choose from a diverse set of programming languages and platforms (mainly JVM-based ones, but JavaScript is also dominant).
• No own servers, everything runs in the Amazon AWS cloud
  o It isn’t cheap, but it is cheaper than owning a private cloud, and there is still room for cost reduction.
• Using open-source technologies, releasing open-source technologies
  o Many cloud-related software technologies were open-sourced by Netflix: Hystrix, Turbine, Eureka, Servo, etc.
higher than in Hungary,
  o The average is 5 years (and interestingly, the technological renewal period as well).
• An engineer who was already using Rx moved from Microsoft over to Netflix
  o bringing the knowledge and showing the benefits, an internal education series started,
  o taking them months to "grok" it
    • or even years…
    • which seems rather odd, perhaps the topic was approached from the wrong angle (monads, ahem)…
    • and the great inventor, Erik Meijer himself, was available too…
  o as an independent consultant
    • so they didn’t have to reverse- and forward-engineer the whole thing like I had to.
• They considered forking Reactive4Java at one point, but decided to start from scratch
  o The requirements were more concrete, the focus was different and broader.
the big announcement came: they started porting Rx.NET over under the name RxJava.
• An ambitious plan:
  o Complete functional parity, minus the idiomatic language differences
  o Polyglot environment: it had to run on any JVM language, in desktop and server environments and even on Android.
  o Forward-looking design: Java 8 was coming and it had to integrate with it nicely; however, Android was still stuck at the Java 6/7 API level…
• However, I wasn’t aware of it until that year’s September
  o It came as a surprise question about the difference between Reactive4Java and Netflix’ RxJava.
  o I always thought Google would be the one to "subvert" my library by doing something like it in Guava,
  o because they had already begun to work on a fluent Iterable API just like Ix.NET and my IterableBuilder inside Reactive4Java.
  o Btw, Interactive Extensions for Java: com.github.akarnokd:ixjava https://github.com/akarnokd/ixjava , check it out!
9 developer video portal. (June 2009)
• Project ADVANCE wins (Feb 2010)
• First meetings in ADVANCE, buzzwords get written down, I start porting Rx.NET (Nov 2010)
• First public version, but nobody cares for the reactive paradigm for a year. (Feb 2011)
• An engineer from Netflix contacts me about reactive programming, but only a few emails are exchanged (Apr 2012)
• Netflix goes public with an independent Rx port: RxJava (Feb 2013)
• RxJava ~40% feature complete, I join the project (Nov 2013)
• I became a collaborator with extended rights… (Jan 2015)
The challenge of writing efficient concurrent code.
• The feeling that I made something the right way.
• Confirmation that many of my original thoughts about reactive programming hold up.
• A set of industrial use-cases and applications I can back my PhD dissertation with.
• An excellent key enabling technology for the upcoming Cyber-Physical Systems world
  o which hopefully won’t share the fate of Web 3.0.
history behind and see some concrete names...
• Luckily, the reactive programming paradigm gains fame every day
• and is accessible from almost all mainstream programming languages (through libraries).
• But unfortunately, they aren’t on the same "tech-level".
• I distinguish 5 tech levels of reactive programming:
  L0 Classic Observer-pattern (j.u.Observable)
  L1 Composable Observable/Observer
  L2 Transparently (a)synchronous (i.e., sync unsubscribable)
  L2.5 Lifting and composing
  L3 Adaptively push-pull (backpressure)
  L4 Self-optimizing and semi-auto-parallelizing
with language adapters (Rx~Groovy, ~Android, ~Clojure, ~Scala, ~Kotlin, ~Rust, etc.),
• helper libraries (RxJava~Strings, ~Joins, ~Async, ~Math, ~Swing, ~Guava, RxNetty, etc.),
• and wrappers (RxJavaReactiveStreams, Quasar, RoboVM)
• Tech-levels: L1, L2, L2.5, L3
• 1500+ JUnit tests
• Very active user-base
• But unfortunately, a very small developer base (< 10)
be found
  o in various blogs,
  o in the RxJava wiki,
  o on the ReactiveX website,
  o in GitHub issues,
  o on StackOverflow in the [rx-java] category,
  o in the JavaDoc and
  o in a few dozen YouTube and Vimeo videos.
• Most information is for beginner- and intermediate-developers
  o There isn’t any comprehensive single-location information source yet.
• Intermediate- and master-level knowledge sources are completely missing
  o except my blog: https://akarnokd.blogspot.com: Advanced RxJava
• Many invites to write a book about it: Manning, O’Reilly*, Packt**
  * indirectly
  ** common practice to turn reviewers into authors eventually.
• I simply don’t have the time for it. A quality book takes a year to write and iterate.
also commonly shown:
Observable.create(subscriber -> {
    subscriber.onNext("Hello World!");
    subscriber.onCompleted();
}).subscribe(System.out::println);
… which is incorrect, and writing operators is intermediate level+. Instead*:
Observable.create(subscriber -> {
    SingleProducer<String> sp = new SingleProducer<>("Hello World!");
    subscriber.setProducer(sp);
}).subscribe(System.out::println);
* Only available in nightly RxJava or starting from v1.0.12
doing the reactive-way,
• for the rest, there is RxJava."
• Library, not a framework:
  o it helps us but doesn’t limit us.
  o frameworks tend to box you in; escaping the box to do something custom can be quite difficult…
• Think through the problem in a sequential manner at first, then convert it to reactive
  o Most business logic is just plain old imperative Java code,
  o Work in batches for better locality and Fork-Join parallelism (buffer() + list.stream().parallel())
• It’s almost certainly flatMap()!
• It takes a little experience to recognize the conversion options; don’t be shy, ask us (me) a question on GitHub, Google Groups or StackOverflow.
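The batching advice can be tried out without any reactive library. In the sketch below, the hypothetical `buffer` helper stands in for the batching step that the slide attributes to buffer(), each batch is then handed to a parallel stream, and `businessLogic` is a placeholder for plain imperative code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class BatchParallelSketch {
    /** Splits the source into consecutive batches of at most `size` items (stand-in for buffer()). */
    static <T> List<List<T>> buffer(List<T> source, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < source.size(); i += size) {
            batches.add(new ArrayList<>(source.subList(i, Math.min(i + size, source.size()))));
        }
        return batches;
    }

    /** Placeholder for plain old imperative business logic. */
    static int businessLogic(int x) {
        return x * x;
    }

    /** Processes each batch with a parallel stream; results keep the input order. */
    static List<Integer> processInBatches(List<Integer> input, int batchSize) {
        List<Integer> results = new ArrayList<>();
        for (List<Integer> batch : buffer(input, batchSize)) {
            results.addAll(batch.parallelStream()
                .map(BatchParallelSketch::businessLogic)
                .collect(Collectors.toList()));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(processInBatches(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

Whether batching pays off depends on how expensive the per-item work is; for something as cheap as squaring a number, the parallel stream overhead dominates.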
frame for the whole reactive paradigm
  o Message-driven, elastic, resilient and responsive systems
• Reactive-Streams (http://www.reactive-streams.org)
  o The best base reactive API that’s out there (Tech-level L2+L3),
  o but aimed more at library writers than typical consumers/developers.
  o Common language and semantics for supporting interoperability among solutions
  o Java 9+ gets something like this too, though unnecessarily in my opinion…
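The Java 9+ counterpart is `java.util.concurrent.Flow`, whose four interfaces mirror the Reactive-Streams ones. A minimal sketch of the push-pull (backpressure) contract, using the JDK's `SubmissionPublisher` and a subscriber that requests one item at a time, could look like this. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

class FlowBackpressureSketch {
    /** Publishes the items asynchronously; the subscriber pulls them one by one. */
    static List<Integer> consumeOneByOne(List<Integer> items) {
        List<Integer> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription upstream;
                @Override public void onSubscribe(Flow.Subscription s) {
                    upstream = s;
                    s.request(1); // nothing is delivered until we ask for it
                }
                @Override public void onNext(Integer item) {
                    received.add(item);
                    upstream.request(1); // ready for exactly one more item
                }
                @Override public void onError(Throwable e) { done.countDown(); }
                @Override public void onComplete() { done.countDown(); }
            });
            items.forEach(publisher::submit);
        } // close() signals onComplete once the pending items have been delivered
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(consumeOneByOne(Arrays.asList(1, 2, 3)));
    }
}
```

As the slide notes, this level of API is meant for library writers; application code would normally use operators built on top of it.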
+ algorithm
• I wish I’d come up with the whole base concept…
  o somehow, I have to do something else whenever great discoveries happen…
• No modeling happened that I know of, just straight implementations
• I wrote around 75 distinct algorithms for the operators and components
  o which would be great if this were a recognized science area, at least among mechanical engineers in Hungary.
• Therefore, my dissertation will contain small portions of the accomplishments and more about how I built upon it in ADVANCE
  o before the thing was even invented…
• However, there is still potential in working out the tech-level 4 features…
fusion (concept)
  o Instead of monolithic operators, build them from smaller standard elements (model)
  o so they can be rearranged, optimized, unified; maybe even through code-generation (algorithm)
2. Semi-automatic parallelization (concept)
  o Know the operator internals and analyze f(x) through bytecode (model)
  o It’s either a matter of operator organization or code-generation + case (1) (algorithm)
[Figure: diagram of operator chains composed of f(x), Q and C elements]
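As a very small illustration of the rearranging idea, and not the design envisioned in the talk, consecutive map stages can be collapsed into one composed stage. The toy `Pipeline` class and its `fuse` method below are hypothetical and only cover map-after-map on integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MapFusionSketch {
    /** A toy pipeline that consists only of map stages over integers. */
    static final class Pipeline {
        final List<Function<Integer, Integer>> stages = new ArrayList<>();

        Pipeline map(Function<Integer, Integer> f) {
            stages.add(f);
            return this;
        }

        /** Returns an equivalent pipeline in which all map stages are composed into one. */
        Pipeline fuse() {
            Pipeline fused = new Pipeline();
            if (!stages.isEmpty()) {
                Function<Integer, Integer> composed = stages.get(0);
                for (int i = 1; i < stages.size(); i++) {
                    composed = composed.andThen(stages.get(i));
                }
                fused.stages.add(composed);
            }
            return fused;
        }

        /** Runs the value through every stage in order. */
        int apply(int input) {
            int value = input;
            for (Function<Integer, Integer> stage : stages) {
                value = stage.apply(value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline().map(x -> x + 1).map(x -> x * 2);
        Pipeline fused = p.fuse();
        System.out.println(p.stages.size() + " stages vs " + fused.stages.size() + " stage");
        System.out.println(p.apply(3) + " == " + fused.apply(3));
    }
}
```

In an asynchronous pipeline each stage boundary may involve a queue or a thread hop, so removing boundaries this way is where the savings would come from.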
currently.
• Inherently asynchronous and distributed systems have to work together.
• Traditional blocking, callback- and thread pool-based approaches don’t scale well and don’t compose (at all).
• With Reactive Programming, i.e., with an enhanced declarative pub-sub method, one can shift the problem into a new viewpoint and domain where asynchronous data flows are easier to deal with.
• The ReactiveX family is there to help developers accomplish this in many mainstream programming languages through libraries and frameworks.