Reactive Programming and Streaming Data: From RxJS to Flink

Yadong Xie
November 21, 2020

Transcript

  1. The view update process
    [Diagram: User Events from the Web View, a Timer on the Computer, and a Remote API on the Remote Server all feed a reactionFn that updates the Web View.]
    View = reactionFn(UserEvent | Timer | Remote API)
  2–6. News APP
    [Diagram, built up over slides 2–6: (1) clicking "Refresh" calls fetch; (2) toggling "Enable Auto Refresh" (change) starts a setInterval that calls fetch, or stops it with clearInterval; (3) a pull-down gesture (touchstart, touchmove, touchend) calls fetch. Slides 5–6 group all three paths into one reactionFn that produces the View.]
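For contrast, here is a minimal sketch of how the three paths in the diagram might be wired with plain callbacks. It assumes a hypothetical `NewsController` class whose handlers are called from DOM listeners and a caller-supplied `fetchNews` callback; none of these names come from the deck. The timer handle and the gesture bookkeeping (`timer`, `touchStartY`, `pulled`) end up as separate mutable fields that every handler must keep consistent.

```typescript
// Hypothetical callback-style wiring of the News APP: each input path has its
// own handler, and the handlers share mutable fields that must stay in sync.
type IntervalHandle = ReturnType<typeof setInterval>;

class NewsController {
  private timer: IntervalHandle | undefined; // auto-refresh timer, if enabled
  private touchStartY: number | undefined;   // start of the current pull gesture
  private pulled = false;                    // already refreshed during this gesture?

  constructor(
    private readonly fetchNews: () => void,
    private readonly pullThreshold = 300,
    private readonly periodMs = 5000,
  ) {}

  // Path 1: the "Refresh" button.
  onRefreshClick(): void {
    this.fetchNews();
  }

  // Path 2: the "Enable Auto Refresh" checkbox.
  onAutoRefreshChange(enabled: boolean): void {
    if (enabled && this.timer === undefined) {
      this.timer = setInterval(() => this.fetchNews(), this.periodMs);
    } else if (!enabled && this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  // Path 3: pull-to-refresh, spread across three touch handlers.
  onTouchStart(pageY: number): void {
    this.touchStartY = pageY;
    this.pulled = false;
  }

  onTouchMove(pageY: number): void {
    if (this.touchStartY === undefined || this.pulled) return;
    if (pageY - this.touchStartY >= this.pullThreshold) {
      this.pulled = true;
      this.fetchNews();
    }
  }

  onTouchEnd(): void {
    this.touchStartY = undefined;
  }

  get autoRefreshEnabled(): boolean {
    return this.timer !== undefined;
  }
}
```

Each additional input path adds another handler and more shared state, which is the coordination problem the stream-based version in slides 20–28 addresses.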
  7–10. MVVM
    [Diagram, built up over slides 7–10: View (Web Interface), View Model (Presentation Logic), and Model (Business Logic and Data), connected by data binding, commands, and notifications; later builds add Event and reactionFn labels.]
    Problems:
    1. It is hard to track precisely how data is assigned and collected.
    2. The logic between the View Model and the View is complex and hard to reuse.
  11. Redux
    State = reducer(initState, Action)
    View = f(State)
    Problems:
    1. State describes intermediate states rather than the process that produced them.
    2. Actions do not correspond one-to-one with Events.
    3. The source of a State change cannot be traced.
  12–19. User Event, Timer, Refresh, Request, View
    [Marble diagram, built up over slides 12–19: on the User Event row, a pull-down gesture (touchstart, touchmove, touchend), a change to true and later to false, and a click; on the Timer row, interval ticks while auto refresh is enabled; on the Refresh row, one refresh per pull, tick, and click; on the Request row, one fetch per refresh; on the View row, an update after each fetch.]
  20–22. Step 1: Define the source streams
      import { fromEvent, interval } from 'rxjs';
      import { map, switchMap } from 'rxjs/operators';
      import { fromFetch } from 'rxjs/fetch';
    User Event Stream (Touch Stream, Click Stream, Change Stream):
      touchstart$ = fromEvent<TouchEvent>(document, 'touchstart');
      touchend$ = fromEvent<TouchEvent>(document, 'touchend');
      touchmove$ = fromEvent<TouchEvent>(document, 'touchmove');
      click$ = fromEvent<MouseEvent>(document.querySelector('button')!, 'click');
      // emit the checkbox state (true/false) rather than the raw DOM event
      change$ = fromEvent<Event>(document.querySelector('input')!, 'change').pipe(
        map(event => (event.target as HTMLInputElement).checked)
      );
    Timer Stream (Interval Stream):
      interval$ = interval(5000);
    Remote API Stream (Fetch Stream):
      // fromFetch emits a Response; parse its JSON body
      fetch$ = fromFetch('https://randomapi.azurewebsites.net/api/users').pipe(
        switchMap(response => response.json())
      );
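To make "a source stream" concrete, here is a toy push-stream model in plain TypeScript; it is not RxJS's implementation. A stream is a function that takes an observer and returns a teardown function, so nothing happens until someone subscribes, the same lazy contract that `fromEvent` and `interval` follow. `Stream`, `fromEmitter`, `intervalStream`, and `SimpleEmitter` are hypothetical names; `SimpleEmitter` stands in for a DOM element so the sketch runs outside a browser.

```typescript
// Toy model of a push-based stream: subscribe by passing an observer,
// get back a function that tears the subscription down.
type Observer<T> = (value: T) => void;
type Teardown = () => void;
type Stream<T> = (observer: Observer<T>) => Teardown;

// Hypothetical stand-in for an event target such as a button or the document.
class SimpleEmitter<T> {
  private readonly listeners = new Set<Observer<T>>();

  addListener(listener: Observer<T>): void {
    this.listeners.add(listener);
  }

  removeListener(listener: Observer<T>): void {
    this.listeners.delete(listener);
  }

  emit(value: T): void {
    this.listeners.forEach((listener) => listener(value));
  }

  get listenerCount(): number {
    return this.listeners.size;
  }
}

// Like fromEvent: attach a listener on subscribe, detach it on teardown.
function fromEmitter<T>(emitter: SimpleEmitter<T>): Stream<T> {
  return (observer) => {
    emitter.addListener(observer);
    return () => emitter.removeListener(observer);
  };
}

// Like interval: start a timer on subscribe, clear it on teardown.
function intervalStream(periodMs: number): Stream<number> {
  return (observer) => {
    let count = 0;
    const handle = setInterval(() => observer(count++), periodMs);
    return () => clearInterval(handle);
  };
}
```

Defining `click$` registers nothing; only calling it with an observer attaches a listener, and calling the returned teardown detaches it again.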
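  23. Step 2: Transform/create intermediate streams: auto refresh
      autoRefresh$ = change$.pipe(
        switchMap(enabled => (enabled ? interval$ : EMPTY))
      );
    [Marble diagram: change$ emits true, then false. After true, autoRefresh$ relays the interval$ ticks as refresh events; after false, it switches to EMPTY and emits nothing.]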
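The marble diagram can be checked with a small virtual-time simulation, written in plain TypeScript rather than RxJS. Each change to `true` starts a fresh interval, the next change (switchMap's switch) cancels it, and a change to `false` maps to EMPTY, which emits nothing. The `Change` type and the `autoRefreshTicks` function are hypothetical; times are in milliseconds, and a tick that would coincide with the next change (or with `untilMs`) is assumed to be cancelled.

```typescript
// A checkbox change at virtual time t (ms).
interface Change {
  t: number;
  enabled: boolean;
}

// Virtual times at which autoRefresh$ would emit, for changes sorted by time.
// Models change$.pipe(switchMap(enabled => (enabled ? interval(periodMs) : EMPTY))).
function autoRefreshTicks(changes: Change[], periodMs: number, untilMs: number): number[] {
  const ticks: number[] = [];
  changes.forEach((change, i) => {
    if (!change.enabled) return; // EMPTY: nothing until the next change
    // switchMap unsubscribes from this interval when the next change arrives.
    const end = i + 1 < changes.length ? changes[i + 1].t : untilMs;
    for (let tick = change.t + periodMs; tick < end; tick += periodMs) {
      ticks.push(tick);
    }
  });
  return ticks;
}
```

Enabling at 0 ms, disabling at 12000 ms, and re-enabling at 20000 ms yields ticks at 5000, 10000, 25000, and 30000 ms (with `untilMs` = 31000): the second interval starts from zero again instead of continuing the first one's phase.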
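  24. Step 2: Transform/create intermediate streams: pull-to-refresh
      pullRefresh$ = touchstart$.pipe(
        switchMap(touchStartEvent =>
          touchmove$.pipe(
            // distance pulled down since the touch started
            map(touchMoveEvent => touchMoveEvent.touches[0].pageY - touchStartEvent.touches[0].pageY),
            // stop tracking when the finger lifts
            takeUntil(touchend$)
          )
        ),
        filter(position => position >= 300), // pulled down at least 300px
        take(1), // one refresh per gesture...
        repeat() // ...then wait for the next touchstart
      );
    [Diagram: a touchPositionHandler combines touchstart$, touchmove$, and touchend$ into pullRefresh$, which emits a refresh when a pull goes far enough.]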
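The operator chain can be traced by hand with a hypothetical `countPullRefreshes` function that replays a recorded gesture sequence; it models the pipeline's semantics in plain TypeScript and is not RxJS code. The subtle part is `take(1)` followed by `repeat()`: after the first qualifying move the whole chain completes and resubscribes to `touchstart$`, so further moves in the same gesture cannot trigger a second refresh.

```typescript
// A recorded touch event; pageY is ignored for 'end'.
interface TouchSample {
  type: 'start' | 'move' | 'end';
  pageY: number;
}

// Number of refreshes pullRefresh$ would emit for this sequence of events.
function countPullRefreshes(events: TouchSample[], threshold = 300): number {
  let startY: number | undefined; // undefined: not tracking a gesture
  let refreshes = 0;
  for (const event of events) {
    if (event.type === 'start') {
      startY = event.pageY; // switchMap: a new touchstart replaces any old gesture
    } else if (event.type === 'end') {
      startY = undefined; // takeUntil(touchend$)
    } else if (startY !== undefined && event.pageY - startY >= threshold) {
      refreshes++; // filter(position => position >= threshold) passed
      startY = undefined; // take(1) completes; repeat() waits for the next touchstart
    }
  }
  return refreshes;
}
```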
  25. Step 2: Transform/create streams: merge
      refresh$ = merge(clickRefresh$, autoRefresh$, pullRefresh$);
    [Marble diagram: the refresh events of autoRefresh$, pullRefresh$, and clickRefresh$ are combined by merge() into a single refresh$ stream.]
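`merge` interleaves its inputs by the time each value arrives and does not record which input a value came from, which is what lets `view$` treat every refresh the same way. A hypothetical virtual-time version over sorted arrays of timestamps, in plain TypeScript rather than RxJS:

```typescript
// Virtual times (ms, ascending) at which one refresh source emits.
type Timeline = number[];

// Models merge(a, b, ...) on virtual time: every emission of every input,
// ordered by when it happens; which input it came from is not kept.
function mergeTimelines(...timelines: Timeline[]): Timeline {
  const all: number[] = [];
  timelines.forEach((timeline) => all.push(...timeline));
  return all.sort((x, y) => x - y);
}
```

For example, a click at 1200 ms, auto-refresh ticks at 5000 and 10000 ms, and a pull at 7300 ms merge into one refresh stream at 1200, 5000, 7300, and 10000 ms.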
  26–28. Step 3: Get the view response stream
      view$ = this.refresh$.pipe(switchMap(() => this.fetch$));
    [Marble diagram: each refresh on refresh$ starts a fetch on fetch$, and each fetch result is emitted as a view on view$.]
    Step 4: Consume/subscribe to the stream to update the view
      Angular template: <div *ngFor="let user of view$ | async"></div>
      Manual subscription: view$.subscribe();
      Component render method: render() { return (); }
  29–30. [Diagram: User Events from the Web View, a Timer on the Computer, and a Remote API on the Remote Server feed a reactionFn that updates the Web View.]
    View = reactionFn(UserEvent | Timer | RemoteApi)
  31–33. Step 1: Describe the source streams (eventStream$)
      • User Event: fromEvent
      • Timer: interval, timer
      • Remote API: fromFetch, webSocket
    Step 2: Combine/transform the streams (middleStream$/viewStream$)
      • COMBINING: merge, combineLatest, zip
      • MAPPING: map
      • FILTERING: filter
      • REDUCING: reduce, max, count, scan
      • TAKING: take, takeWhile
      • SKIPPING: skip, skipWhile, takeLast, last
      • TIME: delay, debounceTime, throttleTime
    Step 3: Consume the streams to update the data (updateView)
      • subscribe
      • async pipe
    View = reactionFn(UserEvent | Timer | RemoteApi)
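Of the TIME operators, `debounceTime` is the one slide 34 applies to the Refresh button (`clickRefresh$ = this.click$.pipe(debounceTime(300))`). Below is a virtual-time sketch in plain TypeScript, not RxJS's implementation: a value is emitted `dueMs` after it arrives, but only if no newer value arrives in the meantime. `debounceTimes` is a hypothetical name. Two assumptions: a newer value arriving exactly at the deadline is treated as too late to cancel the older one, and the last value fires after its delay rather than being flushed when the source completes.

```typescript
// Virtual times (ms, ascending) at which debounceTime(dueMs) would emit,
// given the times at which the source emitted.
function debounceTimes(sourceTimes: number[], dueMs: number): number[] {
  const emitted: number[] = [];
  sourceTimes.forEach((t, i) => {
    const next = sourceTimes[i + 1];
    // A value survives only if the source stays quiet for dueMs after it.
    if (next === undefined || next - t >= dueMs) emitted.push(t + dueMs);
  });
  return emitted;
}
```

With a 300 ms debounce, clicks at 0, 100, 200, 1000, and 1500 ms produce refreshes at 500, 1300, and 1800 ms: the three rapid clicks collapse into one.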
  34. The complete Angular component:
      import { Component } from '@angular/core';
      import { EMPTY, fromEvent, interval, merge, Subject } from 'rxjs';
      import { debounceTime, filter, map, repeat, startWith, switchMap, take, takeUntil } from 'rxjs/operators';
      import { HttpClient } from '@angular/common/http';

      interface User {
        address: string;
        balance: number;
        created: string;
        email: string;
        first: string;
        last: string;
      }

      @Component({
        selector: 'app-root',
        template: `
          <button (click)="click$.next()">Refresh</button>
          <input id="auto" type="checkbox" ngModel (ngModelChange)="change$.next($event)" />
          <label for="auto">Enable Auto Refresh</label>
          <section *ngFor="let user of view$ | async">
            <hr />
            <div>Name: {{ user.first }} {{ user.last }}</div>
            <div>Address: {{ user.address }}</div>
            <div>Balance: {{ user.balance }}</div>
            <div>Email: {{ user.email }}</div>
          </section>
        `
      })
      export class AppComponent {
        click$ = new Subject<void>();
        change$ = new Subject<boolean>();
        touchstart$ = fromEvent<TouchEvent>(document, 'touchstart');
        touchend$ = fromEvent<TouchEvent>(document, 'touchend');
        touchmove$ = fromEvent<TouchEvent>(document, 'touchmove');
        interval$ = interval(5000);
        fetch$ = this.httpClient.get<User[]>('https://randomapi.azurewebsites.net/api/users');
        clickRefresh$ = this.click$.pipe(debounceTime(300));
        touchRefresh$ = this.touchstart$.pipe(
          switchMap(touchstart =>
            this.touchmove$.pipe(
              map(touchmove => touchmove.touches[0].pageY - touchstart.touches[0].pageY),
              takeUntil(this.touchend$)
            )
          ),
          filter(position => position >= 300),
          take(1),
          repeat()
        );
        autoRefresh$ = this.change$.pipe(switchMap(enabled => (enabled ? this.interval$ : EMPTY)));
        refresh$ = merge(this.clickRefresh$, this.autoRefresh$, this.touchRefresh$).pipe(startWith(true));
        view$ = this.refresh$.pipe(switchMap(() => this.fetch$));

        constructor(private httpClient: HttpClient) {}
      }
    https://github.com/vthinkxie/ng-pull-refresh
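  35. MVVM (Model, View Model, View):
    1. It is hard to track precisely how data is assigned and collected.
    2. The logic between the View Model and the View is complex and hard to reuse.
    Redux:
    1. State describes intermediate states (slices) rather than the process.
    2. Actions do not correspond one-to-one with Events.
    3. The source of a State cannot be traced.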
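  36. Redux
    [Diagram: Action1, Action2, and Action3 produce State1, State2, and State3.]
    state$ = action$.scan(reducer)   (pipeable RxJS 6 form: action$.pipe(scan(reducer, initState)))
    An Action is a simplified EventStream; a State is what the stream corresponds to at a given moment.
    https://redux.js.org/understanding/history-and-design/prior-art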
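The idea that a State is the action stream sampled at one moment can be shown with a hand-written `scan` over an array, a plain-TypeScript sketch rather than RxJS. The reducer follows Redux's `(state, action) => newState` shape; the `Action` and `State` types and the `scan` helper are hypothetical. Unlike `reduce`, which yields only the final State, `scan` yields every intermediate State, one per Action.

```typescript
type Action = { type: 'increment' } | { type: 'add'; amount: number };

interface State {
  count: number;
}

// Redux-style reducer: (previous state, action) => next state, without mutation.
function reducer(state: State, action: Action): State {
  if (action.type === 'increment') return { count: state.count + 1 };
  return { count: state.count + action.amount };
}

// Emits the accumulated value after each input, like RxJS scan(step, seed).
function scan<A, S>(inputs: A[], step: (acc: S, input: A) => S, seed: S): S[] {
  const states: S[] = [];
  let acc = seed;
  for (const input of inputs) {
    acc = step(acc, input);
    states.push(acc);
  }
  return states;
}
```

The last value of `scan` equals what `reduce` returns; a Redux store's current state corresponds to that latest value.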
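  37. Out-of-order events: a request sent first may receive its response last
    [Marble diagram: refresh$ triggers a first request and then a second request on request$; the second response reaches view$ before the first response, which arrives as a Late Event.]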
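A virtual-time sketch of why `view$` uses `switchMap` rather than, say, `mergeMap`, in plain TypeScript. The `RefreshRequest` records and their send and response times are made up for illustration. With `mergeMap` every response is rendered when it arrives, so the late first response overwrites the newer data; with `switchMap` a new refresh unsubscribes from the previous request, so its response is discarded.

```typescript
interface RefreshRequest {
  id: string;
  sentAt: number;      // virtual time the refresh fired (ms)
  respondedAt: number; // virtual time the response arrived (ms)
}

const byResponse = (a: RefreshRequest, b: RefreshRequest) => a.respondedAt - b.respondedAt;

// mergeMap: every response is rendered, in arrival order.
function renderedWithMergeMap(requests: RefreshRequest[]): string[] {
  return [...requests].sort(byResponse).map((r) => r.id);
}

// switchMap: a response is rendered only if it arrives before the next request is sent.
function renderedWithSwitchMap(requests: RefreshRequest[]): string[] {
  const bySend = [...requests].sort((a, b) => a.sentAt - b.sentAt);
  const survivors = bySend.filter((r, i) => {
    const next = bySend[i + 1];
    return next === undefined || r.respondedAt < next.sentAt;
  });
  return survivors.sort(byResponse).map((r) => r.id);
}
```

With a first request sent at 0 ms (response at 900 ms) and a second sent at 100 ms (response at 400 ms), `mergeMap` renders `second` and then `first`, leaving stale data on screen, while `switchMap` renders only `second`.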
  38. Event time
    [Diagram: a click on the Web Interface happens at its Event Time; the Computer's CPU processes it at its Processing Time.]
    In front-end development: Event Time ≈ Processing Time
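  39. Out-of-order events
    [Diagram: events numbered 1–6 leave the Event Producer, pass through Network Transmission, and reach Processing in a different order; an event that arrives after later ones is a Late Event.]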
  40–41. Out-of-order events and windows
    [Diagram: events 1–6 grouped into window1–window4. Ideal case: every event falls into the window its timestamp belongs to. Actual case: a Late Event misses its window, producing an Error Result.]
    Solution: window2 waits until the late event arrives before computing.
    Problem: how long should it wait?
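A small sketch of how the Error Result arises, using a hypothetical arrival order: six events with event timestamps 1 to 6 arrive as 1, 5, 2, 4, 6, 3. Grouping consecutive arrivals (a stand-in for processing-time windows when events arrive at a steady rate) puts events into the wrong windows, while grouping by event timestamp puts each one into its tumbling window `[start, start + size)`. The function names are hypothetical, and aligning windows at 0 is an assumption.

```typescript
// Processing-time view: consecutive arrivals share a window.
function windowsByArrival(timestamps: number[], perWindow: number): number[][] {
  const windows: number[][] = [];
  for (let i = 0; i < timestamps.length; i += perWindow) {
    windows.push(timestamps.slice(i, i + perWindow));
  }
  return windows;
}

// Event-time view: each event goes to the tumbling window containing its timestamp.
function windowsByEventTime(timestamps: number[], size: number): Map<number, number[]> {
  const windows = new Map<number, number[]>();
  for (const ts of timestamps) {
    const start = ts - (ts % size); // window [start, start + size)
    const bucket = windows.get(start) ?? [];
    bucket.push(ts);
    windows.set(start, bucket);
  }
  return windows;
}
```

The event-time grouping is only correct once every event has arrived: if window [2, 4) were computed before event 3 showed up, it would be missing an event, which is exactly the "how long should it wait?" problem.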
  42–43. Watermark: a trade-off between accuracy and timeliness
    [Figure: an in-order stream with watermarks W(11) and W(20), and an out-of-order stream with watermarks W(11) and W(17); each event is labeled with its event timestamp.]
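A plain-TypeScript sketch of how a bounded-out-of-orderness watermark answers "how long should it wait?". The watermark trails the largest timestamp seen so far by a fixed delay, a window `[start, end)` fires once the watermark reaches `end`, and an event whose window has already fired is reported as late. This mirrors the idea behind Flink's bounded-out-of-orderness watermarks but is not Flink code; `TimedEvent`, `eventTimeWindowSums`, and the sample data are hypothetical, and timestamps are assumed to be non-negative.

```typescript
interface TimedEvent {
  ts: number; // event timestamp
  value: number;
}

interface WindowSum {
  start: number; // inclusive
  end: number;   // exclusive
  sum: number;
}

function eventTimeWindowSums(events: TimedEvent[], size: number, maxDelay: number) {
  const open = new Map<number, number>(); // window start -> running sum
  const fired: WindowSum[] = [];
  const late: TimedEvent[] = [];
  let maxTs = -Infinity;
  let watermark = -Infinity;

  // Emit every open window whose end is at or before `limit`, oldest first.
  const fireUpTo = (limit: number) => {
    const starts = Array.from(open.keys()).sort((a, b) => a - b);
    for (const start of starts) {
      if (start + size <= limit) {
        fired.push({ start, end: start + size, sum: open.get(start) ?? 0 });
        open.delete(start);
      }
    }
  };

  for (const event of events) {
    const start = event.ts - (event.ts % size);
    if (start + size <= watermark) {
      late.push(event); // its window has already fired
      continue;
    }
    open.set(start, (open.get(start) ?? 0) + event.value);
    maxTs = Math.max(maxTs, event.ts);
    watermark = maxTs - maxDelay; // no event older than this is expected any more
    fireUpTo(watermark);
  }
  fireUpTo(Infinity); // end of input acts as a final, infinite watermark
  return { fired, late };
}
```

With windows of size 10, a delay of 3, and events arriving with timestamps 1, 3, 12, 7, 15, 9, 25, event 7 is out of order but within the tolerance and is counted, while event 9 arrives after the watermark has passed 10, so it is reported as late. A larger delay catches more stragglers (accuracy) at the cost of firing later (timeliness).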
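  44. Streaming Dataflow (condensed view)
    Source → map() → keyBy()/window()/apply() → Sink
    [Diagram: Source and map() form one Operator chain that runs as a single Task.]
    The stages map onto the three steps: 1. define the source streams (Source), 2. combine/transform the streams (map(), keyBy()/window()/apply()), 3. consume the streams (Sink).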
  45. Streaming Dataflow (parallelized view)
    [Diagram: Source [1]/[2] with map() [1]/[2] as Operator chains, then keyBy()/window()/apply() [1]/[2], then Sink [1]; each Subtask (= thread) runs one Operator chain.]
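`keyBy()` is what lets the parallel subtasks keep per-key state locally: every record with the same key is routed to the same subtask. Below is a simplified plain-TypeScript sketch that assumes a hypothetical string hash and a direct `hash mod parallelism` assignment. Flink itself hashes keys into key groups and assigns ranges of key groups to subtasks, so its actual routing differs.

```typescript
// Hypothetical 32-bit string hash (djb2-style); any deterministic hash works here.
function hashKey(key: string): number {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash * 33) ^ key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Index of the subtask that owns this key.
function subtaskFor(key: string, parallelism: number): number {
  return hashKey(key) % parallelism;
}

// Route records to subtasks and count per key inside each one, the way a keyed
// operator keeps its state local to the single subtask that owns the key.
function countPerSubtask(keys: string[], parallelism: number): Map<string, number>[] {
  const subtasks: Map<string, number>[] = [];
  for (let i = 0; i < parallelism; i++) subtasks.push(new Map<string, number>());
  for (const key of keys) {
    const state = subtasks[subtaskFor(key, parallelism)];
    state.set(key, (state.get(key) ?? 0) + 1);
  }
  return subtasks;
}
```

Because the routing is deterministic, a key's running count never has to be shared between threads; rescaling only changes which subtask owns which keys.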
  46–48. TaskManagers and Task Slots
    [Diagram: two TaskManagers (Processes), each with three Task Slots (Threads), hosting the subtasks Source [1]/[2], map() [1]/[2], keyBy()/window()/apply() [1]/[2], and Sink [1].]
  49. Flink State and Distributed Snapshots
    [Diagram: a Stateful Operation downstream of the Source takes state snapshots ("Asynchronous Barrier Snapshotting") and stores them in Stable Storage.]
    Automatic saving
  50. State and Checkpoint: saving intermediate state
    [Figure: checkpoint barriers n-1 and n flow with the data stream (from older to newer records) and divide its records into the parts that belong to checkpoints n-1, n, and n+1. An operator with several inputs goes through Begin alignment (records arriving after the barrier on one input wait in the input buffers), End alignment (the barrier has arrived on every input), and Checkpoint (its state is written to the State backend).]
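A single-operator sketch of barrier alignment in plain TypeScript (not Flink code): a summing operator with several inputs stops consuming an input once that input's barrier arrives, buffers the input's later messages, and snapshots its state only when the barrier has arrived on every input. The snapshot therefore contains exactly the records that came before barrier n on each input. `AligningSumOperator` and `StreamMessage` are hypothetical names.

```typescript
type StreamMessage =
  | { kind: 'record'; value: number }
  | { kind: 'barrier'; checkpointId: number };

interface Snapshot {
  checkpointId: number;
  sum: number;
}

class AligningSumOperator {
  private sum = 0;
  private readonly blockedInputs = new Set<number>(); // inputs whose barrier has arrived
  private buffered: Array<{ input: number; message: StreamMessage }> = [];
  readonly snapshots: Snapshot[] = [];

  constructor(private readonly inputCount: number) {}

  get state(): number {
    return this.sum;
  }

  receive(input: number, message: StreamMessage): void {
    if (this.blockedInputs.has(input)) {
      // Alignment in progress: this input is ahead, so hold its messages back.
      this.buffered.push({ input, message });
      return;
    }
    if (message.kind === 'record') {
      this.sum += message.value;
      return;
    }
    this.blockedInputs.add(input);
    if (this.blockedInputs.size < this.inputCount) return;

    // End of alignment: the barrier has arrived on every input, so snapshot now
    // (a real operator would also forward the barrier downstream here).
    this.snapshots.push({ checkpointId: message.checkpointId, sum: this.sum });
    this.blockedInputs.clear();
    const pending = this.buffered;
    this.buffered = [];
    pending.forEach(({ input: from, message: held }) => this.receive(from, held));
  }
}
```

Without the buffering, a record that input 0 sent after its barrier could be added to the sum before input 1's barrier arrived, and the snapshot would mix data from two checkpoints.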
  51. State and Checkpoint
    • Starting Checkpoint: the Master sends a start checkpoint message to every source. Each source acknowledges with its current position (Source 1: 6791, Source 2: 7252, Source 3: 5589, Source 4: 6843) and emits stream barriers.
    • Checkpoint in Progress: an operator that has received the barrier at each input writes a snapshot of its state to the State Backend (s1, s2), acknowledges with a pointer to that state (State 1: ptr1, State 2: ptr2), and emits the next barrier. The sinks are still pending.
    • Checkpoint Completed: each sink acknowledges the checkpoint after receiving all barriers (Sink 1: ack!, Sink 2: ack!).
  52. State and Checkpoint: Recovery From Failure
    [Diagram: after a failure, the Source resumes from the checkpoint offset and each Stateful Operation restores its state from Stable Storage.]
    Automatic recovery
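A toy model of recovery in plain TypeScript: the job keeps a source offset and a running sum, checkpoints both together every few records, and after a simulated crash rewinds the source to the checkpointed offset and restores the checkpointed sum. Because offset and state are restored as a pair, the replayed records are not double-counted. `runWithOneFailure`, its parameters, and the crash point are hypothetical.

```typescript
interface Checkpoint {
  offset: number; // next record to read from the source
  sum: number;    // operator state at that offset
}

// Sums `records`, taking a checkpoint every `checkpointEvery` records and
// crashing once just before reading the record at position `crashAt`.
function runWithOneFailure(records: number[], checkpointEvery: number, crashAt: number): number {
  let offset = 0;
  let sum = 0;
  let checkpoint: Checkpoint = { offset: 0, sum: 0 };
  let crashed = false;

  while (offset < records.length) {
    if (!crashed && offset === crashAt) {
      crashed = true;
      // Recovery: resume the source at the checkpoint offset and restore state.
      offset = checkpoint.offset;
      sum = checkpoint.sum;
      continue;
    }
    sum += records[offset];
    offset++;
    if (offset % checkpointEvery === 0) checkpoint = { offset, sum };
  }
  return sum;
}
```

Summing 1 to 10 with a checkpoint every 4 records and a crash before record 8 still gives 55. Rewinding the offset without restoring the sum would count records 5 to 7 twice and give 73.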
  53. [Diagram: the Client turns Program code into a Dataflow graph (Optimizer / Graph Builder) and, through its Actor System, submits the job (send dataflow) to the JobManager (Master / YARN Application Master) or cancels/updates it; the JobManager sends back status updates, statistics, and results. The JobManager (Actor System, Dataflow Graph, Scheduler, Checkpoint Coordinator) deploys, stops, and cancels tasks and triggers checkpoints on the TaskManagers, which report task status, heartbeats, and statistics. Each TaskManager (Worker) has Task Slots running Tasks, a Network Manager, an Actor System, and a Memory & I/O Manager, and the TaskManagers exchange Data Streams with each other.]