
Yrs: what have we learned

This talk is about Yrs: a Rust implementation of Yjs (one of the most popular libraries for building collaborative applications) that by now has over half a dozen bindings to different programming languages. We'll share our experience in helping others build their products: the most common mistakes, and how to develop a mindset for building successful local-first applications.

Bartosz Sypytkowski

December 13, 2024

Transcript

  1. YJS

    Yjs (JavaScript):

        import * as Y from 'yjs'
        const alice = new Y.Doc()
        const bob = new Y.Doc()
        const aliceText = alice.getText('demo')
        aliceText.insert(0, 'hello')
        const bobSv = Y.encodeStateVector(bob)
        const aliceUpdate = Y.encodeStateAsUpdate(alice, bobSv)
        Y.applyUpdate(bob, aliceUpdate)
        const bobText = bob.getText('demo')
        bobText.toString() // => hello

    Yrs (Rust):

        use yrs::*;
        use yrs::updates::decoder::Decode;
        let alice = Doc::new();
        let bob = Doc::new();
        let mut alice_tx = alice.transact_mut();
        let alice_text = alice_tx.get_or_insert_text("demo");
        alice_text.insert(&mut alice_tx, 0, "hello");
        let mut bob_tx = bob.transact_mut();
        let bob_sv = bob_tx.state_vector();
        let update = alice_tx.encode_state_as_update_v1(&bob_sv);
        bob_tx.apply_update(&Update::decode_v1(&update)?)?;
        let bob_text = bob_tx.get_or_insert_text("demo");
        bob_text.get_string(&bob_tx) // => "hello"
  2. YRS PROJECTS

    Projects (versions): yjs (13.6), yrs (0.21), ywasm (0.18), yffi (0.21), y-rb (0.5), y-py (0.6), ydotnet (0.4), yswift (0.2), y_ex (0.6)

    Features compared:
    • YText: insert/delete
    • YText: formatting attributes and deltas
    • YText: embedded elements
    • YMap: update/delete
    • YMap: weak links (weak-links branch)
    • YArray: insert/delete
    • YArray & YText quotations (weak-links branch)
    • YArray: move (move branch)
    • XML Element, Fragment and Text
    • Sub-documents
    • Shared collections: observers
    • Shared collections: recursive nesting
    • Document observers
    • Transaction: origins
    • Snapshots
    • Sticky indexes
    • Undo Manager
    • Awareness
  3. DIFFERENT TYPES OF SHARING

    1. Private
    2. Shared (but not collaborative)
    3. Simple collaboration schemes
    4. Advanced collaboration schemes
  4. NON-SEQUENTIAL VS SEQUENTIAL

    Sequential (all writes to "A", then all writes to "B"):

        let doc = Doc::new();
        let mut tx = doc.transact_mut();
        let map = tx.get_or_insert_map("demo");
        for i in 0..100_000 {
            map.insert(&mut tx, "A", i);
        }
        for i in 0..100_000 {
            map.insert(&mut tx, "B", i);
        }
        drop(tx);
        doc.transact()
            .encode_state_as_update_v1(&StateVector::default())

    Update size (v1): 77 bytes
    Update size (v2): 74 bytes

    Non-sequential (writes to "A" and "B" interleaved):

        let doc = Doc::new();
        let mut tx = doc.transact_mut();
        let map = tx.get_or_insert_map("demo");
        for i in 0..100_000 {
            map.insert(&mut tx, "A", i);
            map.insert(&mut tx, "B", i);
        }
        drop(tx);
        doc.transact()
            .encode_state_as_update_v1(&StateVector::default())

    Update size (v1): 1 983 517 bytes
    Update size (v2): 79 bytes
  5. SEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);

    Diagram: “demo” MapRef → “A” → block N:0 holding value 1.
    Legend: in N:0, N is the ClientID and 0 is the clock range; 1 is the value.

  6. SEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);
        demo.insert(&mut tx, "A", 2);

    Diagram: “demo” MapRef → “A” → N:0 (tombstone), N:1 holding value 2.

  7. SEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);
        demo.insert(&mut tx, "A", 2);
        demo.insert(&mut tx, "A", 3);

    Diagram: “demo” MapRef → “A” → N:0 (tombstone), N:1 (tombstone), N:2 holding value 3.
    The tombstone clocks form a consistent range.

  8. SEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);
        demo.insert(&mut tx, "A", 2);
        demo.insert(&mut tx, "A", 3);

    Diagram: “demo” MapRef → “A” → N:0..1 (tombstones squashed into one range), N:2 holding value 3.
  9. NONSEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);
        demo.insert(&mut tx, "B", 1);

    Diagram: “demo” MapRef → “A” → N:0 holding value 1; “B” → N:1 holding value 1.

  10. NONSEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);
        demo.insert(&mut tx, "B", 1);
        demo.insert(&mut tx, "A", 2);

    Diagram: “A” → N:0 (tombstone), N:2 holding value 2; “B” → N:1 holding value 1.

  11. NONSEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);
        demo.insert(&mut tx, "B", 1);
        demo.insert(&mut tx, "A", 2);
        demo.insert(&mut tx, "B", 2);

    Diagram: “A” → N:0 (tombstone), N:2 holding value 2; “B” → N:1 (tombstone), N:3 holding value 2.

  12. NONSEQUENTIAL UNDER THE HOOD

        demo.insert(&mut tx, "A", 1);
        demo.insert(&mut tx, "B", 1);
        demo.insert(&mut tx, "A", 2);
        demo.insert(&mut tx, "B", 2);
        demo.insert(&mut tx, "A", 3);

    Diagram: “A” → N:0 (tombstone), N:2 (tombstone), N:4 holding value 3; “B” → N:1 (tombstone), N:3 holding value 2.
    The tombstone range under “A” is not consistent (clock 1 is missing), so it cannot be squashed.
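
The squashing behaviour on slides 5 to 12 can be imitated with a toy model that needs only the Rust standard library. This is a sketch, not how Yrs stores blocks internally: `Block`, `simulate`, `squash_tombstones` and `block_count` are hypothetical names. The model assumes one clock tick per map write, tombstones the previous block under the same key, and merges tombstones of one key only when their clocks are adjacent.

```rust
use std::collections::BTreeMap;

/// One stored block: an inclusive clock range plus a flag telling whether it
/// still holds the live value or is only a tombstone.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    start: u32,
    end: u32,
    tombstone: bool,
}

/// Replays a sequence of map writes, one clock tick per write.
/// Every write tombstones the previous block stored under the same key.
fn simulate(keys: &[&str]) -> BTreeMap<String, Vec<Block>> {
    let mut store: BTreeMap<String, Vec<Block>> = BTreeMap::new();
    for (clock, key) in keys.iter().enumerate() {
        let blocks = store.entry(key.to_string()).or_default();
        if let Some(prev) = blocks.last_mut() {
            prev.tombstone = true;
        }
        let clock = clock as u32;
        blocks.push(Block { start: clock, end: clock, tombstone: false });
    }
    store
}

/// Merges neighbouring tombstones of one key when their clocks are adjacent.
fn squash_tombstones(blocks: &[Block]) -> Vec<Block> {
    let mut out: Vec<Block> = Vec::new();
    for b in blocks {
        let mergeable = match out.last() {
            Some(last) => last.tombstone && b.tombstone && last.end + 1 == b.start,
            None => false,
        };
        if mergeable {
            if let Some(last) = out.last_mut() {
                last.end = b.end;
            }
        } else {
            out.push(b.clone());
        }
    }
    out
}

/// Total number of blocks left in the whole store after squashing.
fn block_count(keys: &[&str]) -> usize {
    simulate(keys)
        .values()
        .map(|blocks| squash_tombstones(blocks).len())
        .sum()
}
```

In this model, 100 writes to "A" followed by 100 writes to "B" leave 4 blocks, while interleaving the same 200 writes leaves 200 blocks, because no two tombstones under one key have adjacent clocks.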
  13. RUNNING OUT OF TIME

    “[…] Regarding the u64—since the entire game state in Formabble is managed via CRDTs, it’s entirely possible that we could exceed the u32 ceiling over time. […]”
    https://github.com/y-crdt/y-crdt/issues/496
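
To get a feel for the u32 ceiling mentioned in the quote, here is a back-of-the-envelope sketch. It assumes a single client whose clock advances at a constant rate; the rates used below are made-up illustrations, not measurements from Formabble or Yrs, and both function names are hypothetical.

```rust
/// Seconds until a clock counter with `bits` bits overflows, assuming a
/// constant rate of `ticks_per_sec` clock ticks per second.
fn seconds_until_overflow(bits: u32, ticks_per_sec: u64) -> u128 {
    (1u128 << bits) / ticks_per_sec as u128
}

/// Converts seconds to whole days (rounding down).
fn whole_days(seconds: u128) -> u128 {
    seconds / 86_400
}
```

Under these assumptions a u32 clock lasts about 49 days at 1,000 ticks per second and about 828 days at 60 ticks per second.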
  14. WHEN CONFLICT RESOLUTION GOES WRONG

    Alice: MapRef { “vehicle_id”: “cf0eeba6”, “position”: MapRef { “lat”: 52.516, “lon”: 13.379 } }
    Bob:   MapRef { “vehicle_id”: “cf0eeba6”, “position”: MapRef { “lat”: 52.516, “lon”: 13.379 } }

  15. WHEN CONFLICT RESOLUTION GOES WRONG

    Starting from the state on slide 14, concurrent changes (one per peer):
    MapRef { “vehicle_id”: “cf0eeba6”, “position”: MapRef { “lat”: 51.320, “lon”: 12.411 } }
    MapRef { “vehicle_id”: “cf0eeba6”, “position”: MapRef { “lat”: 49.473, “lon”: 11.013 } }

  16. WHEN CONFLICT RESOLUTION GOES WRONG

    Same starting state and concurrent changes as on slide 15. After applying updates, both peers hold:
    MapRef { “vehicle_id”: “cf0eeba6”, “position”: MapRef { “lat”: 51.320, “lon”: 11.013 } }
  17. USE ATOMIC VALUES

    Alice: MapRef { “vehicle_id”: “cf0eeba6”, “position”: { “lat”: 52.516, “lon”: 13.379 } }
    Bob:   MapRef { “vehicle_id”: “cf0eeba6”, “position”: { “lat”: 52.516, “lon”: 13.379 } }
    The entire object is counted as a single value.

  18. USE ATOMIC VALUES

    Starting from the state on slide 17, concurrent changes (one per peer):
    MapRef { “vehicle_id”: “cf0eeba6”, “position”: { “lat”: 51.320, “lon”: 12.411 } }
    MapRef { “vehicle_id”: “cf0eeba6”, “position”: { “lat”: 49.473, “lon”: 11.013 } }

  19. USE ATOMIC VALUES

    Same starting state and concurrent changes as on slide 18. Apply updates:
    MapRef { “vehicle_id”: “cf0eeba6”, “position”: { “lat”: 51.320, “lon”: 11.013 } }
    MapRef { “vehicle_id”: “cf0eeba6”, “position”: { “lat”: 49.473, “lon”: 12.411 } }
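
The difference between a nested map and an atomic value can be sketched with a toy last-writer-wins register. This is an assumption-heavy illustration, not Yrs' actual ordering rule: `Lww`, the `(logical time, client id)` stamps and the write order chosen below are all hypothetical. It only shows that resolving each field independently can combine halves of two different positions, while resolving the whole position as one value cannot.

```rust
/// A toy last-writer-wins register: the write with the larger stamp wins.
/// The stamp is a hypothetical (logical time, client id) pair.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Lww<T> {
    stamp: (u64, u64),
    value: T,
}

impl<T: Copy> Lww<T> {
    fn merge(self, other: Self) -> Self {
        if other.stamp > self.stamp { other } else { self }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    lat: f64,
    lon: f64,
}

/// Nested layout: every coordinate is its own independently merged register.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NestedPosition {
    lat: Lww<f64>,
    lon: Lww<f64>,
}

impl NestedPosition {
    fn merge(self, other: Self) -> Self {
        NestedPosition {
            lat: self.lat.merge(other.lat),
            lon: self.lon.merge(other.lon),
        }
    }

    fn read(self) -> Position {
        Position { lat: self.lat.value, lon: self.lon.value }
    }
}

const ALICE: u64 = 1;
const BOB: u64 = 2;

/// Alice writes lon first and lat last; Bob writes lat first and lon last.
/// Returns the merged state as seen by Alice and by Bob.
fn nested_example() -> (Position, Position) {
    let alice = NestedPosition {
        lat: Lww { stamp: (4, ALICE), value: 51.320 },
        lon: Lww { stamp: (1, ALICE), value: 12.411 },
    };
    let bob = NestedPosition {
        lat: Lww { stamp: (2, BOB), value: 49.473 },
        lon: Lww { stamp: (3, BOB), value: 11.013 },
    };
    (alice.merge(bob).read(), bob.merge(alice).read())
}

/// The same two updates, but each position is written as a single value.
fn atomic_example() -> (Position, Position) {
    let alice = Lww { stamp: (4, ALICE), value: Position { lat: 51.320, lon: 12.411 } };
    let bob = Lww { stamp: (3, BOB), value: Position { lat: 49.473, lon: 11.013 } };
    (alice.merge(bob).value, bob.merge(alice).value)
}
```

With these stamps the nested layout converges to lat 51.320 and lon 11.013, a location neither peer ever wrote, whereas the atomic layout converges to one complete position (here the one with the larger stamp).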
  20. AVOID DATA REDUNDANCY

    { entries: ArrayRef [
        MapRef { id: “A”, parent: null, children: ArrayRef [] },
        MapRef { id: “B”, parent: null, children: ArrayRef [] },
        MapRef { id: “C”, parent: null, children: ArrayRef [] }
    ] }

  21. AVOID DATA REDUNDANCY

    Same structure as on slide 20, with two concurrent operations:
    • Make B child of C
    • Make B child of A

  22. AVOID DATA REDUNDANCY

    After both operations (“Make B child of C”, “Make B child of A”) are applied:

    { entries: ArrayRef [
        MapRef { id: “A”, parent: null, children: ArrayRef [“B”] },
        MapRef { id: “B”, parent: “A”|“C”, children: ArrayRef [] },
        MapRef { id: “C”, parent: null, children: ArrayRef [“B”] }
    ] }

  23. AVOID DATA REDUNDANCY

    Same state as on slide 22, annotated:
    • B's parent is either A or C
    • Both A and C consider B their child
    Problem: conflict resolution works within the scope of a single collection, independently of the others.

  24. AVOID REDUNDANT AND DEPENDENT DATA

    { entries: ArrayRef [
        MapRef { id: “A”, parent: null, children as `$.entries[?(@.parent=='A')].id` },
        MapRef { id: “B”, parent: “A”|“C”, children as `$.entries[?(@.parent=='B')].id` },
        MapRef { id: “C”, parent: null, children as `$.entries[?(@.parent=='C')].id` }
    ] }

    Recompute the property value lazily or after an update.
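
The idea of deriving `children` from the `parent` links, instead of replicating both, can be sketched in plain Rust. The `Entry` struct and the helpers below are hypothetical stand-ins for the replicated entries, not Yrs types; only the parent link is stored, and the children list is recomputed on demand.

```rust
/// One entry of the tree; only the parent link is stored and replicated.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    id: String,
    parent: Option<String>,
}

/// Small constructor to keep the examples short.
fn entry(id: &str, parent: Option<&str>) -> Entry {
    Entry {
        id: id.to_string(),
        parent: parent.map(str::to_string),
    }
}

/// Derives the children of `parent_id` from the parent links, in the spirit
/// of the `$.entries[?(@.parent=='A')].id` query from the slide.
fn children_of<'a>(entries: &'a [Entry], parent_id: &str) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|e| e.parent.as_deref() == Some(parent_id))
        .map(|e| e.id.as_str())
        .collect()
}
```

Whichever parent wins for B after a merge, exactly one entry lists B among its derived children, because the list is computed from the single surviving `parent` value.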
  25. REPAIR

    • Git merge conflict resolution
    • CRDT conflict resolution is always automatic
    • After applying an update, go over the data
    • Detect and repair conflicts from the business logic point of view
    • Provide consistent default behavior
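
A repair pass of the kind listed above might look like the following sketch. The scenario is an assumption chosen for illustration: parent links are merged per node, so two concurrent moves can leave a cycle (A under B and B under A). The `Parents` alias and `repair_cycles` are hypothetical names. The rule "turn the smallest id on the cycle into a root" is one possible consistent default; every peer that runs it on the same merged data reaches the same result.

```rust
use std::collections::BTreeMap;

/// Parent links keyed by node id; `None` marks a root.
type Parents = BTreeMap<String, Option<String>>;

/// Walks every node's parent chain and breaks each cycle it finds by turning
/// the smallest id on that cycle into a root.
/// Returns the ids that were detached, in the order they were repaired.
fn repair_cycles(parents: &mut Parents) -> Vec<String> {
    let mut detached = Vec::new();
    let ids: Vec<String> = parents.keys().cloned().collect();
    for start in ids {
        let mut path: Vec<String> = Vec::new();
        let mut current = Some(start);
        while let Some(id) = current {
            if let Some(pos) = path.iter().position(|p| *p == id) {
                // Everything from `pos` onwards lies on the cycle.
                if let Some(victim) = path[pos..].iter().min().cloned() {
                    parents.insert(victim.clone(), None);
                    detached.push(victim);
                }
                break;
            }
            // Follow the parent link; a missing node ends the walk.
            current = parents.get(&id).cloned().flatten();
            path.push(id);
        }
    }
    detached
}
```

Running the pass a second time on already repaired data detaches nothing, so it is safe to call after every applied update.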
  26. TUNING THE NETWORK PROTOCOL

    1. Default Yjs network providers don’t multiplex a connection across multiple documents.
    2. JavaScript has limited support for compression.
  27. CUSTOM Y-SYNC PROTOCOL EXTENSIONS

        impl Protocol for MyProtocol {
            fn missing_handle(&self, awareness: &Awareness, tag: u8, data: Vec<u8>) -> Result<Option<Message>, Error> {
                // the two highest bits of the tag byte carry custom flags
                const IS_MULTIPLEXED: u8 = 0b1000_0000;
                const IS_ZSTD_COMPRESSED: u8 = 0b0100_0000;
                let is_multiplexed = tag & IS_MULTIPLEXED == IS_MULTIPLEXED;
                let is_compressed = tag & IS_ZSTD_COMPRESSED == IS_ZSTD_COMPRESSED;
                // decompress the payload first, if needed
                let data = if is_compressed {
                    zstd::decode_all(&data[..])?
                } else {
                    data
                };
                let doc = if !is_multiplexed {
                    awareness.doc()
                } else {
                    // a multiplexed payload starts with a 16-byte document id
                    let uuid = Uuid::from_slice(&data[0..16])?;
                    // ... load document by uuid
                };
                // strip the flag bits to recover the original message tag
                let tag = tag & 0b0011_1111;
                // .. apply decompressed message to document
            }
        }
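
The tag layout used on this slide (two flag bits on top of a six-bit message kind, plus a 16-byte document id prefix for multiplexed payloads) can be exercised without any networking code. The sketch below uses only the standard library; `Tag`, `KIND_MASK` and `split_doc_id` are hypothetical helpers, and it leaves out zstd and the Yrs `Protocol` trait entirely.

```rust
/// Flag bits mirrored from the slide: the two highest bits of the tag byte.
const IS_MULTIPLEXED: u8 = 0b1000_0000;
const IS_ZSTD_COMPRESSED: u8 = 0b0100_0000;
/// The remaining six bits carry the message kind.
const KIND_MASK: u8 = 0b0011_1111;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Tag {
    kind: u8,
    multiplexed: bool,
    compressed: bool,
}

impl Tag {
    /// Packs the tag into one byte; `None` if `kind` does not fit in six bits.
    fn encode(self) -> Option<u8> {
        if self.kind & !KIND_MASK != 0 {
            return None;
        }
        let mut byte = self.kind;
        if self.multiplexed {
            byte |= IS_MULTIPLEXED;
        }
        if self.compressed {
            byte |= IS_ZSTD_COMPRESSED;
        }
        Some(byte)
    }

    /// Splits a received tag byte back into the kind and the two flags.
    fn decode(byte: u8) -> Tag {
        Tag {
            kind: byte & KIND_MASK,
            multiplexed: byte & IS_MULTIPLEXED == IS_MULTIPLEXED,
            compressed: byte & IS_ZSTD_COMPRESSED == IS_ZSTD_COMPRESSED,
        }
    }
}

/// Splits a multiplexed payload into its 16-byte document id and the rest.
/// Returns `None` when the payload is too short to contain an id.
fn split_doc_id(data: &[u8]) -> Option<([u8; 16], &[u8])> {
    if data.len() < 16 {
        return None;
    }
    let mut id = [0u8; 16];
    id.copy_from_slice(&data[..16]);
    Some((id, &data[16..]))
}
```

Checking the payload length before slicing avoids the panic that indexing `data[0..16]` would cause on a truncated frame.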
  28. CRDT BENCHMARKS

    Applying 250 000 operations

    Metric | A | B | C | D
    Time to execute ops one by one | 5,714 ms | 28,675 ms | 3,089 ms | 14,326 ms
    Time to serialize document state | 11 ms | 3 ms | 77 ms | 185 ms
    Serialized payload size | 159,929 bytes | 159,929 bytes | 258,228 bytes | 129,116 bytes
    Deserializing payload and applying it to another peer | 39 ms | 16 ms | 13 ms | 1,805 ms

  29. CRDT BENCHMARKS

    Same table as on slide 28, annotated: Bounded by single user actions

  30. CRDT BENCHMARKS

    Same table as on slide 28, annotated: What affects network consumption

  31. CRDT BENCHMARKS

    Same table as on slide 28, annotated: Scales with number of readers

  32. CRDT BENCHMARKS

    Same table as on slide 28, annotated: Scales with number of writers
  33. COLLABORATION: ONE SIZE DOES NOT FIT ALL

    1. Indexing
    2. Live Queries / Notifications
    3. File Sharing
    4. Partial Data Replication
    5. Authorization
    6. Authentication
  34. REFERENCES

    • Yrs - overview of major concepts: https://docs.rs/yrs/latest/yrs/
    • Yrs under the hood: https://www.bartoszsypytkowski.com/yrs-architecture/
    • Designing CSV tables with Yrs: https://www.bartoszsypytkowski.com/yrs-csv-table/