The future of data systems

Wataru Hirota
October 23, 2019

original book: Designing Data-Intensive Applications


Transcript

  1. How to design software in the presence of bugs?
    Bugs are inevitable everywhere:
    - software
    - hardware
    - data processing
    - external services
    Even an unlikely bug may still happen
    - e.g. hardware faults such as memory write errors
  2. Trust, but verify
    We basically trust the system (e.g. hardware), but we always need to verify it.
    - Don't trust a system blindly
    - e.g. the guarantees of ACID databases or of Amazon S3 may occasionally fail
  3. How to verify software?
    End-to-end verification is necessary
    - It covers all the components: software, hardware, networking, UI, …
    Of course, an end-to-end approach alone is not sufficient
    - We also have to stop damage at an early stage
  4. A key to verification: auditability
    If a component is deterministic and well-defined, it is easier to audit
    - e.g. event-based systems
      - Every event is immutable
      - The order of events is deterministic
  5. Blockchain meets audit
    Blockchains share many ideas with distributed systems
    - Distributed
    - Each component is untrusted
    - After each component checks the transactions, a consensus protocol is used to agree on them
    … but for now the consensus protocol (mining) is too slow
  6. TL;DR
    - Don't forget that the users are also humans
    - We have a responsibility to carefully consider the consequences
  7. e.g. Bias and discrimination
    (The following items are ethical problems; we don't have a correct answer)
    Algorithms may also be affected by biases
    - e.g. even IP addresses could be a strong predictor of race
  8. Lots to think about
    - Responsibility
      - If a self-driving car makes a mistake, who is responsible for it?
    - Feedback loops
      - A mature recommendation system may end up showing people only opinions they already agree with.
    - Privacy and tracking
      - Data helps with personalization, but...
  9. Summary
    - No single tool can efficiently serve all possible use cases.
    - Data integration is a solution to that.
    - Integrity support is scalable, even with asynchronous processing
    - Bugs are everywhere, so trust but verify
    - Be ethical
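
To make the auditability point from slides 4 and 5 a little more concrete, here is a minimal sketch of an append-only event log in which every entry is chained to the previous one by a hash. This is not code from the deck or the book; the names `append_event`, `verify_log`, `replay_balance`, and the `amount` field are hypothetical and chosen only for illustration. The idea is that immutable, ordered events can be re-checked end to end ("trust, but verify") and replayed deterministically to rebuild state.

```python
import hashlib
import json

# Assumed placeholder "previous hash" for the very first entry.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event; its hash depends on every earlier entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": entry_hash(prev_hash, event)})


def verify_log(log: list) -> bool:
    """Recompute the whole chain and report whether it still matches."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


def replay_balance(log: list) -> int:
    """Derive state deterministically by replaying events in order."""
    balance = 0
    for entry in log:
        balance += entry["event"]["amount"]
    return balance


if __name__ == "__main__":
    log = []
    append_event(log, {"account": "alice", "amount": 100})
    append_event(log, {"account": "alice", "amount": -30})
    print(verify_log(log), replay_balance(log))
```

If any stored event is modified after the fact, `verify_log` returns `False`, because the recomputed hash no longer matches the stored one. A real system would also need to protect the latest hash itself (for example by publishing it to independent parties), which this sketch does not attempt.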