Leaving the Ivory Tower: Research in the Real World

Academic research has a reputation for being insular and seldom used in the real world. At HashiCorp, we have a long tradition of basing our tools and products on academic research. We draw on research both for the initial design of products and for the ongoing development of new features. Our industrial research group, HashiCorp Research, has even published novel work. In this talk we cover why we care, how we incorporate research, and what has been particularly useful for us.

Presented at QCon NY.

Armon Dadgar

June 24, 2019


  1. ⁄ Leaving the Ivory Tower: Research in the Real World

  2. Armon Dadgar, Co-Founder and CTO at HashiCorp

  3. HashiCorp Suite ▪ Provision (Operations) ▪ Secure (Security) ▪ Deploy (Development) ▪ Connect (Networking) ▪ Private Cloud, AWS, Azure, GCP ▪ Common Cloud Operating Model
  4. Research Origins Mitchell Hashimoto Armon Dadgar

  5. Contributing Back

  6. Standing on the Shoulders of Giants Or The Value of

    Research ▪ Discover the “State of the Art” ▪ Relevant works to challenge thinking ▪ Understand fundamental tradeoffs (e.g. FLP Theorem) ▪ Metrics for evaluation
  7. ⁄ Building Consul: A Story of (Service) Discovery

  8. Immutable + Micro-services Front End API Layer Data Layer Immutable

  9. Common Solutions Circa 2012 ▪ Hard Coded IP of Host

    / Virtual IP / Load Balancer ▪ Config Management “Convergence Runs” ▪ Custom Zookeeper based systems
  10. Imagining Solutions API Layer Data Layer Database:3306 API Layer

    Data Layer
  11. Entirely Peer to Peer B C A D

  12. Exploring the Literature Centralized Decentralized Central Servers “Super Peers” Peer

    To Peer
  13. Exploring the Literature Structured Unstructured Rings Spanning Trees Binary Trees

    Adaptive Structure Hybrid Structures Epidemic Broadcast Mesh Network Randomized
  14. Exploring the Literature Limited Visibility Full Visibility Few Members Known

    “Neighbors” Known All Members Known
  15. Imposing Constraints: Cloud Datacenter Environment

    ▪ Low Latency and High Bandwidth. We are operating within a cloud datacenter, where we expect low latencies and high bandwidth, relative to IoT or Internet-wide applications.
    ▪ Few Nodes (< 5K). The operating environment was not large-scale peer-to-peer public networks for file sharing, but private infrastructure. The scale is much smaller than some other target environments.
    ▪ Simple To Implement. Keep It Simple, Stupid (KISS) was a goal. We wanted the simplest possible implementation, and no simpler. Complex protocols are more difficult to implement correctly.
  16. The SWIM Approach

  17. SWIM Properties ▪ Completely Decentralized ▪ Unstructured, with Epidemic Dissemination

    ▪ Full Visibility, All Members Known ▪ Trades more bandwidth use for simplicity and fault tolerance
  18. Closely Considered ▪ Plumtree. Hybrid tree and epidemic style.

    ▪ T-Man. Adaptive, can change internal style. ▪ HyParView. Limited view of membership. ▪ Complexity of implementation deemed not worthwhile ▪ Size of clusters not a concern for full view ▪ Expected traffic minimal
  19. Adaptations Used ▪ Bi-Modal Multicast. Active Push/Pull Synchronization. ▪ Steady

    State vs Recovery Messages. Optimize for efficient distribution in steady state. ▪ Lamport Clocks. Provide a causal relationship between messages. ▪ Vivaldi. Network Coordinates to determine “distance” of peers.
  20. Serf Product (serf.io)

  21. Gossip For Service Discovery B C A D “Web” at

    IP1 “DB” at IP2 “Cache” at IP3 “LB” at IP4
  22. Serf in Practice ▪ (+) Immutable Simplified ▪ (+) Fault

    Tolerant, Easy to Operate ▪ (-) Eventual Consistency ▪ (-) No Key/Value Configuration ▪ (-) No “Central” API or UI
  23. Rethinking Architecture B C A D “Web” at IP1 “DB”

    at IP2 “Cache” at IP3 “LB” at IP4 Server
  24. Central Servers Challenges ▪ High Availability ▪ Durability of State

    ▪ Strong Consistency
  25. Paxos or How Hard is it to Agree?

  26. Paxos Made Simple (?)

  27. Exploring The Literature ▪ Multi Paxos ▪ Egalitarian Paxos ▪

    Fast Paxos ▪ Cheap Paxos ▪ Generalized Paxos
  28. None
  29. Raft or Paxos Made Simple

  30. Consul Product (consul.io) Hybrid CP / AP Design - Strongly

    consistent servers (Raft) - Weakly consistent membership (SWIM) - Centralized API and State - Decentralized Operation
  31. Work Embedded in Consul (and Serf) ▪ Consensus ▪ Gossip

    Protocols ▪ Network Tomography ▪ Capabilities Based Security ▪ Concurrency Control (MVCC) ▪ Lamport / Vector Clocks
  32. Research across Products - Security Systems (Kerberos) - Security Protocols

    - Access Control Systems - Cryptography - Graph Theory - Type Theory - Automata Theory - Scheduler Design (Mesos, Borg, Omega) - Bin Packing - Pre-emption - Consensus - Gossip
  33. ⁄ Forming HashiCorp Research

  34. Industrial Research Group Jon Currey joins as Director of Research

  35. HashiCorp Research Charter: Focus on industrial research, working 18 to 24 months ahead of engineering, on novel work.
  36. Research Goals Problem Novel Solution Existing Solution Publish Integrate Product

  37. Customer Problem Frontend Backend Internet

  38. Customer Problem Frontend Backend Internet

  39. Research Process Collect Data Make Hypothesis Design Solution Design Experiment

    Validate Hypothesis Validate Solution
  40. Gossip FSM Suspect Healthy Dead Ping Timeout Suspect Timeout Refute

    Dead Refute Suspect
  41. Untimely Processing Suspect Healthy Dead Ping Timeout Suspect Timeout Refute

    Dead Refute Suspect
  42. Reducing Sensitivity

    ▪ Exponential Convergence - Replace Fixed Timers - Use Redundant Confirmations - Insight from Bloom Filters, K independent hashes
    ▪ Local Health Awareness - Measure Local Health - Tune sensitivity as health changes
    ▪ Early Notification - Send Suspicion Early - Send Suspicion Redundantly - Enable faster refute
  43. Evaluation of Solution

  44. Publishing Lifeguard

  45. Integration with Product

  46. ⁄ Picking the Problem

  47. Vault Audit Logs User Action Audit Log

  48. Vault Anomaly Detector Anomaly Detection Audit Log User Action

  49. Anomaly Detector Unexpected Expected Event Detector Model

  50. Exploring the Literature Few False Negatives Few False Positives Lots

    of false positives Lots of false negatives
  51. Applications to Vault Screen Millions of Events Security Issues Missed

  52. Defining a Model Unexpected Expected Event Detector Model

  53. Refining Configuration Vault Advisor Audit Log User Action Configuration

  54. Vault Advisor in Depth

  55. Research Status Problem Novel Solution Existing Solution Publish Integrate Product

  56. Lifeguard Integration Pull Request Upstream Research Team Project Fork Eng

  57. Product-ization Research Team | Advisor Prototype Eng Team Train Develop

    Publish Research Embedded
  58. What’s Coming Problem Novel Solution Existing Solution Publish Integrate Product

  59. ⁄ Research Culture

  60. Fostering Research Culture ▪ Product / Engineering is 100x bigger

    than Research ▪ Cultural approach needed ▪ Consuming research
  61. Publishing PRD / RFCs

  62. Slack #talk-research

  63. Brown bags and Conferences

  64. Sponsorships & Memberships

  65. Cultural Goals ▪ Build awareness of research ▪ Give access

    to published academic work ▪ Create channels to engage internally ▪ Promote involvement in external community ▪ Involve Research in Engineering, and vice versa
  66. ⁄ Conclusion

  67. Real world value ▪ Leverage the “State of the Art”,

    instead of naive design ▪ Apply domain constraints against fundamental tradeoffs ▪ Improve product performance, security, and usability
  68. Research used from Day 1 ▪ Academic research fundamental to

    HashiCorp Products ▪ Day 1 core designs based on the literature ▪ Day 2+ improvements from literature
  69. HashiCorp Research ▪ Focused on Industrial Research ▪ Publishing work,

    not just consuming ▪ Advocate for research culture internally ▪ Features like Lifeguard ▪ New products like Vault Advisor
  70. Promoting Research ▪ Build a culture around research ▪ Enable

    access, encourage consumption ▪ Create bridges between Research and Engineering ▪ Vocalize the benefits
  71. Thank You www.hashicorp.com
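
Illustrative sketch (not from the slides): the gossip failure-detection state machine on slide 40 (Healthy, Suspect, Dead, with Ping Timeout, Suspect Timeout, and Refute transitions) and the "Replace Fixed Timers / Use Redundant Confirmations" idea on slide 42 can be approximated in a few lines of Python. Everything here is an assumption made for illustration: the class and function names, the default values, and in particular the logarithmic decay of the timeout, which is just one plausible way to make a suspicion expire sooner as more independent peers confirm it. It is not presented as the actual Serf, Consul, or Lifeguard implementation, and it does not model Local Health Awareness or Early Notification.

```python
import math
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"


def suspicion_timeout(confirmations, k, min_s, max_s):
    """Hypothetical timeout rule: start at max_s and shrink toward min_s
    as independent confirmations arrive; k or more confirmations give min_s."""
    if k < 1:
        return min_s
    frac = math.log(confirmations + 1) / math.log(k + 1)
    return max(min_s, max_s - frac * (max_s - min_s))


class Member:
    """Local view of one peer, following the three-state machine on slide 40."""

    def __init__(self, name, k=3, min_s=2.0, max_s=12.0):
        self.name = name
        self.state = State.HEALTHY
        self.k, self.min_s, self.max_s = k, min_s, max_s
        self.suspected_at = None
        self.confirmers = set()

    def ping_timeout(self, now):
        # Healthy -> Suspect: a probe of this peer went unanswered.
        if self.state is State.HEALTHY:
            self.state = State.SUSPECT
            self.suspected_at = now
            self.confirmers = set()

    def confirm_suspicion(self, peer):
        # Another peer independently reports the same suspicion.
        # A set ensures each confirming peer is counted only once.
        if self.state is State.SUSPECT:
            self.confirmers.add(peer)

    def refute(self):
        # Suspect -> Healthy or Dead -> Healthy: the peer proved it is alive.
        self.state = State.HEALTHY
        self.suspected_at = None
        self.confirmers = set()

    def tick(self, now):
        # Suspect -> Dead once the (confirmation-dependent) timeout expires.
        if self.state is State.SUSPECT:
            timeout = suspicion_timeout(
                len(self.confirmers), self.k, self.min_s, self.max_s
            )
            if now >= self.suspected_at + timeout:
                self.state = State.DEAD
```

With the assumed defaults, a member suspected at time 0 is still only suspected at time 5 when nobody else has confirmed the suspicion, but is declared dead at time 5 once three distinct peers have confirmed it. A single slow observer therefore gets the full maximum timeout before it can declare a healthy peer dead, while well-corroborated suspicions converge quickly.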