
Leaving the Ivory Tower: Research in the Real World


Academic research has a reputation for being insular and seldom used in the real world. At HashiCorp, we have a long tradition of basing our tools and products on academic research. We draw on research for the initial design of products and for the ongoing development of new features. Our industrial research group, HashiCorp Research, has also published novel work. In this talk we cover why we care, how we incorporate research, and what has been particularly useful for us.

Presented at QCon NY.

Armon Dadgar

June 24, 2019



Transcript


  1. Leaving the Ivory Tower:
    Research in the Real World


  2. Armon Dadgar
    Co-Founder and CTO at HashiCorp


  3. HashiCorp Suite
    Copyright © 2018 HashiCorp
    Provision (Operations)
    Secure (Security)
    Deploy (Development)
    Connect (Networking)
    Private Cloud / AWS / Azure / GCP
    Common Cloud Operating Model


  4. Research Origins
    Mitchell Hashimoto, Armon Dadgar


  5. Contributing Back


  6. Standing on the Shoulders of Giants
    Or The Value of Research
    ▪ Discover the “State of the Art”
    ▪ Relevant works to challenge thinking
    ▪ Understand fundamental tradeoffs (e.g. FLP Theorem)
    ▪ Metrics for evaluation



  7. Building Consul:
    A Story of (Service) Discovery


  8. Immutable + Micro-services
    Front End
    API Layer
    Data Layer
    Immutable Artifact


  9. Common Solutions
    Circa 2012
    ▪ Hard Coded IP of Host / Virtual IP / Load Balancer
    ▪ Config Management “Convergence Runs”
    ▪ Custom Zookeeper based systems


  10. Imagining Solutions
    API Layer
    Data Layer
    Database:3306
    10.0.1.25:3306
    API Layer
    Data Layer
    10.0.1.25:3306


  11. Entirely Peer to Peer
    B
    C
    A
    D


  12. Exploring the Literature
    Centralized / Decentralized
    Central Servers / “Super Peers” / Peer To Peer


  13. Exploring the Literature
    Structured / Unstructured
    Rings
    Spanning Trees
    Binary Trees
    Adaptive Structure
    Hybrid Structures
    Epidemic Broadcast
    Mesh Network
    Randomized


  14. Exploring the Literature
    Limited Visibility / Full Visibility
    Few Members Known / “Neighbors” Known / All Members Known


  15. Imposing Constraints
    Cloud Datacenter Environment
    ▪ Low Latency and High Bandwidth. We are operating within a cloud
    datacenter, where we expect low latencies and high bandwidth, relative
    to IoT or Internet-wide applications.
    ▪ Few Nodes (< 5K). The operating environment was not large scale
    peer-to-peer public networks for file sharing, but private
    infrastructure. The scale is much smaller than some other target
    environments.
    ▪ Simple To Implement. Keep It Simple Stupid (KISS) was a goal. We
    wanted the simplest possible implementation, and no simpler. Complex
    protocols are more difficult to implement correctly.


  16. The SWIM Approach


  17. SWIM Properties
    ▪ Completely Decentralized
    ▪ Unstructured, with Epidemic Dissemination
    ▪ Full Visibility, All Members Known
    ▪ Trades more bandwidth use for simplicity and fault tolerance
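
The epidemic dissemination property can be illustrated with a toy simulation. This is a minimal sketch, not SWIM or Serf itself: each round, every node that already knows an update pushes it to a few randomly chosen peers. The function name `gossip_rounds`, the `fanout` default, and the seed are assumptions made for illustration.

```python
import random


def gossip_rounds(num_nodes: int, fanout: int = 3, seed: int = 0) -> int:
    """Return how many rounds it takes one update to reach every node."""
    rng = random.Random(seed)
    informed = {0}  # node 0 starts out knowing the update
    rounds = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for node in informed:
            # Full visibility: any other member can be picked as a target.
            peers = [p for p in range(num_nodes) if p != node]
            newly_informed.update(rng.sample(peers, min(fanout, len(peers))))
        informed |= newly_informed
        rounds += 1
    return rounds
```

Because the informed set can grow by at most a factor of `fanout + 1` per round, the round count grows roughly logarithmically with cluster size, at the cost of redundant messages. That is the bandwidth-for-simplicity trade named in the last bullet.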


  18. Closely Considered
    ▪ Plumtree. Hybrid tree and epidemic style.
    ▪ T-Man. Adaptive, can change internal style.
    ▪ HyParView. Limited view of membership.
    ▪ Implementation complexity deemed not worth the benefit
    ▪ Size of clusters not a concern for full view
    ▪ Expected traffic minimal


  19. Adaptations Used
    ▪ Bi-Modal Multicast. Active Push/Pull Synchronization.
    ▪ Steady State vs Recovery Messages. Optimize for efficient distribution in steady state.
    ▪ Lamport Clocks. Provide a causal relationship between messages.
    ▪ Vivaldi. Network Coordinates to determine “distance” of peers.
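
Of the adaptations listed, the Lamport clock is the easiest to show in a few lines. The sketch below is a generic textbook version, not Serf's implementation; the class and method names are assumptions.

```python
class LamportClock:
    """Logical clock: a counter that only ever moves forward."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event or an outgoing message."""
        self.time += 1
        return self.time

    def witness(self, remote_time: int) -> int:
        """Merge the timestamp carried by a received message."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

If node A stamps a message with `a.tick()` and node B calls `b.witness(...)` on receipt, B's clock is guaranteed to be greater than the stamp, so the receive event is ordered after the send.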


  20. Serf Product (serf.io)


  21. Gossip For Service Discovery
    B
    C
    A
    D
    “Web” at IP1
    “DB” at IP2
    “Cache” at IP3
    “LB” at IP4


  22. Serf in Practice
    ▪ (+) Immutable Simplified
    ▪ (+) Fault Tolerant, Easy to Operate
    ▪ (-) Eventual Consistency
    ▪ (-) No Key/Value Configuration
    ▪ (-) No “Central” API or UI


  23. Rethinking Architecture
    B C
    A D
    “Web” at IP1 “DB” at IP2 “Cache” at IP3
    “LB” at IP4
    Server


  24. Central Servers Challenges
    ▪ High Availability
    ▪ Durability of State
    ▪ Strong Consistency


  25. Paxos or How Hard is it to Agree?


  26. Paxos Made Simple (?)


  27. Exploring The Literature
    ▪ Multi Paxos
    ▪ Egalitarian Paxos
    ▪ Fast Paxos
    ▪ Cheap Paxos
    ▪ Generalized Paxos


  28. View Slide

  29. Raft or Paxos Made Simple


  30. Consul Product (consul.io)
    Hybrid CP / AP Design
    - Strongly consistent servers (Raft)
    - Weakly consistent membership (SWIM)
    - Centralized API and State
    - Decentralized Operation
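
The strongly consistent half of this design rests on majority quorums. The arithmetic below is a minimal sketch of that general rule for majority-based consensus; the function names are assumptions and nothing here is taken from Consul's code.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a cluster with `servers` voting members."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failures_tolerated(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)
```

A 3-server cluster needs 2 votes and tolerates 1 failure, and a 5-server cluster needs 3 votes and tolerates 2. A 4-server cluster needs 3 votes and still tolerates only 1 failure, which is why odd sizes are the usual choice.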


  31. Work Embedded in Consul (and Serf)
    ▪ Consensus
    ▪ Gossip Protocols
    ▪ Network Tomography
    ▪ Capabilities Based Security
    ▪ Concurrency Control (MVCC)
    ▪ Lamport / Vector Clocks


  32. Research across Products
    - Security Systems (Kerberos)
    - Security Protocols
    - Access Control Systems
    - Cryptography
    - Graph Theory
    - Type Theory
    - Automata Theory
    - Scheduler Design (Mesos, Borg, Omega)
    - Bin Packing
    - Pre-emption
    - Consensus
    - Gossip



  33. Forming HashiCorp Research


  34. Industrial Research Group
    Jon Currey joins as Director of Research


  35. HashiCorp Research Charter
    Focus on industrial research, working 18 to 24 months ahead of
    engineering, on novel work.


  36. Research Goals
    Problem
    Novel Solution
    Existing Solution
    Publish
    Integrate
    Product


  37. Customer Problem
    Frontend Backend
    Internet


  38. Customer Problem
    Frontend Backend
    Internet


  39. Research Process
    Collect Data
    Make Hypothesis
    Design Solution
    Design Experiment
    Validate Hypothesis
    Validate Solution


  40. Gossip FSM
    States: Healthy / Suspect / Dead
    Transitions: Ping Timeout / Suspect Timeout / Refute Suspect / Refute Dead
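
The state machine on this slide can be sketched as a transition table. The state and event names follow the slide labels, but which label sits on which edge is an assumption based on how SWIM-style failure detectors generally behave; `State`, `TRANSITIONS`, and `next_state` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"


# (current state, event) -> next state.
# Assumption: the edge assignment below is inferred, not read off the slide.
TRANSITIONS = {
    (State.HEALTHY, "ping_timeout"): State.SUSPECT,
    (State.SUSPECT, "suspect_timeout"): State.DEAD,
    (State.SUSPECT, "refute_suspect"): State.HEALTHY,
    (State.DEAD, "refute_dead"): State.HEALTHY,
}


def next_state(state: State, event: str) -> State:
    """Apply an event; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Under this reading, a healthy node that misses a ping is only suspected, and it is declared dead only if the suspicion is not refuted before the suspect timeout fires.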


  41. Untimely Processing
    States: Healthy / Suspect / Dead
    Transitions: Ping Timeout / Suspect Timeout / Refute Suspect / Refute Dead


  42. Reducing Sensitivity
    Exponential Convergence
    - Replace Fixed Timers
    - Use Redundant Confirmations
    - Insight from Bloom Filters, K independent hashes
    Local Health Awareness
    - Measure Local Health
    - Tune sensitivity as health changes
    Early Notification
    - Send Suspicion Early
    - Send Suspicion Redundantly
    - Enable faster refute

  43. Evaluation of Solution


  44. Publishing Lifeguard


  45. Integration with Product



  46. Picking the Problem


  47. Vault Audit Logs
    User Action Audit Log


  48. Vault Anomaly Detector
    Anomaly Detection
    Audit Log
    User Action


  49. Anomaly Detector
    Unexpected
    Expected
    Event Detector Model


  50. Exploring the Literature
    Few False Negatives / Few False Positives
    Lots of false positives / Lots of false negatives


  51. Applications to Vault
    Screen Millions of Events
    Security Issues Missed
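
The tension between screening millions of events and missing security issues can be made concrete with back-of-the-envelope arithmetic. The sketch below uses a hypothetical helper, `detector_workload`, and any volumes or error rates passed to it are illustrative assumptions, not Vault measurements.

```python
def detector_workload(total_events: int, true_anomalies: int,
                      false_positive_rate: float,
                      false_negative_rate: float) -> dict:
    """Estimate alert volume and missed issues for an anomaly detector."""
    benign = total_events - true_anomalies
    false_alerts = benign * false_positive_rate
    missed = true_anomalies * false_negative_rate
    caught = true_anomalies - missed
    alerts = false_alerts + caught
    # Precision: the share of raised alerts that are real anomalies.
    precision = caught / alerts if alerts else 0.0
    return {
        "alerts": alerts,
        "false_alerts": false_alerts,
        "missed": missed,
        "precision": precision,
    }
```

For example, `detector_workload(1_000_000, 100, 0.01, 0.1)` yields roughly 10,000 alerts of which about 90 are real, while about 10 true anomalies go unseen. Even a 1% false positive rate buries the real issues when anomalies are rare.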


  52. Defining a Model
    Unexpected
    Expected
    Event Detector Model


  53. Refining Configuration
    Vault Advisor
    Audit Log
    User Action
    Configuration


  54. Vault Advisor in Depth


  55. Research Status
    Problem
    Novel Solution
    Existing Solution
    Publish
    Integrate
    Product


  56. Lifeguard Integration
    Pull Request
    Upstream
    Research Team Project Fork Eng Team


  57. Product-ization
    Research Team
    | Advisor
    Prototype
    Eng Team Train Develop
    Publish
    Research
    Embedded


  58. What’s Coming
    Problem
    Novel Solution
    Existing Solution
    Publish
    Integrate
    Product



  59. Research Culture


  60. Fostering Research Culture
    ▪ Product / Engineering is 100x bigger than Research
    ▪ Cultural approach needed
    ▪ Consuming research


  61. Publishing PRD / RFCs


  62. Slack #talk-research


  63. Brown bags and Conferences


  64. Sponsorships & Memberships


  65. Cultural Goals
    ▪ Build awareness of research
    ▪ Give access to published academic work
    ▪ Create channels to engage internally
    ▪ Promote involvement in external community
    ▪ Involve Research in Engineering, and vice versa



  66. Conclusion


  67. Real world value
    ▪ Leverage the “State of the Art”, instead of naive design
    ▪ Apply domain constraints against fundamental tradeoffs
    ▪ Improve product performance, security, and usability


  68. Research used from Day 1
    ▪ Academic research fundamental to HashiCorp Products
    ▪ Day 1 core designs based on the literature
    ▪ Day 2+ improvements from literature


  69. HashiCorp Research
    ▪ Focused on Industrial Research
    ▪ Publishing work, not just consuming
    ▪ Advocate for research culture internally
    ▪ Features like Lifeguard
    ▪ New products like Vault Advisor


  70. Promoting Research
    ▪ Build a culture around research
    ▪ Enable access, encourage consumption
    ▪ Create bridges between Research and Engineering
    ▪ Vocalize the benefits


  71. Thank You
    www.hashicorp.com
