
Leaving the Ivory Tower: Research in the Real World


Academic research often has a reputation for being insular and rarely used in the real world. At HashiCorp, we have a long tradition of basing our tools and products on academic research. We draw on research both for the initial design of our products and for the ongoing development of new features. Our industrial research group, HashiCorp Research, has even published novel work. In this talk, we cover why we care, how we incorporate research, and what has been particularly useful to us.

Presented at QCon NY.

Armon Dadgar

June 24, 2019



Transcript


  1. Leaving the Ivory Tower:
    Research in the Real World


  2. Armon Dadgar
    Co-Founder and CTO at HashiCorp


  3. HashiCorp Suite
    Provision (Operations)
    Secure (Security)
    Deploy (Development)
    Connect (Networking)
    Private Cloud, AWS, Azure, GCP
    Common Cloud Operating Model
    Copyright © 2018 HashiCorp


  4. Research Origins
    Mitchell Hashimoto, Armon Dadgar


  5. Contributing Back


  6. Standing on the Shoulders of Giants
    Or The Value of Research
    ▪ Discover the “State of the Art”
    ▪ Relevant works to challenge thinking
    ▪ Understand fundamental tradeoffs (e.g. FLP Theorem)
    ▪ Metrics for evaluation



  7. Building Consul:
    A Story of (Service) Discovery


  8. Immutable + Microservices
    Front End, API Layer and Data Layer, each deployed as an Immutable Artifact


  9. Common Solutions, Circa 2012
    ▪ Hard Coded IP of Host / Virtual IP / Load Balancer
    ▪ Config Management “Convergence Runs”
    ▪ Custom ZooKeeper-based systems


  10. Imagining Solutions
    API Layer → Data Layer
    Logical address: Database:3306
    Resolved address: 10.0.1.25:3306


  11. Entirely Peer to Peer
    Peers A, B, C and D


  12. Exploring the Literature
    From Centralized to Decentralized:
    Central Servers, “Super Peers”, Peer To Peer


  13. Exploring the Literature
    From Structured to Unstructured:
    Rings, Spanning Trees, Binary Trees, Adaptive Structure, Hybrid Structures,
    Epidemic Broadcast, Mesh Network, Randomized


  14. Exploring the Literature
    From Limited Visibility to Full Visibility:
    Few Members Known, “Neighbors” Known, All Members Known


  15. Imposing Constraints
    Cloud Datacenter Environment
    ▪ Low Latency and High Bandwidth. We are operating within a cloud datacenter, where we expect low latencies and high bandwidth, relative to IoT or Internet-wide applications.
    ▪ Few Nodes (< 5K). The operating environment was not large scale peer-to-peer public networks for file sharing, but private infrastructure. The scale is much smaller than some other target environments.
    ▪ Simple To Implement. Keep It Simple Stupid (KISS) was a goal. We wanted the simplest possible implementation, and no simpler. Complex protocols are more difficult to implement correctly.


  16. The SWIM Approach


  17. SWIM Properties
    ▪ Completely Decentralized
    ▪ Unstructured, with Epidemic Dissemination
    ▪ Full Visibility, All Members Known
    ▪ Trades more bandwidth use for simplicity and fault tolerance
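
SWIM's failure detector can be sketched as a single probe round: ping a target directly, fall back to asking k other members to ping it on our behalf (an indirect probe), and only mark it suspect if neither path answers. This is a minimal illustration, not HashiCorp's implementation; the `probe` function, the `can` reachability callback standing in for the network, and the default k=3 are all hypothetical, and links are assumed to be symmetric.

```python
import random
from typing import Callable, List, Optional

# Hypothetical network model: can(a, b) is True if a ping from a reaches b
# and the ack makes it back (links are assumed symmetric).
Reachable = Callable[[str, str], bool]


def probe(me: str, target: str, members: List[str], can: Reachable,
          k: int = 3, rng: Optional[random.Random] = None) -> str:
    """One SWIM-style probe of `target` by `me`: returns "alive" or "suspect"."""
    rng = rng or random.Random()
    # 1. Direct ping.
    if can(me, target):
        return "alive"
    # 2. Indirect probe: ask up to k other members to ping the target for us.
    helpers = [m for m in members if m not in (me, target)]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        # The relayed ack needs both hops: me <-> helper and helper <-> target.
        if can(me, helper) and can(helper, target):
            return "alive"
    # 3. No ack by any path this round: suspect it rather than declare it dead.
    return "suspect"
```

Routing through helpers keeps a single lossy link from triggering a false suspicion, at the cost of the extra messages behind the bandwidth tradeoff listed above.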


  18. Closely Considered
    ▪ Plumtree. Hybrid tree and epidemic style.
    ▪ T-Man. Adaptive, can change internal style.
    ▪ HyParView. Limited view of membership.
    ▪ Implementation complexity deemed not worthwhile
    ▪ Size of clusters not a concern for full view
    ▪ Expected traffic minimal


  19. Adaptations Used
    ▪ Bi-Modal Multicast. Active Push/Pull Synchronization.
    ▪ Steady State vs Recovery Messages. Optimize for efficient distribution
    in steady state.
    ▪ Lamport Clocks. Provide a causal relationship between messages.
    ▪ Vivaldi. Network Coordinates to determine “distance” of peers.
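
Vivaldi treats the network as a system of springs: each node holds a synthetic coordinate and nudges it after every round-trip measurement so that coordinate distance approaches observed latency. Below is a minimal 2D sketch of the adaptive-timestep update from the Vivaldi paper. The constants `CC` and `CE`, the `Node` and `update` names, and the 2D Euclidean space are assumptions, and refinements a production system may add are omitted.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List

CC = 0.25  # assumed tuning constant: how far to move per sample
CE = 0.25  # assumed tuning constant: how fast to adapt the error estimate


@dataclass
class Node:
    vec: List[float] = field(default_factory=lambda: [0.0, 0.0])
    error: float = 1.0  # relative error of our own coordinate; starts high


def distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update(me: Node, other: Node, rtt: float, rng: random.Random) -> None:
    """Adjust `me` after observing round-trip time `rtt` to `other`."""
    dist = distance(me.vec, other.vec)
    # Trust the sample more when we are less confident than the peer.
    w = me.error / (me.error + other.error)
    # Relative error of this sample, folded into our running error estimate.
    sample_err = abs(dist - rtt) / rtt
    me.error = sample_err * CE * w + me.error * (1 - CE * w)
    # Unit vector pointing from the peer towards us (random if coincident).
    if dist > 0:
        unit = [(x - y) / dist for x, y in zip(me.vec, other.vec)]
    else:
        angle = rng.uniform(0, 2 * math.pi)
        unit = [math.cos(angle), math.sin(angle)]
    # Spring step: move away if predicted too close, towards if too far.
    step = CC * w * (rtt - dist)
    me.vec = [x + step * u for x, u in zip(me.vec, unit)]
```

Once coordinates settle, a node can estimate its latency to any peer with `distance`, without probing that peer.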


  20. Serf Product (serf.io)


  21. Gossip For Service Discovery
    Peers A, B, C and D, each advertising a role:
    “Web” at IP1, “DB” at IP2, “Cache” at IP3, “LB” at IP4
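
One way to read this slide: every member already gossips about every other member, so attaching a role to each member turns the membership list into a service catalog. A minimal sketch, assuming each member advertises a single role string and that failed members are flagged by the gossip layer; `Member`, `build_catalog`, and the node-to-IP pairing in the usage are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Member:
    name: str
    addr: str
    role: str              # hypothetical role advertised via gossip, e.g. "web"
    status: str = "alive"  # as last reported by the gossip layer


def build_catalog(members: List[Member]) -> Dict[str, List[str]]:
    """Map each role to the addresses of live members advertising it."""
    catalog: Dict[str, List[str]] = defaultdict(list)
    for m in members:
        # Members the gossip layer has marked failed drop out of the catalog.
        if m.status == "alive":
            catalog[m.role].append(m.addr)
    return {role: sorted(addrs) for role, addrs in catalog.items()}
```

Every node builds the same catalog from its own view, so lookups need no central server, but two nodes can briefly disagree while updates propagate.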


  22. Serf in Practice
    ▪ (+) Immutable Simplified
    ▪ (+) Fault Tolerant, Easy to Operate
    ▪ (-) Eventual Consistency
    ▪ (-) No Key/Value Configuration
    ▪ (-) No “Central” API or UI
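
The eventual consistency comes from reconciling full membership views, as in the push/pull synchronization borrowed from Bi-Modal Multicast: two nodes periodically exchange state and each keeps the newest entry per member. The sketch below assumes a simplified SWIM-style precedence rule (higher incarnation number wins; on a tie, suspect beats alive and dead beats both). `merge`, `newer`, and `RANK` are hypothetical names.

```python
from typing import Dict, Tuple

# Hypothetical per-member state: (incarnation number, status).
State = Tuple[int, str]

# On equal incarnations, the "worse" news wins (simplified SWIM precedence).
RANK = {"alive": 0, "suspect": 1, "dead": 2}


def newer(a: State, b: State) -> bool:
    """True if state a should replace state b."""
    return (a[0], RANK[a[1]]) > (b[0], RANK[b[1]])


def merge(local: Dict[str, State], remote: Dict[str, State]) -> Dict[str, State]:
    """Combine two full membership views, keeping the newest entry per member."""
    merged = dict(local)
    for name, state in remote.items():
        if name not in merged or newer(state, merged[name]):
            merged[name] = state
    return merged
```

Because the rule is a total order on (incarnation, status), merging in either direction yields the same view, so repeated exchanges converge.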


  23. Rethinking Architecture
    Peers A, B, C and D (“Web” at IP1, “DB” at IP2, “Cache” at IP3, “LB” at IP4), plus a central Server


  24. Central Server Challenges
    ▪ High Availability
    ▪ Durability of State
    ▪ Strong Consistency


  25. Paxos or How Hard is it to Agree?


  26. Paxos Made Simple (?)


  27. Exploring The Literature
    ▪ Multi Paxos
    ▪ Egalitarian Paxos
    ▪ Fast Paxos
    ▪ Cheap Paxos
    ▪ Generalized Paxos


  28. Raft or Paxos Made Simple


  29. Consul Product (consul.io)
    Hybrid CP / AP Design
    - Strongly consistent servers (Raft)
    - Weakly consistent membership (SWIM)
    - Centralized API and State
    - Decentralized Operation
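
Raft makes progress only while a majority of servers agree, which fixes how many server failures the strongly consistent side of the design survives. The arithmetic, with hypothetical helper names:

```python
def quorum(servers: int) -> int:
    """Smallest majority of a Raft cluster with `servers` voting members."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum(servers)
```

For example, 3 servers tolerate 1 failure and 5 tolerate 2, while a 4th server raises the quorum without adding tolerance, which is why odd server counts are the usual choice.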


  30. Work Embedded in Consul (and Serf)
    ▪ Consensus
    ▪ Gossip Protocols
    ▪ Network Tomography
    ▪ Capabilities Based Security
    ▪ Concurrency Control (MVCC)
    ▪ Lamport / Vector Clocks
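
A Lamport clock is a single counter per node: it increments on local events and, on receiving a message, jumps past the sender's timestamp, so a message that causally precedes another always carries a smaller timestamp. A minimal thread-safe sketch of the classic rule; the class and method names are illustrative, not Serf's API.

```python
import threading


class LamportClock:
    """Logical clock giving a causal ordering of messages."""

    def __init__(self) -> None:
        self._time = 0
        self._lock = threading.Lock()

    def time(self) -> int:
        with self._lock:
            return self._time

    def tick(self) -> int:
        """Advance for a local event (e.g. before sending) and return the new time."""
        with self._lock:
            self._time += 1
            return self._time

    def witness(self, remote: int) -> int:
        """On receipt, move past the larger of our time and the sender's timestamp."""
        with self._lock:
            self._time = max(self._time, remote) + 1
            return self._time
```

The converse does not hold: a smaller timestamp does not prove causality, which is the gap vector clocks close at the cost of one counter per node.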


  31. Research across Products
    - Security Systems (Kerberos)
    - Security Protocols
    - Access Control Systems
    - Cryptography
    - Graph Theory
    - Type Theory
    - Automata Theory
    - Scheduler Design (Mesos, Borg, Omega)
    - Bin Packing
    - Pre-emption
    - Consensus
    - Gossip



  32. Forming HashiCorp Research


  33. Industrial Research Group
    Jon Currey joins as Director of Research


  34. HashiCorp Research Charter
    Focus on industrial research, working 18 to 24 months ahead of engineering, on novel work.


  35. Research Goals
    Stages: Problem, Existing Solution / Novel Solution, Publish, Integrate, Product


  36. Customer Problem
    Frontend Backend
    Internet


  37. Customer Problem
    Frontend Backend
    Internet


  38. Research Process
    Collect Data, Make Hypothesis, Design Solution, Design Experiment, Validate Hypothesis, Validate Solution


  39. Gossip FSM
    States: Healthy, Suspect, Dead
    Healthy → Suspect: Ping Timeout
    Suspect → Dead: Suspect Timeout
    Suspect → Healthy: Refute Suspect
    Dead → Healthy: Refute Dead
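
The state machine can be written as a transition table. This sketch assumes the transitions the slide's labels suggest (a ping timeout makes a healthy member suspect, a suspect timeout declares it dead, and a refutation returns it to healthy); the event strings and function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"


# (current state, event) -> next state.
TRANSITIONS = {
    (State.HEALTHY, "ping_timeout"): State.SUSPECT,
    (State.SUSPECT, "suspect_timeout"): State.DEAD,
    (State.SUSPECT, "refute_suspect"): State.HEALTHY,
    (State.DEAD, "refute_dead"): State.HEALTHY,
}


def step(state: State, event: str) -> State:
    """Apply an event; events that do not apply in this state are ignored."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the intermediate Suspect state is what gives a slow but live member a window to refute before it is declared dead.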


  40. Untimely Processing
    States: Healthy, Suspect, Dead
    Healthy → Suspect: Ping Timeout
    Suspect → Dead: Suspect Timeout
    Suspect → Healthy: Refute Suspect
    Dead → Healthy: Refute Dead


  41. Reducing Sensitivity
    Exponential Convergence
    - Replace Fixed Timers
    - Use Redundant Confirmations
    - Insight from Bloom Filters: K independent hashes
    Local Health Awareness
    - Measure Local Health
    - Tune sensitivity as health changes
    Early Notification
    - Send Suspicion Early
    - Send Suspicion Redundantly
    - Enable faster refutation
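
Two of these ideas translate directly into timer arithmetic. The sketch below assumes the suspicion timeout starts at a maximum and decays logarithmically towards a minimum as k independent confirmations arrive (the exact curve in HashiCorp's implementation may differ), and that Local Health Awareness multiplies the probe timeout by a hypothetical health score plus one. All names and parameters are illustrative.

```python
import math


def suspicion_timeout(confirmations: int, k: int, min_s: float, max_s: float) -> float:
    """Suspicion timeout that shrinks as independent confirmations arrive.

    Equals max_s with no confirmations and reaches min_s after k (k >= 1)
    confirmations, decaying logarithmically in between.
    """
    frac = math.log(confirmations + 1) / math.log(k + 1)
    return max(min_s, max_s - frac * (max_s - min_s))


def scaled_probe_timeout(base_s: float, health_score: int) -> float:
    """Local Health Awareness: a struggling node waits longer before blaming peers.

    `health_score` is a hypothetical counter that would rise on missed acks
    (or on having to refute suspicion of ourselves) and fall on successful probes.
    """
    return base_s * (health_score + 1)
```

A lone suspicion therefore waits the full timeout, giving a slow member time to refute, while several independent reports let the cluster converge quickly.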


  42. Evaluation of Solution


  43. Publishing Lifeguard


  44. Integration with Product



  45. Picking the Problem


  46. Vault Audit Logs
    User Action → Audit Log


  47. Vault Anomaly Detector
    User Action → Audit Log → Anomaly Detection


  48. Anomaly Detector
    Event → Detector (Model) → Expected / Unexpected


  49. Exploring the Literature
    Few False Negatives, but lots of false positives
    Few False Positives, but lots of false negatives
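
The tradeoff is easy to see with a single threshold on an anomaly score: lowering it catches more real issues but raises more false alarms. A toy sketch with made-up scores and labels; `confusion` is a hypothetical helper, not a Vault API.

```python
from typing import List, Tuple

# Toy, made-up data: (anomaly score, actually malicious?) per event.
Event = Tuple[float, bool]


def confusion(events: List[Event], threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) when flagging score >= threshold."""
    fp = sum(1 for score, bad in events if score >= threshold and not bad)
    fn = sum(1 for score, bad in events if score < threshold and bad)
    return fp, fn
```

At the scale of millions of audit events, even a small false positive rate produces more alerts than people can review, which is why neither end of the spectrum is acceptable on its own.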


  50. Applications to Vault
    Screen Millions of Events
    Security Issues Missed


  51. Defining a Model
    Event → Detector (Model) → Expected / Unexpected
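
As a deliberately simple stand-in for such a model (not the one HashiCorp Research built), an event can be called expected if the same identity has performed the same operation on the same path often enough before. `FrequencyModel`, the `(identity, operation, path)` event shape, and `min_count` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical audit event: (identity, operation, path).
AuditEvent = Tuple[str, str, str]


class FrequencyModel:
    """Toy detector: an event is "expected" if it has been seen often enough."""

    def __init__(self, min_count: int = 2) -> None:
        self.min_count = min_count
        self.counts: Counter = Counter()

    def train(self, history: Iterable[AuditEvent]) -> None:
        # Count how many times each exact (identity, operation, path) occurred.
        self.counts.update(history)

    def classify(self, event: AuditEvent) -> str:
        return "expected" if self.counts[event] >= self.min_count else "unexpected"
```

Raising `min_count` flags more events as unexpected, moving the detector along the false positive versus false negative tradeoff.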


  52. Refining Configuration
    User Action → Audit Log → Vault Advisor → Configuration


  53. Vault Advisor in Depth


  54. Research Status
    Problem Novel Solution
    Existing
    Solution
    Publish
    Integrate
    Product


  55. Lifeguard Integration
    Research Team, Project Fork, Pull Request, Upstream, Eng Team


  56. Product-ization
    Research Team and Eng Team stages for Advisor: Research, Prototype, Publish, Train, Develop, Embedded


  57. What’s Coming
    Problem Novel Solution
    Existing
    Solution
    Publish
    Integrate
    Product



  58. Research Culture


  59. Fostering Research Culture
    ▪ Product / Engineering is 100x bigger than Research
    ▪ Cultural approach needed
    ▪ Consuming research


  60. Publishing PRD / RFCs


  61. Slack #talk-research


  62. Brown bags and Conferences


  63. Sponsorships & Memberships


  64. Cultural Goals
    ▪ Build awareness of research
    ▪ Give access to published academic work
    ▪ Create channels to engage internally
    ▪ Promote involvement in external community
    ▪ Involve Research in Engineering, and vice versa



  65. Conclusion


  66. Real world value
    ▪ Leverage the “State of the Art”, instead of naive design
    ▪ Apply domain constraints against fundamental tradeoffs
    ▪ Improve product performance, security, and usability


  67. Research used from Day 1
    ▪ Academic research fundamental to HashiCorp Products
    ▪ Day 1 core designs based on the literature
    ▪ Day 2+ improvements from literature


  68. HashiCorp Research
    ▪ Focused on Industrial Research
    ▪ Publishing work, not just consuming
    ▪ Advocate for research culture internally
    ▪ Features like Lifeguard
    ▪ New products like Vault Advisor


  69. Promoting Research
    ▪ Build a culture around research
    ▪ Enable access, encourage consumption
    ▪ Create bridges between Research and Engineering
    ▪ Vocalize the benefits


  70. Thank You
    www.hashicorp.com
