Elasticsearch For Big Enterprise SaaS

In this talk we'll discuss adapting Elasticsearch for use in a large multi-tenant enterprise SaaS application. We'll talk about an approach for scaling indices to handle thousands of tenants with shard routing that avoids explosions in cluster state size. We'll also discuss extending Elasticsearch to add encryption-at-rest using Lucene and translog encryption. We'll show Workday's recently-released 'escalar' library for working with Elasticsearch from Scala, including the beginnings of a Scala DSL for Elasticsearch's query language. Finally, we'll see how these three problems connect to three core concerns for managing enterprise data at scale: managing complexity; security; safety and quality.

Thomas Kim is soon to be an engineer at Iterable. He has been working in enterprise software and SaaS for over 15 years. He was formerly a tech lead on Workday Search. Prior to that, he was the CTO of a small BI startup and an early engineer at Salesforce. He loves dogs, snowboarding, and statically typed functional programming. Being a bandwagon Warriors fan makes his wife laugh.

https://www.meetup.com/Elasticsearch-San-Francisco/events/238929112/

Elastic Co

April 25, 2017

Transcript

  1. Elasticsearch For Big Enterprise SaaS Thomas Kim 4/19/2017

  2. Three Problems

    1. Multi-tenant Elasticsearch: Managing complexity
    2. Encryption-at-rest: App security
    3. Using ES from Scala: Safety & maintainability
  3. Elasticsearch Arch Review

    Diagram of primary (P) and replica (R) shards:
    Idx A: Shard 1 (R), Shard 1 (P), Shard 2 (P), Shard 2 (R), Shard 3 (R), Shard 3 (P), Shard 4 (P), Shard 4 (R)
    Idx B: Shard 1 (P), Shard 1 (R), Shard 2 (P), Shard 2 (R)
  4. Multitenancy

    Many possible approaches:
    - One tenant => One index
    Diagram: Tenant A, Tenant B, Tenant C
  5. Multitenancy: Cluster state

    curl -XGET 'http://localhost:9200/_cluster/state'
  6. Multitenancy

    Many possible approaches:
    - One tenant => One index; Multiple clusters
    Diagram: Tenant A, Tenant B, Tenant C, Tenant C, Tenant D, Tenant E
  7. Multitenancy

    Many possible approaches:
    - Many tenants => One index
    Diagram: Tenants A, B, C
  8. Multitenancy

    Many possible approaches:
    - One tenant => One shard
    Diagram: Tenant A, Tenant B, Tenant C
  9. Multitenancy: Tenant Sharding

    - Routing
      PUT someIndex/someType/1?routing=tenant
    - Hash Function
      hash(routing) % N == 5
      cluster.routing.operation.hash.type: atoi
      Diagram of shards: S0, S4, S1, S5, S2, S6, S3, S7
    - Aliases
      POST /_aliases
      { "actions" : [ { "add" : { "index" : "physicalIndex1", "alias" : "someTenantName", "routing" : "5" } } ] }
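
The routing and alias setup on slide 9 can be sketched as follows. This is a minimal illustration, not Workday's implementation: `shard_for_routing` and `alias_action` are hypothetical helper names, the shard count of 8 is read off the S0 to S7 diagram, and the sketch assumes the `atoi` hash type simply parses the routing string as an integer, which the slide suggests but does not spell out.

```python
import json

NUM_SHARDS = 8  # assumption: the slide's diagram shows shards S0 through S7


def shard_for_routing(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard number.

    Assumption: an atoi-style hash is just int(routing), so a numeric
    routing string selects its shard via hash(routing) % N.
    """
    return int(routing) % num_shards


def alias_action(index: str, tenant: str, shard: int) -> dict:
    """Build an _aliases request body that ties a tenant alias to one routing value."""
    return {
        "actions": [
            {"add": {"index": index, "alias": tenant, "routing": str(shard)}}
        ]
    }


# Same values as the slide: tenant alias "someTenantName" routed with "5".
body = alias_action("physicalIndex1", "someTenantName", 5)
print(json.dumps(body))
print(shard_for_routing("5"))
```

With an alias like this in place, requests addressed to `someTenantName` carry the routing value automatically, so a tenant's documents can live on a single shard of a shared index instead of requiring an index per tenant.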
  10. Encryption

    Many approaches:
    1. Filesystem-level
    2. Codec (LUCENE-6966)
    3. Directory (LUCENE-2228)
    4. Application-level (homomorphic encryption)
    Diagram layers: Filesystem, Your App, Elasticsearch, Lucene - Codec, Lucene - Directory
  11. Encryption

  12. Encryption

    Labels in the page-layout diagram: Magic; Tenant ID; AES-CBC (continues for 8k page); Ciphertext; KV Nonce; KV HMAC; IV; padding
  13. Encryption: Performance

  14. Encryption: Correctness Kendall tau distance
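
The slide names only the metric. For reference, the Kendall tau distance between two rankings of the same items counts the pairs of items that appear in opposite order, so a distance of 0 means the orderings are identical. How the talk applied it is not stated in the transcript; comparing the result order from an encrypted index against an unencrypted one is an assumption made here for illustration, and the document IDs below are made up.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count the item pairs that the two rankings order differently.

    Both rankings must contain the same items, each exactly once.
    """
    items = set(ranking_a)
    if (
        len(items) != len(ranking_a)
        or len(ranking_b) != len(ranking_a)
        or items != set(ranking_b)
    ):
        raise ValueError("rankings must contain the same distinct items")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    # combinations() yields pairs (x, y) with x ranked before y in ranking_a;
    # the pair is discordant when ranking_b puts y before x.
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if position_b[x] > position_b[y]
    )


# Hypothetical result lists for the same query (assumed use case).
plain_results = ["d1", "d2", "d3", "d4"]
encrypted_results = ["d1", "d3", "d2", "d4"]
print(kendall_tau_distance(plain_results, encrypted_results))
```

Here a single swapped pair (`d2` and `d3`) gives a distance of 1, while a fully reversed list of four items would give the maximum of 6.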

  15. Encryption: Key Fetching

    - RPC call - might well fail
    - Partial failures
    - Misconfiguration

    - First: Lose no data.
    - Second: Keep the cluster running.
    - Third: Recover if possible.
    - Last: Give up.
  16. Encryption: Peer Recovery

    Idx A, Shard 7, Primary | Idx A, Shard 7, Replica
  17. Encryption: Peer Recovery

    Idx A, Shard 7, Primary | Idx A, Shard 7, Replica
  18. Encryption: Peer Recovery

    Idx A, Shard 7, Primary | Idx A, Shard 7, Primary
  19. Encryption: Peer Recovery

    Idx A, Shard 7, Re-initializing | Idx A, Shard 7, Primary
  20. Encryption: Peer Recovery

    Idx A, Shard 7, Replica | Idx A, Shard 7, Primary
  21. Escalar: https://github.com/Workday/escalar

  22. Escalar: API

    - Create a client
      import com.workday.esclient._
      val esUrl = "http://localhost:9200"
      val client = EsClient.createEsClient(esUrl)
    - Create an index
      client.createIndex(indexName)
    - Index a doc
      client.index(indexName, typeName, id, doc)
    - Get a doc
      val getDoc = client.get(indexName, id)
  23. Escalar: Query DSL

    filtered(
      multiMatch(Seq("firstName", "lastName"), "kim"),
      bool(
        must = Seq(terms("owners", Seq("id1")), terms()),
        should = users.map(u => terms("field1", Seq(u)))
      )
    )
    https://github.com/Workday/escalar/blob/master/src/main/scala/com/workday/esclient/EsQueryHelpers.scala
  24. Iterable: Enterprise at Scale

  25. Iterable: Enterprise at Scale

    query:
    {
      "query" : {
        "filtered" : {
          "filter" : {
            "has_child" : {
              "filter" : {
                "range" : {
                  "createdAt" : { "from" : null, "to" : "1492473419313" }
                }
              },
              "child_type" : "emailOpen",
              "min_children" : 5,
              "max_children" : 10
            }
          }
        }
      }
    }
    Document types in the diagram: user, trackPurchase, emailClick, customEvent, emailOpen, preferences
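
The query on slide 25 can also be assembled programmatically. The sketch below builds the same request body as a plain dictionary; `has_child_count_query` and its parameter names are hypothetical and not part of any client library, and the body follows the older `filtered` / `has_child` filter syntax that the slide uses.

```python
import json


def has_child_count_query(child_type, min_children, max_children, created_to_ms):
    """Build the slide's filtered has_child query as a dict.

    Selects parent documents with between min_children and max_children
    children of child_type whose createdAt range ends at created_to_ms.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "has_child": {
                        "filter": {
                            "range": {
                                # The slide sends the upper bound as a string.
                                "createdAt": {"from": None, "to": str(created_to_ms)}
                            }
                        },
                        "child_type": child_type,
                        "min_children": min_children,
                        "max_children": max_children,
                    }
                }
            }
        }
    }


# Same values as the slide: 5 to 10 emailOpen children.
body = has_child_count_query("emailOpen", 5, 10, 1492473419313)
print(json.dumps(body, indent=2))
```

Swapping `child_type` for another child document type, such as `emailClick`, yields the equivalent query over that event type.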
  26. Thanks! Iterable is Hiring