Elasticsearch For Big Enterprise SaaS

In this talk we'll discuss adapting Elasticsearch for use in a large multi-tenant enterprise SaaS application. We'll talk about an approach for scaling indices to handle thousands of tenants with shard routing that avoids explosions in cluster state size. We'll also discuss extending Elasticsearch to add encryption-at-rest using Lucene and translog encryption. We'll show Workday's recently released 'escalar' library for working with Elasticsearch from Scala, including the beginnings of a Scala DSL for Elasticsearch's query language. Finally, we'll see how these three problems connect to three core concerns for managing enterprise data at scale: managing complexity, security, and safety and quality.

Thomas Kim is soon to be an engineer at Iterable. He has been working in enterprise software and SaaS for over 15 years. He was formerly a tech lead on Workday Search. Prior to that, he was the CTO of a small BI startup and an early engineer at Salesforce. He loves dogs, snowboarding, and statically typed functional programming. Being a bandwagon Warriors fan makes his wife laugh.

https://www.meetup.com/Elasticsearch-San-Francisco/events/238929112/

Elastic Co

April 25, 2017

Transcript

  1. Three Problems

    1. Multi-tenant Elasticsearch: managing complexity
    2. Encryption-at-rest: app security
    3. Using ES from Scala: safety & maintainability
  2. Elasticsearch Arch Review

    [Diagram: shard layout. Idx A has four shards (1 to 4) and Idx B has two shards (1 and 2); each shard has a primary (P) copy and a replica (R) copy.]
  3. Multitenancy

    Many possible approaches:
    - One tenant => One index; Multiple clusters

    [Diagram labels: Tenant A, Tenant B, Tenant C, Tenant C, Tenant D, Tenant E]
  4. Multitenancy: Tenant Sharding

    - Routing
    - Hash Function
    - Aliases

    PUT someIndex/someType/1?routing=tenant

    hash(routing) % N == 5

    [Diagram: shards S0, S1, S2, S3, S4, S5, S6, S7]

    cluster.routing.operation.hash.type: atoi

    POST /_aliases
    {
      "actions" : [
        { "add" : { "index" : "physicalIndex1", "alias" : "someTenantName", "routing" : "5" } }
      ]
    }
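
The routing-plus-alias scheme on this slide can be sketched in Python as follows. This is a minimal illustration and not the speaker's implementation. The helpers `shard_for_tenant`, `alias_action`, and `aliases_body` are hypothetical, and CRC32 as the tenant hash is an assumption. Only the `_aliases` body shape and the names `physicalIndex1` and `someTenantName` come from the slide.

```python
import json
import zlib

NUM_SHARDS = 8  # the slide shows shards S0..S7


def shard_for_tenant(tenant, num_shards=NUM_SHARDS):
    """Map a tenant name to a shard number (hypothetical helper).

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash(); the talk does not say which hash was actually used.
    """
    return zlib.crc32(tenant.encode("utf-8")) % num_shards


def alias_action(tenant, physical_index, num_shards=NUM_SHARDS):
    """Build one 'add' action that pins a tenant alias to a routing value."""
    return {
        "add": {
            "index": physical_index,
            "alias": tenant,
            "routing": str(shard_for_tenant(tenant, num_shards)),
        }
    }


def aliases_body(tenants, physical_index):
    """Build a request body for POST /_aliases covering several tenants."""
    return {"actions": [alias_action(t, physical_index) for t in tenants]}


if __name__ == "__main__":
    body = aliases_body(["someTenantName", "anotherTenant"], "physicalIndex1")
    print(json.dumps(body, indent=2))
```

With aliases like these in place, the application can index and search against the tenant alias, and the alias supplies the routing value, so many tenants share one physical index rather than each needing an index of their own.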
  5. Encryption

    Many approaches:
    1. Filesystem-level
    2. Codec (LUCENE-6966)
    3. Directory (LUCENE-2228)
    4. Application-level (homomorphic encryption)

    [Diagram labels: Filesystem, Your App, Elasticsearch, Lucene - Codec, Lucene - Directory]
  6. Encryption: Key Fetching

    - RPC call - might well fail
    - Partial failures
    - Misconfiguration

    - First: Lose no data.
    - Second: Keep the cluster running.
    - Third: Recover if possible.
    - Last: Give up.
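
One way to read the "recover if possible" and "give up" priorities above is as a bounded retry around the key-fetching RPC. The Python sketch below works under that assumption and is not taken from the talk. `fetch_key_with_retry` and `KeyUnavailable` are hypothetical names, and `fetch` stands in for whatever RPC client fetches the key. It does not cover the "keep the cluster running" priority, which depends on how the caller handles the final error.

```python
import time


class KeyUnavailable(Exception):
    """Raised when the key cannot be fetched after all attempts."""


def fetch_key_with_retry(fetch, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fetch() until it returns a key or the attempts run out.

    fetch is any zero-argument callable standing in for the key RPC.
    Raising instead of returning None means a caller cannot silently
    fall back to writing unencrypted data.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as err:  # the RPC "might well fail"
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise KeyUnavailable(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the function easy to exercise in tests without real delays.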
  7. Escalar: API

    - Create a client
        import com.workday.esclient._
        val esUrl = "http://localhost:9200"
        val client = EsClient.createEsClient(esUrl)
    - Create an index
        client.createIndex(indexName)
    - Index a doc
        client.index(indexName, typeName, id, doc)
    - Get a doc
        val getDoc = client.get(indexName, id)
  8. Escalar: Query DSL

    filtered(
      multiMatch(Seq("firstName", "lastName"), "kim"),
      bool(
        must = Seq(terms("owners", Seq("id1")), terms()),
        should = users.map(u => terms("field1", Seq(u)))
      )
    )

    https://github.com/Workday/escalar/blob/master/src/main/scala/com/workday/esclient/EsQueryHelpers.scala
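
To show the idea behind such a DSL, the Python sketch below composes small builder functions into the JSON that Elasticsearch expects for a comparable query. It is an illustration only and says nothing about how escalar implements its helpers. The function names here are hypothetical, and the argument-less `terms()` placeholder from the slide is left out. The `filtered` query belongs to older Elasticsearch versions and was removed in 5.0.

```python
import json


def terms(field, values):
    """Match documents whose field contains any of the given values."""
    return {"terms": {field: list(values)}}


def multi_match(fields, text):
    """Full-text match of text against several fields."""
    return {"multi_match": {"query": text, "fields": list(fields)}}


def bool_query(must=(), should=()):
    """Combine clauses; empty clause lists are omitted from the output."""
    clause = {}
    if must:
        clause["must"] = list(must)
    if should:
        clause["should"] = list(should)
    return {"bool": clause}


def filtered(query, filter_):
    """Wrap a scoring query and a non-scoring filter (pre-5.0 syntax)."""
    return {"filtered": {"query": query, "filter": filter_}}


users = ["u1", "u2"]  # sample values for illustration
q = filtered(
    multi_match(["firstName", "lastName"], "kim"),
    bool_query(
        must=[terms("owners", ["id1"])],
        should=[terms("field1", [u]) for u in users],
    ),
)
print(json.dumps({"query": q}, indent=2))
```

Building queries from functions rather than hand-written JSON strings means a misspelled clause name fails in the host language instead of producing a malformed request at runtime.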
  9. Iterable: Enterprise at Scale

    query:
    {
      "query" : {
        "filtered" : {
          "filter" : {
            "has_child" : {
              "filter" : {
                "range" : {
                  "createdAt" : { "from" : null, "to" : "1492473419313" }
                }
              },
              "child_type" : "emailOpen",
              "min_children" : 5,
              "max_children" : 10
            }
          }
        }
      }
    }

    [Diagram labels: user, trackPurchase, emailClick, customEvent, emailOpen, preferences]
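
The query on this slide reads as "parents with between 5 and 10 `emailOpen` children whose `createdAt` is at or before a given timestamp". A parameterized version could be sketched in Python as below. The function name `has_child_count_query` and its parameters are hypothetical. The body shape, field names, and sample values are copied from the slide.

```python
import json


def has_child_count_query(child_type, field, upper_bound_ms,
                          min_children, max_children):
    """Build a has_child filter bounded by child count and a time range."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "has_child": {
                        "filter": {
                            "range": {
                                # None serializes to JSON null (open lower bound)
                                field: {"from": None, "to": str(upper_bound_ms)}
                            }
                        },
                        "child_type": child_type,
                        "min_children": min_children,
                        "max_children": max_children,
                    }
                }
            }
        }
    }


slide_query = has_child_count_query("emailOpen", "createdAt", 1492473419313, 5, 10)
print(json.dumps(slide_query, indent=2))
```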