Building a streaming database service

Streaming Queries with ElasticSearch

Siddharth Kothari

May 21, 2015

Transcript

  1. The title could also be:
    * Streaming Queries with ElasticSearch
    * Streaming DBs will take over the world
    * What the heck is a streaming database?
  2. Siddharth, sith in making
    Co-founder and CEO, appbase.io
    Give me a shout at @siddharthlatest
    GET /programming/stacktrace
    [“Games”, “C”, “Python”, “Java (sigh)”, “AI”, “JS”, “Databases”]
  3. Topics:
    1. What is a streaming database?
    2. The use-cases
    3. ElasticSearch as the query layer
    4. Streaming Topology
    5. How does it scale?
    6. The future
  4. Use-cases
    1. Streams and Firehoses from #IoT
    2. Monitoring Systems
    3. Analytics
    4. E-commerce: Search, Price Monitoring
  5. Topics:
    1. What is a streaming database?
    2. The use-cases
    3. ElasticSearch as the query layer
    4. Streaming Topology
    5. How does it scale?
    6. The future
  6. Elasticsearch
    Distributed full-text search based on Lucene
    Can scale to many nodes and is highly available
    Analytics, Document Oriented, Open Source
  7. ES: Percolation aka Search in Reverse
    1. Indexing a Query
    2. Matches when new documents are added
    3. Distributed design since v1.0.0
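
As a rough illustration of "search in reverse", the sketch below stores queries first and then checks each incoming document against them. It is a toy in-memory model, not the Elasticsearch percolator API: `ToyPercolator`, `register`, and `percolate` are hypothetical names, and a "query" here is assumed to be just a set of terms that must all appear in the document text.

```python
class ToyPercolator:
    """Toy in-memory model of percolation (hypothetical, not the ES API)."""

    def __init__(self):
        # query_id -> set of lowercase terms that must all be present
        self._queries = {}

    def register(self, query_id, terms):
        """Step 1: index a query instead of a document."""
        self._queries[query_id] = {t.lower() for t in terms}

    def percolate(self, doc_text):
        """Step 2: return the ids of stored queries that match a new document."""
        words = set(doc_text.lower().split())
        return sorted(
            qid for qid, terms in self._queries.items() if terms <= words
        )


percolator = ToyPercolator()
percolator.register("q-streaming", ["streaming", "database"])
percolator.register("q-search", ["search"])

matches = percolator.percolate("Building a streaming database service")
print(matches)
```

The point of the inversion is that the set of stored queries stays fixed while documents flow past it, which is what makes the later subscription model possible.
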
  8. Topics:
    1. What is a streaming database?
    2. The use-cases
    3. ElasticSearch as the query layer
    4. Streaming Topology
    5. How does it scale?
    6. The future
  9. Streaming Topology
    • Queries are subscriptions (HTTP Streaming / Websockets)
    • Publish matches to subscribers.
    • Works as is with the ES API.
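
The idea that a query is a subscription can be sketched as a small publish/subscribe hub. This is an assumption-laden toy, not appbase.io's implementation: `SubscriptionHub`, `subscribe`, and `index` are hypothetical names, queries are plain term sets, and a Python callback stands in for the HTTP streaming or WebSocket connection.

```python
class SubscriptionHub:
    """Toy model: a query is a subscription, matches are pushed to subscribers."""

    def __init__(self):
        self._terms = {}        # query_id -> required lowercase terms
        self._subscribers = {}  # query_id -> list of callbacks

    def subscribe(self, query_id, terms, callback):
        """Register a query and the callback that should receive its matches."""
        self._terms[query_id] = {t.lower() for t in terms}
        self._subscribers.setdefault(query_id, []).append(callback)

    def index(self, doc):
        """Index a document and publish it to every matching subscription."""
        words = set(doc["text"].lower().split())
        for query_id, terms in self._terms.items():
            if terms <= words:
                for callback in self._subscribers[query_id]:
                    callback(query_id, doc)


received = []
hub = SubscriptionHub()
hub.subscribe(
    "price-alerts",
    ["price", "drop"],
    lambda query_id, doc: received.append((query_id, doc["id"])),
)

hub.index({"id": 1, "text": "Price drop on laptops"})
hub.index({"id": 2, "text": "New laptops announced"})
print(received)
```

In a real service the callback would presumably write to an open connection rather than append to a list.
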
  10. Streaming Topology
    • Beyond Percolation, keep the document store model of ES.
    • Every document is a topic, which can have references.
    • When a doc is created, updated, or deleted, notify all the docs that refer to it.
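
One way to picture "every document is a topic" is a store that tracks which documents refer to which and fans out change events along those references. The sketch below is a minimal in-memory model under that assumption; `ReferenceStore` and its methods are hypothetical names, and the document ids are made up for illustration.

```python
class ReferenceStore:
    """Toy document store that notifies referring docs about changes."""

    def __init__(self):
        self._docs = {}          # doc_id -> body
        self._referrers = {}     # target id -> set of ids of docs referring to it
        self.notifications = []  # (referrer_id, event, target_id)

    def add_reference(self, referrer_id, target_id):
        """Record that referrer_id refers to target_id."""
        self._referrers.setdefault(target_id, set()).add(referrer_id)

    def _notify(self, event, doc_id):
        for referrer_id in sorted(self._referrers.get(doc_id, ())):
            self.notifications.append((referrer_id, event, doc_id))

    def put(self, doc_id, body):
        """Create or update a document, then notify its referrers."""
        event = "updated" if doc_id in self._docs else "created"
        self._docs[doc_id] = body
        self._notify(event, doc_id)

    def delete(self, doc_id):
        """Delete a document if it exists, then notify its referrers."""
        if doc_id in self._docs:
            del self._docs[doc_id]
            self._notify("deleted", doc_id)


store = ReferenceStore()
store.put("user/1", {"name": "Sid"})        # nobody refers to it yet
store.add_reference("feed/1", "user/1")     # feed/1 now refers to user/1
store.put("user/1", {"name": "Siddharth"})  # update reaches feed/1
store.delete("user/1")                      # delete reaches feed/1
print(store.notifications)
```
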
  11. Streaming Workflows
    • Every document has an ES Path exposed by REST.
    • Diagram labels: Endpoint, Worker, Push back to stream
    • Topology like Apache Storm, but you can notify the workers using the entire ElasticSearch API.
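
The endpoint, worker, and push-back steps named on this slide might be wired together roughly as follows. This is only a sketch of one reading of the diagram: `run_worker` and `title_worker` are hypothetical, the paths are invented examples, and a plain list stands in for the output stream.

```python
def run_worker(endpoint_events, worker, stream):
    """Pass each (path, doc) change event to a worker and push results back."""
    for path, doc in endpoint_events:
        result = worker(path, doc)
        if result is not None:  # a worker may decide to emit nothing
            stream.append(result)


def title_worker(path, doc):
    """Hypothetical worker: derive an upper-cased title for non-empty titles."""
    if not doc["title"]:
        return None
    return (path, {"title_upper": doc["title"].upper()})


# Invented paths standing in for documents exposed over REST.
events = [
    ("/blog/post/1", {"title": "streaming dbs"}),
    ("/blog/post/2", {"title": ""}),
]

stream = []
run_worker(events, title_worker, stream)
print(stream)
```
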
  12. Topics:
    1. What is a streaming database?
    2. The use-cases
    3. ElasticSearch as the query layer
    4. Streaming Topology
    5. How does it scale?
    6. The future
  13. How does it distribute, scale?
    • Underlying Store: ES is highly available, can scale to many nodes.
    • We were able to ingest 100,000 documents per second on 20 c4.2xlarge nodes (AWS).
    • Eventually consistent, with a very small t.
    • Distributed Streaming Topology is a work in progress.
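
For a sense of scale, the quoted ingest figure can be broken down with some back-of-the-envelope arithmetic. The calculation assumes the load is spread evenly across nodes and that the rate is sustained, neither of which the slide states.

```python
total_docs_per_second = 100_000  # figure quoted on the slide
node_count = 20                  # figure quoted on the slide

# Assumption: documents are spread evenly across all nodes.
per_node = total_docs_per_second / node_count

# Assumption: the rate is sustained for a full day.
per_day = total_docs_per_second * 60 * 60 * 24

print(f"{per_node:,.0f} docs/s per node, {per_day:,} docs per day")
```
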
  14. Looking forward
    “The web has moved to #realtime, why shouldn’t the Backend Infrastructure too?”
    “DBs are moving to having RESTful APIs, percolators, streaming interfaces are the next steps”.