Building a streaming database service

Streaming Queries with ElasticSearch

Siddharth Kothari

May 21, 2015

Transcript

  1. The title could also be:
     * Streaming Queries with ElasticSearch
     * Streaming DBs will take over the world
     * What the heck is a streaming database?
  2. Siddharth, sith in making
     Co-founder and CEO, appbase.io
     Give me a shout at @siddharthlatest
     GET /programming/stacktrace
     [“Games”, “C”, “Python”, “Java (sigh)”, “AI”, “JS”, “Databases”]
  3. Topics:
     1. What is a streaming database?
     2. The use-cases
     3. ElasticSearch as the query layer
     4. Streaming Topology
     5. How does it scale?
     6. The future
  4. Use-cases
     1. Streams and Firehoses from #IoT
     2. Monitoring Systems
     3. Analytics
     4. E-commerce: Search, Price Monitoring
  5. Topics:
     1. What is a streaming database?
     2. The use-cases
     3. ElasticSearch as the query layer
     4. Streaming Topology
     5. How does it scale?
     6. The future
  6. Elasticsearch
     Distributed Full-text Search based on Lucene
     Can scale to many nodes and is highly available
     Analytics, Document Oriented, Open Source
  7. ES: Percolation aka Search in Reverse
     1. Indexing a Query
     2. Matches when new documents are added
     3. Distributed design since v1.0.0
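The "search in reverse" idea on this slide can be illustrated with a small in-memory sketch. This is not the Elasticsearch percolator API: `ToyPercolator`, `register`, `percolate`, and the sample queries are hypothetical names chosen for illustration, and queries are modeled as plain Python predicates rather than query DSL.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Query = Callable[[Document], bool]


class ToyPercolator:
    """Hypothetical in-memory stand-in: stores queries, then matches documents against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        # Step 1 on the slide: the query itself is what gets stored ("indexed").
        self._queries[query_id] = query

    def percolate(self, doc: Document) -> List[str]:
        # Step 2 on the slide: a new document is checked against every stored query.
        return sorted(qid for qid, query in self._queries.items() if query(doc))


percolator = ToyPercolator()
percolator.register(
    "cheap-phones",
    lambda d: d.get("category") == "phone" and d.get("price", float("inf")) < 300,
)
percolator.register("any-laptop", lambda d: d.get("category") == "laptop")

# A new document arrives; the result is the IDs of the queries it satisfies.
matches = percolator.percolate({"category": "phone", "price": 250})
```

The point of the sketch is the inversion: the stored objects are queries, and the thing being "searched with" is the incoming document.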
  8. Topics:
     1. What is a streaming database?
     2. The use-cases
     3. ElasticSearch as the query layer
     4. Streaming Topology
     5. How does it scale?
     6. The future
  9. Streaming Topology
     • Queries are subscriptions (HTTP Streaming / Websockets)
     • Publish matches to subscribers.
     • Works as is with the ES API.
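As a rough sketch of "queries are subscriptions", the snippet below gives each subscriber its own stream and pushes every newly indexed document that matches the subscriber's query onto it. The class `StreamingIndex` and its methods are hypothetical, and a `queue.Queue` stands in for the HTTP streaming or WebSocket connection the slide mentions.

```python
import queue
from typing import Callable, Dict, List, Tuple

Document = Dict[str, object]
Query = Callable[[Document], bool]


class StreamingIndex:
    """Hypothetical store where a query stays open and receives future matches."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}
        self._subscriptions: List[Tuple[Query, queue.Queue]] = []

    def subscribe(self, query: Query) -> queue.Queue:
        # The returned queue plays the role of the client's open connection.
        stream: queue.Queue = queue.Queue()
        self._subscriptions.append((query, stream))
        return stream

    def index(self, doc_id: str, doc: Document) -> None:
        # Store the document, then publish it to every subscription it matches.
        self._docs[doc_id] = doc
        for query, stream in self._subscriptions:
            if query(doc):
                stream.put((doc_id, doc))


store = StreamingIndex()
hot_sensors = store.subscribe(lambda d: d.get("temp", 0) > 30)

store.index("sensor-a", {"temp": 35})
store.index("sensor-b", {"temp": 20})

first_event = hot_sensors.get_nowait()  # only sensor-a was published
```

A real deployment would need back-pressure and disconnect handling, which this sketch leaves out.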
  10. Streaming Topology
      • Beyond Percolation, keep the document store model of ES.
      • Every document is a topic, which can have references.
      • When a doc is created, updated, or deleted, notify all the docs that refer to it.
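One way to picture the reference model is a reverse index from each document to the documents that refer to it. The sketch below is an assumption about how such bookkeeping could look, not the appbase.io implementation; `ReferenceGraph`, `add_reference`, and `on_change` are hypothetical names, and the document paths are made up.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class ReferenceGraph:
    """Hypothetical bookkeeping: which documents refer to which other documents."""

    def __init__(self) -> None:
        # target document -> set of documents that hold a reference to it
        self._referrers: DefaultDict[str, Set[str]] = defaultdict(set)
        # log of (notified document, event, changed document)
        self.notifications: List[Tuple[str, str, str]] = []

    def add_reference(self, source: str, target: str) -> None:
        self._referrers[target].add(source)

    def on_change(self, doc_id: str, event: str) -> List[str]:
        # Called when doc_id is created, updated, or deleted:
        # every document that refers to it gets a notification.
        notified = sorted(self._referrers.get(doc_id, ()))
        for referrer in notified:
            self.notifications.append((referrer, event, doc_id))
        return notified


graph = ReferenceGraph()
graph.add_reference("user/1", "post/9")
graph.add_reference("feed/main", "post/9")

notified = graph.on_change("post/9", "updated")
```

Keeping the mapping keyed by the target makes the notification step a single lookup, at the cost of maintaining the reverse index on every reference change.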
  11. Streaming Workflows
      • Every document has an ES Path exposed by REST.
      • [Diagram labels: Endpoint, Worker, Push back to stream]
      • Topology like Apache Storm, but you can notify the workers using the entire ElasticSearch API.
  12. Topics:
      1. What is a streaming database?
      2. The use-cases
      3. ElasticSearch as the query layer
      4. Streaming Topology
      5. How does it scale?
      6. The future
  13. How does it distribute, scale?
      • Underlying Store: ES is highly available, can scale to many nodes.
      • We were able to ingest 100,000 documents per second on 20 c4.2xlarge nodes (AWS).
      • Eventually consistent, with a very small t.
      • Distributed Streaming Topology is a work in progress.
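For a sense of scale, the ingest figure on this slide can be broken down per node. The calculation assumes the load was spread evenly across the 20 nodes and, for the daily figure, sustained for 24 hours; the slide states neither, so treat both results as back-of-the-envelope numbers.

```python
# Figures taken from the slide.
docs_per_second = 100_000
nodes = 20

# Assumption: ingest load is evenly distributed across nodes.
docs_per_second_per_node = docs_per_second / nodes

# Assumption: the rate is sustained for a full day (86,400 seconds).
docs_per_day = docs_per_second * 86_400

print(f"{docs_per_second_per_node:,.0f} docs/s per node")
print(f"{docs_per_day:,} docs per day")
```

Under those assumptions this works out to 5,000 documents per second per node and 8.64 billion documents per day.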
  14. Looking forward
      “The web has moved to #realtime, why shouldn’t the Backend Infrastructure too?”
      “DBs are moving to having RESTful APIs; percolators and streaming interfaces are the next steps.”