To infinity and beyond

Elastic::Model is a new framework to store your Moose objects, which uses ElasticSearch as a NoSQL document store and flexible search engine.

It is designed to make small beginnings simple, but to scale easily to Big Data requirements without needing to rearchitect your application. No job too big or small!

This talk will introduce Elastic::Model, demonstrate how to develop a simple application, introduce some more advanced techniques, and discuss how it uses ElasticSearch to scale.

https://github.com/clintongormley/Elastic-Model

Clinton Gormley

July 25, 2013

Transcript

  1. To infinity and beyond! A practical guide for Mooseherds (and

    other carers of livestock) @clintongormley #elasticsearch OSCON 2013
  2. I have an idea for a killer app!

  3. Quick! Let's...

  4. Design our objects

  5. Flatten them into tables

  6. Normalize data

  7. Add indexes

  8. Add tables for many-to-one

  9. More indexes

  10. Need full text search?

  11. Copy data to search engine

  12. Keep the two in sync

  13. Get search results, pull objects from DB

  14. Success! Success!

  15. None
  16. Need to scale

  17. Buy a bigger box

  18. Tune indexes

  19. Add caching

  20. Fix caching bugs

  21. Master - Slave replication

  22. Buy SSDs

  23. Denormalize data

  24. Buy bigger boxes

  25. Shard your data (ie rewrite your application)

  26. Do you really need a relational DB?

  27. Do you really need a relational DB? faster horse?

  28. NoSQL advantages

  29. Document oriented

  30. ...just store your object

  31. Fast reads and writes

  32. Scale horizontally

  33. Recover from failure

  34. But...

  35. Different from RDBMS

  36. No transactions

  37. No joins

  38. Denormalized data

  39. Still need to add: indexes

  40. Still need to add: full text search

  41. elasticsearch

  42. Real time document store

  43. Powerful full text search

  44. Filters, geolocation...

  45. Distributed by design

  46. Fault tolerant

  47. Easy sharding

  48. Start small Scale massively

  49. Why keep two datastores in sync?

  50. Just use elasticsearch

  51. with Elastic::Model

  52. Store and query Moose objects

  53. Exposes full power of elasticsearch

  54. and takes care of the housekeeping

  55. How?

  56. package MyApp::Post; use Moose; has 'title' => ( is =>

    'rw', isa => 'Str' ); has 'content' => ( is => 'rw', isa => 'Str' ); has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } );
  57. package MyApp::Post; use Moose; has 'title' => ( is =>

    'rw', isa => 'Str' ); has 'content' => ( is => 'rw', isa => 'Str' ); has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } ); package MyApp::User; use Moose; has 'name' => ( is => 'rw', isa => 'Str' ); has 'email' => ( is => 'rw', isa => 'Str', required => 1 );
  58. package MyApp::Post; use Moose; has 'title' => ( is =>

    'rw', isa => 'Str' ); has 'content' => ( is => 'rw', isa => 'Str' ); has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } ); has 'user' => ( is => 'ro', isa => 'MyApp::User', ); package MyApp::User; use Moose; has 'name' => ( is => 'rw', isa => 'Str' ); has 'email' => ( is => 'rw', isa => 'Str', required => 1 );
  59. package MyApp::Post; use Moose; has 'title' => ( is =>

    'rw', isa => 'Str' ); has 'content' => ( is => 'rw', isa => 'Str' ); has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } ); has 'user' => ( is => 'ro', isa => 'MyApp::User', ); package MyApp::User; use Moose; has 'name' => ( is => 'rw', isa => 'Str' ); has 'email' => ( is => 'rw', isa => 'Str', required => 1 );
  60. package MyApp::Post; use Moose; has 'title' => ( is =>

    'rw', isa => 'Str' ); has 'content' => ( is => 'rw', isa => 'Str' ); has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } ); has 'user' => ( is => 'ro', isa => 'MyApp::User', ); package MyApp::User; use Moose; has 'name' => ( is => 'rw', isa => 'Str' ); has 'email' => ( is => 'rw', isa => 'Str', required => 1 );
  61. package MyApp::Post; use Elastic::Doc; has 'title' => ( is =>

    'rw', isa => 'Str' ); has 'content' => ( is => 'rw', isa => 'Str' ); has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } ); has 'user' => ( is => 'ro', isa => 'MyApp::User', ); package MyApp::User; use Elastic::Doc; has 'name' => ( is => 'rw', isa => 'Str' ); has 'email' => ( is => 'rw', isa => 'Str', required => 1 );
  62. Some definitions...

    elasticsearch:
    * index: like a database
    * type: like a table
    * doc: like a row in a table
    * alias: like a symbolic link, points to one or more indices

    Elastic::Model:
    * domain: an index or an alias, used for CRUD
    * namespace: maps type <=> class for all associated domains
    * model: connects your app to elasticsearch
  63. We need a Model

  64. package MyApp; use Elastic::Model;

  65. package MyApp; use Elastic::Model; has_namespace 'myapp' => { };

  66. package MyApp; use Elastic::Model; has_namespace 'myapp' => { user =>

    'MyApp::User', post => 'MyApp::Post', };
  67. package MyApp; use Elastic::Model; has_namespace 'myapp' => { user =>

    'MyApp::User', post => 'MyApp::Post', }; # like table <=> class
  68. Using our Model

  69. use MyApp;

  70. use MyApp; my $model = MyApp->new;

  71. To do anything useful, we need:

    use MyApp;
    my $model = MyApp->new;
    my $namespace = $model->namespace('myapp'); # For index and alias management
    my $domain = $model->domain('myapp'); # For document CRUD
    my $view = $model->view; # For searching
  72. Namespace: Create an index my $namespace = $model->namespace('myapp'); $namespace->index->create;

    * create index 'myapp' * namespace:myapp => index:myapp
  73. Namespace: Delete an index my $namespace = $model->namespace('myapp'); $namespace->index->delete;

  74. Namespace: Create an alias my $namespace = $model->namespace('myapp'); $namespace->index('myapp_v1')->create; $namespace->alias->to('myapp_v1');

    * alias:myapp => index:myapp_v1 * namespace:myapp => alias:myapp => index:myapp_v1
  75. Domain: Create a user my $domain = $model->domain('myapp'); my $user

    = $domain->new_doc( user => { name => 'Clinton', email => 'clint@foo.com', } ); $user->save;
  76. Domain: Create a user my $domain = $model->domain('myapp'); my $user

    = $domain->create( user => { name => 'Clinton', email => 'clint@foo.com', } ); $user->save;
  77. Domain: Create a user my $domain = $model->domain('myapp'); my $user

    = $domain->create( user => { name => 'Clinton', email => 'clint@foo.com', id => 1, } ); say $user->id; # 1 say $user->type; # user
  78. Domain: Create a post my $domain = $model->domain('myapp'); my $post

    = $domain->create( post => { id => 2, title => 'To infinity and beyond', content => 'Elastic::Model persists Moose ' . 'objects in elasticsearch', user => $user } );
  79. Domain: Retrieve a doc my $domain = $model->domain('myapp'); my $post

    = $domain->get( post => 2 ); my $user = $post->user; # stub object say $user->name; # full object # Clinton say $user->id; # still stub # 1
  80. Domain: Update a doc my $domain = $model->domain('myapp'); $post->title('Awesome blog

    post'); say $post->has_changed; # 1 say $post->has_changed('title'); # 1 say $post->old_value('title'); # To infinity and beyond $post->save;
  81. optimistic version control

  82. $version++ on every change

  83. 1: $post = $domain->get(post=>2); 2: $post = $domain->get(post=>2); 1: $post->title('Awesome

    blog post'); 2: $post->title('Brilliant blog post'); 1: $post->save; 2: $post->save; *** CONFLICT ERROR ***
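
The conflict above can be reproduced without elasticsearch. The following is a minimal sketch in plain Perl, assuming a hypothetical in-memory `%store` and the illustrative helpers `get_doc` and `save_doc`; none of these names come from Elastic::Model. It only shows the rule "a save succeeds when the version you read is still the current one".

```perl
use strict;
use warnings;

# Hypothetical in-memory store: id => { version => N, doc => {...} }
my %store = ( 2 => { version => 1, doc => { title => 'To infinity and beyond' } } );

# Return a private copy of the doc plus the version it was read at.
sub get_doc {
    my ($id) = @_;
    my $rec = $store{$id} or die "no such doc: $id\n";
    return { id => $id, version => $rec->{version}, doc => { %{ $rec->{doc} } } };
}

# Save only if nobody else has saved since we read; otherwise die.
sub save_doc {
    my ($copy) = @_;
    my $rec = $store{ $copy->{id} };
    die "conflict: have version $copy->{version}, store has $rec->{version}\n"
        if $copy->{version} != $rec->{version};
    $rec->{doc}     = { %{ $copy->{doc} } };
    $rec->{version} = ++$copy->{version};    # $version++ on every change
    return $rec->{version};
}

# Two processes read the same version of the post...
my $first  = get_doc(2);
my $second = get_doc(2);
$first->{doc}{title}  = 'Awesome blog post';
$second->{doc}{title} = 'Brilliant blog post';

# ...the first save wins, the second is rejected.
save_doc($first);
my $ok = eval { save_doc($second); 1 };
print $ok ? "saved\n" : "rejected: $@";
```

Nothing is locked while the docs are being edited, which is why the check is called optimistic: the cost is paid only when two writers actually collide.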
  84. Dealing with conflicts

  85. Ignore them $post->overwrite;

  86. on_conflict handler

  87. $post->save( on_conflict => sub { my ($old,$new) = @_; #

    do something # to resolve conflict });
  88. $post->save( on_conflict => sub { my ($old,$new) = @_; my

    $changed = $old->old_values; $new->$_( $old->$_ ) for keys %$changed; $new->save; $post = $new; });
  89. Query docs: View $results = $model->view->search;

  90. Views are reusable $posts = $model->view( type => 'post' );

    $featured = $posts->filterb( featured => 1 );
  91. Single domain $view = $domain->view;

  92. Multi domain $view = $model->view;

  93. Multi domain $view = $model->view; $view = $model->view->domain('foo','bar');

  94. Multi type $view = $model->view; $view = $model->view->type('user','post');

  95. my $view = $domain ->view ->type( 'post') ->filterb( created =>

    { gte => '2013-07-01' }, user => $user, ) ->queryb( title => 'awesome' ) ->sort( 'timestamp' ) ->size( 20 ) ->highlight( 'content' ) ->explain( 1 ); See "Terms of Endearment" on speakerdeck.com
  96. First result $results = $view->first

  97. $size results $results = $view->search;

  98. Unbounded results $results = $view->scroll $results = $view->scan

  99. Results are iterators $result = $results->next $result = $results->prev $result

    = $results->first $result = $results->last $result = $results->shift
  100. Result is: metadata + object say $result->object->title

  101. my $results = $view->search; say "Total hits: " . $results->total;

    say "Took: " . $results->took . "ms"; while ( my $result = $results->next ) { say "Title:" . $result->object->title; say "Snippets:" . join "\n", $result->highlight('content'); say "Score:" . $result->score; say "Debug:" . $result->explain; }
  102. Just the object $object = $results->next_object

  103. Just objects $results->as_objects; $object = $results->next;

  104. Enough dull API!

  105. Not just a doc store

  106. *** POWERFUL *** search engine

  107. BUT...

  108. You can only get out what you put in

  109. Prepare your data

  110. Tell elasticsearch: * what fields you have * what data

    they contain * how to index them
  111. "Mapping" (like a database schema)

  112. Moose gives us introspection (takes the pain away)

  113. Examples: analyzed full text has 'title' => ( is =>

    'rw', isa => 'Str', ); title: { type: "string" }
  114. Examples: analyze and stem text has 'title' => ( is

    => 'rw', isa => 'Str', analyzer => 'english' ); title: { type: "string", analyzer: "english" }
  115. Examples: analyze and stem text has 'title' => ( is

    => 'rw', isa => 'Str', analyzer => 'norwegian' ); title: { type: "string", analyzer: "norwegian" }
  116. Examples: store the exact value has 'tag' => ( is

    => 'rw', isa => 'Str', index => 'not_analyzed' ); tag: { type: "string", index: "not_analyzed" }
  117. Examples: complex data use MooseX::Types::Moose qw(Str); use MooseX::Types::Structured qw(Dict Optional); has

    'name' => ( is => 'rw', isa => Dict[ first => Str, last => Str, middle => Optional[Str], ], ); name: { type: "object", properties: { first: { type: 'string' }, last: { type: 'string' }, middle: { type: 'string'} } }
  118. Examples: Elastic::Doc classes has 'user' => ( is => 'rw',

    isa => 'MyApp::User', ); user: { type: "object", properties: { name: { type: 'string' }, email: { type: 'string' }, uid: { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } } } }
  119. Examples: Elastic::Doc classes has 'user' => ( is => 'rw',

    isa => 'MyApp::User', ); user: { type: "object", properties: { name: { type: 'string' }, email: { type: 'string' }, uid: { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } } } } Denormalised data!
  120. Examples: Elastic::Doc classes has 'user' => ( is => 'rw',

    isa => 'MyApp::User', exclude_attrs => ['email'] ); user: { type: "object", properties: { name: { type: 'string' }, email: { type: 'string' }, uid: { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } } } }
  121. Examples: Elastic::Doc classes has 'user' => ( is => 'rw',

    isa => 'MyApp::User', include_attrs => ['email'] ); user: { type: "object", properties: { name: { type: 'string' }, email: { type: 'string' }, uid: { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } } } }
  122. Examples: Elastic::Doc classes has 'user' => ( is => 'rw',

    isa => 'MyApp::User', include_attrs => [] ); user: { type: "object", properties: { name: { type: 'string' }, email: { type: 'string' }, uid: { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } } } }
  123. Same data. Different purpose has 'title' => ( is =>

    'rw', isa => 'Str', ); title: { type: "string" } title => 'An AMAZING talk!' title: ['amazing','talk'] What do you sort on? 'amazing' or 'talk'
  124. Multi-fields index the same data in different ways

  125. Same data. Different purpose has 'title' => ( is =>

    'rw', isa => 'Str', );
  126. Same data. Different purpose has 'title' => ( is =>

    'rw', isa => 'Str', multi => { untouched => { index => 'not_analyzed' } } );
  127. Same data. Different purpose has 'title' => ( is =>

    'rw', isa => 'Str', multi => { untouched => { index => 'not_analyzed' } } ); title => 'An AMAZING talk!' title: { title: ['amazing','talk'], untouched: "An AMAZING talk!" }
  128. Let's TWEAK stuff!

  129. How about AUTO-COMPLETE?

  130. Don't use wildcards Slow & inefficient

  131. Prepare your data: "Analysis"

  132. With edge-ngrams

  133. Analysis process "Édith Piaf" -> standard tokenizer -> ["Édith", "Piaf"]

    -> lowercase token filter -> ["édith", "piaf"] -> ascii-folding token filter -> ["edith", "piaf"] -> edge-ngrams token filter -> ["e", "ed", "edi", "edit", "edith", "p", "pi", "pia", "piaf"] Perfect for partial matching!
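
The chain above can be imitated in a few lines of plain Perl. This is a rough sketch, not how elasticsearch implements analysis: the regex is a crude stand-in for the standard tokenizer, accent stripping via `Unicode::Normalize` approximates the ascii-folding filter, and the sub names `edge_ngrams` and `autocomplete_terms` are made up for illustration.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFD);

# Emit every prefix of $token from $min up to $max characters long.
sub edge_ngrams {
    my ( $token, $min, $max ) = @_;
    my $last = length($token) < $max ? length($token) : $max;
    return map { substr( $token, 0, $_ ) } $min .. $last;
}

sub autocomplete_terms {
    my ($text) = @_;

    # Tokenize on runs of letters and digits (stand-in for the standard tokenizer)
    my @tokens = $text =~ /([\p{L}\p{N}]+)/g;

    # Lowercase each token
    @tokens = map { lc($_) } @tokens;

    # Fold accents: decompose, then drop the combining marks
    @tokens = map { my $t = NFD($_); $t =~ s/\p{Mn}//g; $t } @tokens;

    # Edge n-grams, 1 to 15 characters, as in the filter settings used in this deck
    return map { edge_ngrams( $_, 1, 15 ) } @tokens;
}

binmode STDOUT, ':encoding(UTF-8)';
print join( ' ', autocomplete_terms('Édith Piaf') ), "\n";
# prints: e ed edi edit edith p pi pia piaf
```

All the work happens once, at index time, so a partial match at query time is an ordinary term lookup rather than a wildcard scan.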
  134. package MyApp; use Elastic::Model; has_namespace 'myapp' => { user =>

    'MyApp::User', post => 'MyApp::Post', }; Add a custom analyzer to our Model
  135. package MyApp; use Elastic::Model; has_namespace 'myapp' => { user =>

    'MyApp::User', post => 'MyApp::Post', }; Add a custom analyzer to our Model
  136. package MyApp; use Elastic::Model; has_namespace 'myapp' => { user =>

    'MyApp::User', post => 'MyApp::Post', }; has_filter 'my_edge_ngrams' => { type => 'edge_ngram', min_gram => 1, max_gram => 15 }; Add a custom analyzer to our Model
  137. package MyApp; use Elastic::Model; has_namespace 'myapp' => { user =>

    'MyApp::User', post => 'MyApp::Post', }; has_filter 'my_edge_ngrams' => { type => 'edge_ngram', min_gram => 1, max_gram => 15 }; has_analyzer 'autocomplete' => { tokenizer => 'standard', filter => ['lowercase','asciifolding', 'my_edge_ngrams'] }; Add a custom analyzer to our Model
  138. Add analyzer to our Doc class has 'title' => (

    is => 'rw', isa => 'Str', multi => { untouched => { index => 'not_analyzed' } } );
  139. Add analyzer to our Doc class has 'title' => (

    is => 'rw', isa => 'Str', multi => { untouched => { index => 'not_analyzed' }, autocomplete => { analyzer => 'autocomplete' } } );
  140. Add analyzer to our Doc class has 'title' => (

    is => 'rw', isa => 'Str', multi => { untouched => { index => 'not_analyzed' }, autocomplete => { analyzer => 'autocomplete' } } ); title => 'An AMAZING talk!' title: { title: ['amazing','talk'], untouched: "An AMAZING talk!" }
  141. Add analyzer to our Doc class has 'title' => (

    is => 'rw', isa => 'Str', multi => { untouched => { index => 'not_analyzed' }, autocomplete => { analyzer => 'autocomplete' } } ); title => 'An AMAZING talk!' title: { title: ['amazing','talk'], untouched: "An AMAZING talk!", autocomplete: [ 'a', 'am', 'ama', 'amaz', 'amazi', 'amazin', 'amazing', 't', 'ta', 'tal', 'talk' ] }
  142. Apply your changes

  143. Update the mapping AND the data

  144. Reindex

  145. $new = $namespace->index('myapp_v2'); $new->reindex('myapp'); $namespace->alias->to('myapp_v2'); $namespace->index('myapp_v1')->delete;

  146. Autocomplete query

  147. $view = $domain->view->queryb( );

  148. $view = $domain->view->queryb( "title.autocomplete" => "amazing ta", );

  149. $view = $domain->view->queryb( "title.autocomplete" => "amazing ta", ); Matches anything

    starting with 'a' or 't' BOOH!
  150. $view = $domain->view->queryb( "title.autocomplete" => { -match => { query

    => "amazing ta", } } );
  151. $view = $domain->view->queryb( "title.autocomplete" => { -match => { query

    => "amazing ta", operator => "or" } } ); "a OR am OR ama OR amaz OR ... OR t OR ta"
  152. $view = $domain->view->queryb( "title.autocomplete" => { -match => { query

    => "amazing ta", operator => "and" } } );
  153. $view = $domain->view->queryb( "title.autocomplete" => { -match => { query

    => "amazing ta", operator => "and" } } ); Complete words should be more relevant
  154. $view = $domain->view->queryb( "title.autocomplete" => { -match => { query

    => "amazing ta", operator => "and" } }, "title" => "amazing ta", );
  155. $view = $domain->view->queryb([ "title.autocomplete" => { -match => { query

    => "amazing ta", operator => "and" } }, "title" => "amazing ta", ]);
  156. Done!

  157. Scaling

  158. To infinity and beyond!

  159. Basic unit of scale: the shard

  160. An index has 1-or-more primary shards

  161. Each primary has 0-or-more replica shards

  162. Primaries scale total data

  163. Replicas are for failover and to scale queries

  164. Default: 5 primary shards with 1 replica each

  165. 5 * (1 + 1) = 10 shards

  166. 10 shards = 1 .. 10 servers

  167. Can change number of replicas

  168. CANNOT change number of primaries

  169. So how do we scale?

  170. Kagillion shards!

  171. Umm, No.

  172. Be a grower not a shower

  173. At query time:

  174. 1 index x 10 shards == 10 indices x 1

    shard
  175. Two patterns:

  176. Time based indices Index-per-user

  177. Time based indices Index-per-user

  178. * one index per month * write to alias: logs_current

    * query alias: logs
  179. $ns = $model->namespace('logs'); $ns->index('logs_2013_08')->create; $ns->alias('logs_current')->to('logs_2013_08'); $ns->alias->to('logs_2013_08'); $model->domain('logs_current')->create( log => \%data

    ); $model->domain('logs')->view->search;
  180. New month, new index $ns->index('logs_2013_09')->create; $ns->alias('logs_current')->to('logs_2013_09'); $ns->alias->add('logs_2013_09');

  181. Add alias for 2013 $ns->alias('logs_2013')->to( 'logs_2013_08', 'logs_2013_09', ... );
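
The alias bookkeeping for time based indices can be modelled with a plain hash. This sketch does not talk to elasticsearch or use Elastic::Model; `%alias`, `alias_to`, `alias_add`, `index_name` and `start_month` are hypothetical names that only mirror the steps on the slides: one alias always points at the newest index for writes, another accumulates every index for queries.

```perl
use strict;
use warnings;

my %alias;    # alias name => list of index names it points to

# Point an alias at exactly these indices, replacing what was there.
sub alias_to {
    my ( $name, @indices ) = @_;
    $alias{$name} = [@indices];
    return;
}

# Add more indices to an existing alias.
sub alias_add {
    my ( $name, @indices ) = @_;
    push @{ $alias{$name} }, @indices;
    return;
}

sub index_name {
    my ( $year, $month ) = @_;
    return sprintf 'logs_%04d_%02d', $year, $month;
}

# New month, new index: writes move to it, queries also include it.
sub start_month {
    my ( $year, $month ) = @_;
    my $index = index_name( $year, $month );
    alias_to( logs_current => $index );
    alias_add( logs => $index );
    return $index;
}

start_month( 2013, 8 );
start_month( 2013, 9 );

print "write to: @{ $alias{logs_current} }\n";    # write to: logs_2013_09
print "query:    @{ $alias{logs} }\n";            # query:    logs_2013_08 logs_2013_09
```

Because the application only ever names the aliases, adding a month never requires a code change or a reindex of older data.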

  182. Time based indices Index-per-user

  183. Users have their own data

  184. Most searches are per-user

  185. Ideal: Index-per-user

  186. Expensive

  187. Most users have little data

  188. Some have LOTS!

  189. Start with one index for all users

  190. Use aliases to pretend

  191. ...aliases with... filters and routing

  192. $ns->alias( 'bloggs_plumbers' )->to( myapp_v1 => { filterb => { client_id

    => 'bloggs_plumbers' }, routing => 'bloggs_plumbers' } );
  193. Routing determines: which shard stores your data

  194. Routing == bloggs_plumbers All user's data on same shard

  195. CRUD -> hit one shard Queries -> hit one shard

  196. SUPER efficient!

  197. New client joins...

  198. ...called "Twitter"

  199. 6 months later...

  200. $new = $ns->index('twitter_v1'); $new->reindex('twitter'); $ns->alias('twitter')->to('twitter_v1'); $ns->alias->add('twitter_v1');

  201. What more do you need?

  202. Go forth and HERD!