To infinity and beyond

Elastic::Model is a new framework to store your Moose objects, which uses ElasticSearch as a NoSQL document store and flexible search engine.

It is designed to make small beginnings simple, but to scale easily to Big Data requirements without needing to rearchitect your application. No job too big or small!

This talk will introduce Elastic::Model, demonstrate how to develop a simple application, introduce some more advanced techniques, and discuss how it uses ElasticSearch to scale.

https://github.com/clintongormley/Elastic-Model

Clinton Gormley

August 21, 2012

Transcript

  1. To infinity and beyond! A practical guide for Mooseherds (and other carers of livestock) @clintongormley #elasticsearch YAPC::EU 2012
  2. I have an idea for a killer app!

  3. Quick! Let's...

  4. Design our objects

  5. Flatten them into tables

  6. Normalize data

  7. Add indexes

  8. Add tables for many-to-one

  9. More indexes

  10. Need full text search?

  11. Copy data to search engine

  12. Keep the two in sync

  13. Get search results, pull objects from DB

  14. Success! Success!

  15. Need to scale

  16. Buy a bigger box

  17. Tune indexes

  18. Add caching

  19. Fix caching bugs

  20. Master - Slave replication

  21. Buy SSDs

  22. Denormalize data

  23. Buy bigger boxes

  24. Shard your data (i.e. rewrite your application)

  25. Do you really need a relational DB?

  26. Do you really need a relational DB? faster horse?

  27. NoSQL advantages

  28. Document oriented

  29. ...just store your object

  30. Fast reads and writes

  31. Scale horizontally

  32. Recover from failure

  33. But...

  34. Different from RDBMS

  35. No transactions

  36. No joins

  37. Denormalized data

  38. Still need to add: indexes

  39. Still need to add: full text search

  40. elasticsearch

  41. Real time document store

  42. Powerful full text search (Near real time: < 1 second)

  43. Filters, geolocation...

  44. Distributed by design

  45. Fault tolerant

  46. Easy sharding

  47. Start small, scale massively

  48. Why keep two datastores in sync?

  49. Just use elasticsearch

  50. with Elastic::Model

  51. Store and query Moose objects

  52. Exposes full power of elasticsearch

  53. and takes care of the housekeeping

  54. How?

  55. package MyApp::Post;
      use Moose;
      has 'title'   => ( is => 'rw', isa => 'Str' );
      has 'content' => ( is => 'rw', isa => 'Str' );
      has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } );

  56. package MyApp::Post;
      use Moose;
      has 'title'   => ( is => 'rw', isa => 'Str' );
      has 'content' => ( is => 'rw', isa => 'Str' );
      has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } );

      package MyApp::User;
      use Moose;
      has 'name'  => ( is => 'rw', isa => 'Str' );
      has 'email' => ( is => 'rw', isa => 'Str', required => 1 );

  57. package MyApp::Post;
      use Moose;
      has 'title'   => ( is => 'rw', isa => 'Str' );
      has 'content' => ( is => 'rw', isa => 'Str' );
      has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } );
      has 'user'    => ( is => 'ro', isa => 'MyApp::User', );

      package MyApp::User;
      use Moose;
      has 'name'  => ( is => 'rw', isa => 'Str' );
      has 'email' => ( is => 'rw', isa => 'Str', required => 1 );
  60. package MyApp::Post;
      use Elastic::Doc;
      has 'title'   => ( is => 'rw', isa => 'Str' );
      has 'content' => ( is => 'rw', isa => 'Str' );
      has 'created' => ( is => 'rw', isa => 'DateTime', default => sub { DateTime->now } );
      has 'user'    => ( is => 'ro', isa => 'MyApp::User', );

      package MyApp::User;
      use Elastic::Doc;
      has 'name'  => ( is => 'rw', isa => 'Str' );
      has 'email' => ( is => 'rw', isa => 'Str', required => 1 );
  61. Some definitions...

      elasticsearch
      * index: Like a database
      * type: Like a table
      * doc: Like a row in a table
      * alias: Like a symbolic link, points to one or more indices

      Elastic::Model
      * domain: An index or an alias, used for CRUD
      * namespace: Maps type <=> class for all associated domains
      * model: Connects your app to elasticsearch.
  62. We need a Model

  63. package MyApp;
      use Elastic::Model;

  64. package MyApp;
      use Elastic::Model;
      has_namespace 'myapp' => { };

  65. package MyApp;
      use Elastic::Model;
      has_namespace 'myapp' => {
          user => 'MyApp::User',
          post => 'MyApp::Post',
      };

  66. package MyApp;
      use Elastic::Model;
      has_namespace 'myapp' => {
          user => 'MyApp::User',
          post => 'MyApp::Post',
      };
      # like table <=> class
  67. Using our Model

  68. use MyApp;

  69. use MyApp;
      my $model = MyApp->new;

  70. To do anything useful, we need:
      use MyApp;
      my $model     = MyApp->new;
      my $namespace = $model->namespace('myapp');   # For index and alias management
      my $domain    = $model->domain('myapp');      # For document CRUD
      my $view      = $model->view;                 # For searching
  71. Namespace: Create an index
      my $namespace = $model->namespace('myapp');
      $namespace->index->create;
      * create index 'myapp'
      * namespace:myapp => index:myapp

  72. Namespace: Delete an index
      my $namespace = $model->namespace('myapp');
      $namespace->index->delete;

  73. Namespace: Create an alias
      my $namespace = $model->namespace('myapp');
      $namespace->index('myapp_v1')->create;
      $namespace->alias->to('myapp_v1');
      * alias:myapp => index:myapp_v1
      * namespace:myapp => alias:myapp => index:myapp_v1
  74. Domain: Create a user
      my $domain = $model->domain('myapp');
      my $user   = $domain->new_doc(
          user => { name => 'Clinton', email => 'clint@foo.com', }
      );
      $user->save;

  75. Domain: Create a user
      my $domain = $model->domain('myapp');
      my $user   = $domain->create(
          user => { name => 'Clinton', email => 'clint@foo.com', }
      );
      $user->save;

  76. Domain: Create a user
      my $domain = $model->domain('myapp');
      my $user   = $domain->create(
          user => { name => 'Clinton', email => 'clint@foo.com', id => 1, }
      );
      say $user->id;     # 1
      say $user->type;   # user

  77. Domain: Create a post
      my $domain = $model->domain('myapp');
      my $post   = $domain->create(
          post => {
              id      => 2,
              title   => 'To infinity and beyond',
              content => 'Elastic::Model persists Moose '
                       . 'objects in elasticsearch',
              user    => $user
          }
      );
  78. Domain: Retrieve a doc
      my $domain = $model->domain('myapp');
      my $post   = $domain->get( post => 2 );
      my $user   = $post->user;   # stub object
      say $user->name;            # full object   # Clinton
      say $user->id;              # still stub    # 1

  79. Domain: Update a doc
      my $domain = $model->domain('myapp');
      $post->title('Awesome blog post');
      say $post->has_changed;            # 1
      say $post->has_changed('title');   # 1
      say $post->old_value('title');     # To infinity and beyond
      $post->save;
  80. optimistic version control

  81. $version++ on every change

  82. 1: $post = $domain->get(post=>2);
      2: $post = $domain->get(post=>2);
      1: $post->title('Awesome blog post');
      2: $post->title('Brilliant blog post');
      1: $post->save;
      2: $post->save;   *** CONFLICT ERROR ***
  83. Dealing with conflicts

  84. Ignore them $post->overwrite;

  85. on_conflict handler

  86. $post->save(
          on_conflict => sub {
              my ($old,$new) = @_;
              # do something
              # to resolve conflict
          }
      );

  87. $post->save(
          on_conflict => sub {
              my ($old,$new) = @_;
              my %changed = $old->old_values;
              $new->$_( $changed{$_} ) for keys %changed;
              $new->save;
              $post = $new;
          }
      );
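
The version check behind slides 80 to 87 can be illustrated with a toy, in-memory Python sketch. It is not Elastic::Model or elasticsearch itself: the `VersionedStore` class, its `save(..., expected_version=...)` signature and `ConflictError` are all hypothetical names chosen for illustration. It only shows the idea of bumping a version on every write and rejecting a save that was based on a stale version.

```python
class ConflictError(Exception):
    """Raised when a save is based on a stale version (hypothetical name)."""


class VersionedStore:
    """Toy in-memory document store that bumps a version on every write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        version, body = self._docs[doc_id]
        return version, dict(body)  # copy, so each caller edits its own view

    def save(self, doc_id, body, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise ConflictError(
                f"have v{expected_version}, store has v{current_version}"
            )
        self._docs[doc_id] = (current_version + 1, dict(body))
        return current_version + 1


store = VersionedStore()
store.save(2, {"title": "To infinity and beyond"})

# Two processes read the same version of post 2 ...
v1, post1 = store.get(2)
v2, post2 = store.get(2)
post1["title"] = "Awesome blog post"
post2["title"] = "Brilliant blog post"

store.save(2, post1, expected_version=v1)      # first writer wins
try:
    store.save(2, post2, expected_version=v2)  # second writer is stale
    conflict = False
except ConflictError:
    conflict = True
    # One possible resolution: re-read, re-apply the change, save again.
    latest_version, latest = store.get(2)
    latest["title"] = post2["title"]
    store.save(2, latest, expected_version=latest_version)

final_version, final_doc = store.get(2)
```

The `except` branch plays the role of the `on_conflict` handler: it re-applies the losing writer's change on top of the latest copy. Whether that is the right resolution depends on the application.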
  88. Query docs: View
      $results = $model->view->search;

  89. Views are reusable
      $posts    = $model->view( type => 'post' );
      $featured = $posts->filterb( featured => 1 );

  90. Single domain
      $view = $domain->view;

  91. Multi domain
      $view = $model->view;

  92. Multi domain
      $view = $model->view;
      $view = $model->view->domain('foo','bar');

  93. Multi type
      $view = $model->view;
      $view = $model->view->type('user','post');

  94. my $view = $domain
          ->view
          ->type( 'post' )
          ->filterb(
              created => { gte => '2012-08-01' },
              user    => $user,
          )
          ->queryb( title => 'awesome' )
          ->sort( 'timestamp' )
          ->size( 20 )
          ->highlight( 'content' )
          ->explain( 1 );
      See "Terms of Endearment" on speakerdeck.com
  95. First result
      $results = $view->first

  96. $size results
      $results = $view->search;

  97. Unbounded results
      $results = $view->scroll
      $results = $view->scan

  98. Results are iterators
      $result = $results->next
      $result = $results->prev
      $result = $results->first
      $result = $results->last
      $result = $results->shift

  99. Result is: metadata + object
      say $result->object->title

  100. my $results = $view->search;
       say "Total hits: " . $results->total;
       say "Took: " . $results->took . "ms";
       while ( my $result = $results->next ) {
           say "Title:" . $result->object->title;
           say "Snippets:" . join "\n", $result->highlight('content');
           say "Score:" . $result->score;
           say "Debug:" . $result->explain;
       }

  101. Just the object
       $object = $results->next_object

  102. Just objects
       $results->as_objects;
       $object = $results->next;

  103. Enough dull API!

  104. Not just a doc store

  105. *** POWERFUL *** search engine

  106. BUT...

  107. You can only get out what you put in

  108. Prepare your data

  109. Tell elasticsearch:
       * what fields you have
       * what data they contain
       * how to index them
  110. "Mapping" (like a database schema)

  111. Moose gives us introspection (takes the pain away)

  112. Examples: analyzed full text
       has 'name' => ( is => 'rw', isa => 'Str', );
       name: { type: "string" }

  113. Examples: analyze and stem text
       has 'name' => ( is => 'rw', isa => 'Str', analyzer => 'english' );
       name: { type: "string", analyzer: "english" }

  114. Examples: analyze and stem text
       has 'name' => ( is => 'rw', isa => 'Str', analyzer => 'norwegian' );
       name: { type: "string", analyzer: "norwegian" }

  115. Examples: store the exact value
       has 'tag' => ( is => 'rw', isa => 'Str', index => 'not_analyzed' );
       tag: { type: "string", index: "not_analyzed" }

  116. Examples: complex data
       use MooseX::Types::Moose qw(Str);
       use MooseX::Types::Structured qw(Dict Optional);
       has 'name' => (
           is  => 'rw',
           isa => Dict[
               first  => Str,
               last   => Str,
               middle => Optional[Str],
           ],
       );
       name: {
           type: "object",
           properties: {
               first:  { type: 'string' },
               last:   { type: 'string' },
               middle: { type: 'string' }
           }
       }

  117. Examples: Elastic::Doc classes
       has 'user' => ( is => 'rw', isa => 'MyApp::User', );
       user: {
           type: "object",
           properties: {
               name:  { type: 'string' },
               email: { type: 'string' },
               uid:   { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
       }

  118. Examples: Elastic::Doc classes
       has 'user' => ( is => 'rw', isa => 'MyApp::User', );
       user: {
           type: "object",
           properties: {
               name:  { type: 'string' },
               email: { type: 'string' },
               uid:   { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
       }
       Denormalised data!

  119. Examples: Elastic::Doc classes
       has 'user' => ( is => 'rw', isa => 'MyApp::User', exclude_attrs => ['email'] );
       user: {
           type: "object",
           properties: {
               name:  { type: 'string' },
               email: { type: 'string' },
               uid:   { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
       }

  120. Examples: Elastic::Doc classes
       has 'user' => ( is => 'rw', isa => 'MyApp::User', include_attrs => ['email'] );
       user: {
           type: "object",
           properties: {
               name:  { type: 'string' },
               email: { type: 'string' },
               uid:   { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
       }

  121. Examples: Elastic::Doc classes
       has 'user' => ( is => 'rw', isa => 'MyApp::User', include_attrs => [] );
       user: {
           type: "object",
           properties: {
               name:  { type: 'string' },
               email: { type: 'string' },
               uid:   { type: "object", properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
       }
  122. Same data. Different purpose
       has 'title' => ( is => 'rw', isa => 'Str', );
       title: { type: "string" }
       title => 'An AMAZING talk!'
       title: ['amazing','talk']
       What do you sort on? 'amazing' or 'talk'

  123. Multi-fields index the same data in different ways

  124. Same data. Different purpose
       has 'title' => ( is => 'rw', isa => 'Str', );

  125. Same data. Different purpose
       has 'title' => (
           is    => 'rw',
           isa   => 'Str',
           multi => { untouched => { index => 'not_analyzed' } }
       );

  126. Same data. Different purpose
       has 'title' => (
           is    => 'rw',
           isa   => 'Str',
           multi => { untouched => { index => 'not_analyzed' } }
       );
       title => 'An AMAZING talk!'
       title: { title: ['amazing','talk'], untouched: "An AMAZING talk!" }
  127. Let's TWEAK stuff!

  128. How about AUTO-COMPLETE?

  129. Don't use wildcards Slow & inefficient

  130. Prepare your data: "Analysis"

  131. With edge-ngrams

  132. Analysis process
       "Édith Piaf"
       -> standard tokenizer        -> ["Édith", "Piaf"]
       -> lowercase token filter    -> ["édith", "piaf"]
       -> ascii-folding token filter -> ["edith", "piaf"]
       -> edge-ngrams token filter  -> ["e", "ed", "edi", "edit", "edith", "p", "pi", "pia", "piaf"]
       Perfect for partial matching!
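
The analysis chain on slide 132 can be approximated in a few lines of plain Python. This is only a rough sketch: the regular-expression tokenizer and the Unicode-decomposition folding below are simplified stand-ins for elasticsearch's standard tokenizer and ascii-folding filter, and the function names are made up for illustration.

```python
import re
import unicodedata


def tokenize(text):
    """Very rough stand-in for the standard tokenizer: split on non-word chars."""
    return re.findall(r"\w+", text)


def ascii_fold(token):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=1, max_gram=15):
    """All prefixes of the token between min_gram and max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text):
    """tokenize -> lowercase -> ascii-fold -> edge n-grams."""
    grams = []
    for token in tokenize(text):
        grams.extend(edge_ngrams(ascii_fold(token.lower())))
    return grams


grams = analyze("\u00c9dith Piaf")  # "Édith Piaf"
print(grams)
# ['e', 'ed', 'edi', 'edit', 'edith', 'p', 'pi', 'pia', 'piaf']
```

Because every prefix is stored at index time, a partial word such as "edi" becomes an exact term lookup rather than a wildcard scan.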
  133. Add a custom analyzer to our Model
       package MyApp;
       use Elastic::Model;
       has_namespace 'myapp' => {
           user => 'MyApp::User',
           post => 'MyApp::Post',
       };
  135. Add a custom analyzer to our Model
       package MyApp;
       use Elastic::Model;
       has_namespace 'myapp' => {
           user => 'MyApp::User',
           post => 'MyApp::Post',
       };
       has_filter 'my_edge_ngrams' => {
           type     => 'edge_ngram',
           min_gram => 1,
           max_gram => 15
       };

  136. Add a custom analyzer to our Model
       package MyApp;
       use Elastic::Model;
       has_namespace 'myapp' => {
           user => 'MyApp::User',
           post => 'MyApp::Post',
       };
       has_filter 'my_edge_ngrams' => {
           type     => 'edge_ngram',
           min_gram => 1,
           max_gram => 15
       };
       has_analyzer 'autocomplete' => {
           tokenizer => 'standard',
           filter    => ['lowercase','asciifolding', 'my_edge_ngrams']
       };
  137. Add analyzer to our Doc class
       has 'title' => (
           is    => 'rw',
           isa   => 'Str',
           multi => { untouched => { index => 'not_analyzed' } }
       );

  138. Add analyzer to our Doc class
       has 'title' => (
           is    => 'rw',
           isa   => 'Str',
           multi => {
               untouched    => { index => 'not_analyzed' },
               autocomplete => { analyzer => 'autocomplete' }
           }
       );

  139. Add analyzer to our Doc class
       has 'title' => (
           is    => 'rw',
           isa   => 'Str',
           multi => {
               untouched    => { index => 'not_analyzed' },
               autocomplete => { analyzer => 'autocomplete' }
           }
       );
       title => 'An AMAZING talk!'
       title: { title: ['amazing','talk'], untouched: "An AMAZING talk!" }

  140. Add analyzer to our Doc class
       has 'title' => (
           is    => 'rw',
           isa   => 'Str',
           multi => {
               untouched    => { index => 'not_analyzed' },
               autocomplete => { analyzer => 'autocomplete' }
           }
       );
       title => 'An AMAZING talk!'
       title: {
           title: ['amazing','talk'],
           untouched: "An AMAZING talk!",
           autocomplete: [ 'a', 'am', 'ama', 'amaz', 'amazi', 'amazin', 'amazing', 't', 'ta', 'tal', 'talk' ]
       }
  141. Apply your changes

  142. Update the mapping AND the data

  143. Reindex

  144. $new = $namespace->index('myapp_v2');
       $new->reindex('myapp');
       $namespace->alias->to('myapp_v2');
       $namespace->index('myapp_v1')->delete;

  145. Autocomplete query

  146. $view = $domain->view->queryb( );

  147. $view = $domain->view->queryb(
           "title.autocomplete" => "amazing ta",
       );

  148. $view = $domain->view->queryb(
           "title.autocomplete" => "amazing ta",
       );
       Matches anything starting with 'a' or 't'
       BOOH!

  149. $view = $domain->view->queryb(
           "title.autocomplete" => {
               -text => { query => "amazing ta", }
           }
       );

  150. $view = $domain->view->queryb(
           "title.autocomplete" => {
               -text => { query => "amazing ta", operator => "or" }
           }
       );
       "a OR am OR ama OR amaz OR ... OR t OR ta"

  151. $view = $domain->view->queryb(
           "title.autocomplete" => {
               -text => { query => "amazing ta", operator => "and" }
           }
       );

  152. $view = $domain->view->queryb(
           "title.autocomplete" => {
               -text => { query => "amazing ta", operator => "and" }
           }
       );
       Complete words should be more relevant

  153. $view = $domain->view->queryb(
           "title.autocomplete" => {
               -text => { query => "amazing ta", operator => "and" }
           },
           "title" => "amazing ta",
       );

  154. $view = $domain->view->queryb([
           "title.autocomplete" => {
               -text => { query => "amazing ta", operator => "and" }
           },
           "title" => "amazing ta",
       ]);
  155. Done!
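
Why the "or" operator matches far too much on an edge-ngram field, and "and" does not, can be seen with a small set-based Python sketch. It assumes, as slide 150 suggests, that the query text goes through the same edge-ngram analysis as the indexed titles. The sample titles, the simplified `analyze` function and the `search` helper are invented for illustration, and no relevance scoring is modelled.

```python
def edge_ngrams(word, max_gram=15):
    """Set of prefixes of a word, up to max_gram characters."""
    return {word[:n] for n in range(1, min(len(word), max_gram) + 1)}


def analyze(text):
    """Lowercase, split on whitespace, strip simple punctuation, edge-ngram."""
    grams = set()
    for word in text.lower().split():
        grams |= edge_ngrams(word.strip("!?.,"))
    return grams


titles = ["An AMAZING talk!", "A terrible apple", "Amazing grace"]
index = {title: analyze(title) for title in titles}


def search(query, operator):
    """'or': any query gram matches; 'and': every query gram must match."""
    wanted = analyze(query)
    if operator == "and":
        return [t for t, grams in index.items() if wanted <= grams]
    return [t for t, grams in index.items() if wanted & grams]


or_hits = search("amazing ta", "or")
and_hits = search("amazing ta", "and")
print(or_hits)   # every title: they all contain the gram 'a'
print(and_hits)  # only the title with words starting 'amazing' and 'ta'
```

Slide 154 then adds a second clause on the plain `title` field so that documents containing the complete words score higher than prefix-only matches.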

  156. Scaling

  157. To infinity and beyond!

  158. Basic unit of scale: the shard

  159. An index has 1-or-more primary shards

  160. Each primary has 0-or-more replica shards

  161. Primaries scale total data

  162. Replicas are for failover and to scale queries

  163. Default: 5 primary shards with 1 replica each

  164. 5 * (1 + 1) = 10 shards

  165. 10 shards = 1 .. 10 servers
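
The shard arithmetic on slides 163 to 165 can be written out as a tiny Python helper. The even-spread calculation is a simplifying assumption: it ignores real allocation rules, such as keeping a replica off the server that holds its primary. The function names are illustrative only.

```python
import math


def total_shards(primaries, replicas_per_primary):
    """Each primary shard plus all of its replicas."""
    return primaries * (1 + replicas_per_primary)


def shards_per_server(primaries, replicas_per_primary, servers):
    """Most shards any one server holds if shards are spread evenly (assumption)."""
    return math.ceil(total_shards(primaries, replicas_per_primary) / servers)


default_total = total_shards(primaries=5, replicas_per_primary=1)
print("default total:", default_total)

for servers in (2, 5, 10):
    print(servers, "servers ->", shards_per_server(5, 1, servers), "shard(s) per server at most")
```

Beyond ten servers the default layout has nothing left to hand out, which is why adding replicas (or more indices) is the next lever.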

  166. Can change number of replicas

  167. CANNOT change number of primaries

  168. So how do we scale?

  169. Kagillion shards!

  170. Umm, No.

  171. Be a grower not a shower

  172. At query time:

  173. 1 index x 10 shards == 10 indices x 1 shard
  174. Two patterns:

  175. Time based indices Index-per-user

  176. Time based indices Index-per-user

  177. * one index per month
       * write to alias: logs_current
       * query alias: logs

  178. $ns = $model->namespace('logs');
       $ns->index('logs_2012_08')->create;
       $ns->alias('logs_current')->to('logs_2012_08');
       $ns->alias->to('logs_2012_08');
       $model->domain('logs_current')->create( log => \%data );
       $model->domain('logs')->view->search;

  179. New month, new index
       $ns->index('logs_2012_09')->create;
       $ns->alias('logs_current')->to('logs_2012_09');
       $ns->alias->add('logs_2012_09');

  180. Add alias for 2012
       $ns->alias('logs_2012')->to( 'logs_2012_08', 'logs_2012_09', ... );
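
The monthly rollover on slides 177 to 180 can be mimicked with a toy in-memory Python model of indices and aliases. The `Cluster` class and its method names are hypothetical and are not the Elastic::Model API. The sketch only illustrates the pattern: writes go through an alias that points at exactly one index, reads go through an alias that spans all of them.

```python
class Cluster:
    """Toy stand-in for indices and aliases; not the Elastic::Model API."""

    def __init__(self):
        self.indices = {}  # index name -> list of stored docs
        self.aliases = {}  # alias name -> list of index names

    def create_index(self, name):
        self.indices[name] = []

    def alias_to(self, alias, *index_names):
        """Repoint an alias so it covers exactly these indices."""
        self.aliases[alias] = list(index_names)

    def alias_add(self, alias, index_name):
        """Add one more index to an existing alias."""
        self.aliases.setdefault(alias, []).append(index_name)

    def write(self, alias, doc):
        targets = self.aliases[alias]
        if len(targets) != 1:
            raise ValueError("writes need an alias that points at exactly one index")
        self.indices[targets[0]].append(doc)

    def search(self, alias):
        return [doc for name in self.aliases[alias] for doc in self.indices[name]]


cluster = Cluster()

# August: one index, a write alias and a read alias
cluster.create_index("logs_2012_08")
cluster.alias_to("logs_current", "logs_2012_08")
cluster.alias_to("logs", "logs_2012_08")
cluster.write("logs_current", {"msg": "august event"})

# September: new index, move the write alias, widen the read alias
cluster.create_index("logs_2012_09")
cluster.alias_to("logs_current", "logs_2012_09")
cluster.alias_add("logs", "logs_2012_09")
cluster.write("logs_current", {"msg": "september event"})

all_logs = cluster.search("logs")
print(all_logs)
```

The application code never changes: it keeps writing to `logs_current` and searching `logs`, while the aliases are repointed behind it.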

  181. Time based indices Index-per-user

  182. Users have their own data

  183. Most searches are per-user

  184. Ideal: Index-per-user

  185. Expensive

  186. Most users have little data

  187. Some have LOTS!

  188. Start with one index for all users

  189. Use aliases to pretend

  190. ...aliases with... filters and routing

  191. $ns->alias( 'bloggs_plumbers' )->to(
           myapp_v1 => {
               filterb => { client_id => 'bloggs_plumbers' },
               routing => 'bloggs_plumbers'
           }
       );

  192. Routing determines: which shard stores your data

  193. Routing == bloggs_plumbers
       All of a user's data on the same shard

  194. CRUD -> hit one shard
       Queries -> hit one shard

  195. SUPER efficient!
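
How a routing value pins a client's documents to one shard can be sketched in Python as a hash taken modulo the number of primary shards. `zlib.crc32` is only a stand-in for whatever hash elasticsearch uses internally, and the document ids are invented for illustration.

```python
import zlib

NUM_PRIMARIES = 5


def shard_for(routing_value, num_primaries=NUM_PRIMARIES):
    """Pick a shard from a stable hash of the routing value (crc32 is a stand-in)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primaries


client = "bloggs_plumbers"
doc_ids = [f"invoice-{n}" for n in range(100)]

# Custom routing: every document of this client hashes on the client name
routed_shards = {shard_for(client) for _ in doc_ids}

# Default routing hashes on the document id, so docs typically spread out
default_shards = {shard_for(doc_id) for doc_id in doc_ids}

print("with routing:", sorted(routed_shards))
print("default:     ", sorted(default_shards))
```

The modulo also hints at why the number of primaries is fixed: changing `num_primaries` would send the same routing value to a different shard, so existing documents could no longer be found without reindexing.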

  196. New client joins...

  197. ...called "Twitter"

  198. 6 months later...

  199. $new = $ns->index('twitter_v1');
       $new->reindex('twitter');
       $ns->alias('twitter')->to('twitter_v1');
       $ns->alias->add('twitter_v1');

  200. What more do you need?

  201. Go forth and HERD!