Workshop: Learning ElasticSearch

Anurag
July 11, 2013

Slides from ElasticSearch workshop conducted at The Fifth Elephant 2013, Bangalore.

Transcript

  1. Features: real-time analytics, distributed, high availability, multi-tenant architecture, full text, document oriented, schema free, RESTful API, per-operation persistence.
  2. Distributed: Start small and scale horizontally out of the box. For more capacity, just add more nodes and let the cluster reorganize itself.
  3. Multi Tenancy: A cluster can host multiple indices, which can be queried independently or as a group.

    $ curl -X PUT http://localhost:9200/people
    $ curl -X PUT http://localhost:9200/gems
    $ curl -X PUT http://localhost:9200/gems/document/pry-0.5.9
    $ curl -X GET http://localhost:9200/gems/document/pry-0.5.9
  4. Document Oriented: Store complex real-world entities in Elasticsearch as structured JSON documents.

    {
      "_id": "pry-0.5.9",
      "_index": "gems",
      "_source": {
        "authors": ["John Mair (banisterfiend)"],
        "autorequire": null,
        "bindir": "bin",
        "cert_chain": [],
        "date": "Sun Feb 20 11:00:00 UTC 2011",
        "default_executable": null,
        "description": "attach an irb-like session to any object at runtime",
        "email": "jrmair@gmail.com"
      }
    }
  5. RESTful API: Almost any operation can be performed through a simple RESTful interface using JSON over HTTP.

    curl -X GET
    curl -X PUT
    curl -X POST
    curl -X DELETE
  6. Apache Lucene: ElasticSearch is built on top of Apache Lucene. Lucene is a high-performance, full-featured information retrieval library written in Java.
  7. Document: In ElasticSearch, everything is stored as a Document. Documents can be addressed and retrieved by querying their attributes.

    $ curl -XGET http://localhost:9200/gems/document/pry-0.5.9

    {
      "_id": "pry-0.5.9",
      "_index": "gems",
      "_source": {
        "authors": ["John Mair (banisterfiend)"],
        "autorequire": null,
        "bindir": "bin",
        "cert_chain": [],
        "date": "Sun Feb 20 11:00:00 UTC 2011",
        "default_executable": null,
        "description": "attach an irb-like session to any object at runtime",
        "email": "jrmair@gmail.com",
        "executables": ["pry"],
        "extensions": [],
        "extra_rdoc_files": [],
        "files": [
          "lib/pry/commands.rb",
          "lib/pry/command_base.rb",
          "lib/pry/completion.rb",
          "lib/pry/core_extensions.rb",
          "lib/pry/hooks.rb",
          "lib/pry/print.rb",
          "lib/pry/prompts.rb",
          "lib/pry/pry_class.rb",
          "lib/pry/pry_instance.rb",
          "lib/pry/version.rb",
          "lib/pry.rb",
          "examples/example_basic.rb",
  8. Shard: Each Shard is a separate native Lucene Index. Sharding lets us overcome RAM and hard disk capacity limitations.
  9. Index: ElasticSearch stores its data in logical Indices. Think of a table, a collection or a database. An Index has at least 1 primary Shard, and 0 or more Replicas.
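
To make the Index and Shard relationship concrete, here is a minimal sketch of how a document ID could be assigned to one of an Index's primary Shards. It assumes the commonly documented routing rule (a hash of the routing value, modulo the number of primary shards). `zlib.crc32` is only a stand-in for the hash ElasticSearch uses internally, and `pick_shard` is a hypothetical helper, not part of any ElasticSearch API.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a primary shard number (illustrative only)."""
    if number_of_shards < 1:
        raise ValueError("an index has at least 1 primary shard")
    # crc32 is a stand-in hash; ElasticSearch uses its own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


# The same ID always lands on the same shard, so lookups know where to go.
for doc_id in ["pry-0.5.9", "grit-2.5.1"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 6))
```

Because the shard number depends on the shard count, a rule like this also suggests why the number of primary shards is chosen when the index is created.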
  10. Download and start: Download ElasticSearch from http://www.elasticsearch.org/download

    # service elasticsearch start
    # /etc/init.d/elasticsearch start
    # ./bin/elasticsearch -f
  11. ElasticSearch Plugins: A site plugin to view the contents of an ElasticSearch cluster. Restart ElasticSearch. Plugins are detected and loaded on service startup.

    # cd /usr/share/elasticsearch
    # ./bin/plugin -install mobz/elasticsearch-head

    # cd /opt/elasticsearch-0.90.2
    # ./bin/plugin -install mobz/elasticsearch-head
  12. RESTful interface

    $ curl -X GET 'http://localhost:9200/'

    {
      "ok": true,
      "status": 200,
      "name": "Drake, Frank",
      "version": {
        "number": "0.90.2",
        "snapshot_build": false,
        "lucene_version": "4.3.1"
      },
      "tagline": "You Know, for Search"
    }
  13. Create Index

    $ curl -X PUT 'http://localhost:9200/gems'

    {"ok": true, "acknowledged": true}
  14. Cluster status

    $ curl -X GET 'localhost:9200/_status'

    {
      "ok": true,
      "_shards": {"total": 20, "successful": 10, "failed": 0},
      "indices": {
        "gems": {
          "index": {
            "primary_size": "495b",
            "primary_size_in_bytes": 495,
            "size": "495b",
            "size_in_bytes": 495
          },
          "translog": {"operations": 0},
          "docs": {"num_docs": 0, "max_doc": 0, "deleted_docs": 0},
          "merges": {
            "current": 0,
            "current_docs": 0,
            "current_size": "0b",
            "current_size_in_bytes": 0,
            "total": 0,
            "total_time": "0s",
            "total_time_in_millis": 0,
            "total_docs": 0,
            "total_size": "0b",
            "total_size_in_bytes": 0
          },
          .........
  15. Pretty Output

    $ curl -X GET 'localhost:9200/_status?pretty'
    $ curl -X GET 'localhost:9200/_status' | python -m json.tool
    $ curl -X GET 'localhost:9200/_status' | json_reformat

    {
      "ok": true,
      "_shards": {
        "total": 20,
        "successful": 10,
        "failed": 0
      },
      "indices": {
        "gems": {
          "index": {
            "primary_size": "495b",
            "primary_size_in_bytes": 495,
            "size": "495b",
            "size_in_bytes": 495
          },
          ...
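
For readers who want the same effect inside a script, the sketch below shows roughly what piping a response through `python -m json.tool` does: parse the JSON and print it back with indentation. The `pretty` helper name and the sample string are illustrative, not part of the workshop material.

```python
import json


def pretty(raw: str) -> str:
    """Re-serialize a compact JSON string with indentation and sorted keys."""
    return json.dumps(json.loads(raw), indent=4, sort_keys=True)


# A shortened, hand-written sample of a compact response body.
raw = '{"ok":true,"_shards":{"total":20,"successful":10,"failed":0}}'
print(pretty(raw))
```

Sorting the keys is a choice made here for stable output; the `?pretty` query parameter leaves the key order as the server produced it.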
  16. Delete Index

    $ curl -X DELETE 'http://localhost:9200/gems'

    {"ok": true, "acknowledged": true}
  17. Create custom Index

    body.json:
    {
      "settings": {
        "index": {
          "number_of_shards": 6,
          "number_of_replicas": 0
        }
      }
    }

    $ curl -X PUT 'http://localhost:9200/gems' -d @body.json

    {"ok": true, "acknowledged": true}
  18. Index a document

    body.json:
    {
      "name": "pry",
      "platform": "ruby",
      "rubygems_version": "1.5.2",
      "description": "attach an irb-like session to any object at runtime",
      "email": "anurag@example.com",
      "has_rdoc": true,
      "homepage": "http://banisterfiend.wordpress.com"
    }

    $ curl -X POST 'http://localhost:9200/gems/test/' -d @body.json

    {"ok": true, "_index": "gems", "_type": "test", "_id": "lsJgxiwET6eg", "_version": 1}
  19. Get document

    $ curl -X GET 'http://localhost:9200/gems/test/lsJgxiwET6eg' | python -m json.tool

    {
      "_id": "lsJgxiwET6eg",
      "_index": "gems",
      "_source": {
        "description": "attach an irb-like session to any object at runtime",
        "email": "anurag@example.com",
        "has_rdoc": true,
        "homepage": "http://banisterfiend.wordpress.com",
        "name": "pry",
        "platform": "ruby",
        "rubygems_version": "1.5.2"
      },
      "_type": "test",
      "_version": 1,
      "exists": true
    }
  20. Index another document

    body.json:
    {
      "name": "grit",
      "platform": "jruby",
      "rubygems_version": "2.5.0",
      "description": "Ruby library for extracting information from a git repository.",
      "email": "mojombo@github.com",
      "has_rdoc": false,
      "homepage": "http://github.com/mojombo/grit"
    }

    $ curl -X POST 'http://localhost:9200/gems/test/' -d @body.json

    {"ok": true, "_index": "gems", "_type": "test", "_id": "ijUOHi2cQc2", "_version": 1}
  21. Custom Document IDs: IDs are unique within an Index. A document is identified by its DocumentType together with its ID.

    body.json:
    {
      "name": "grit",
      "platform": "jruby",
      "rubygems_version": "2.5.1",
      "description": "Ruby library for extracting information from a git repository.",
      "email": "mojombo@github.com",
      "has_rdoc": false,
      "homepage": "http://github.com/mojombo/grit"
    }

    $ curl -X PUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json

    {"ok": true, "_index": "gems", "_type": "test", "_id": "grit-2.5.1", "_version": 1}
  22. Document Versions

    $ curl -X PUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json

    {"ok": true, "_index": "gems", "_type": "test", "_id": "grit-2.5.1", "_version": 2}
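
The version bump seen above can be pictured with a toy in-memory store: writing to the same type and ID again replaces the document and increments its version. This is only a sketch of the observable behavior. `ToyIndex` is a hypothetical class, not an ElasticSearch client, and the real server tracks versions internally.

```python
class ToyIndex:
    """In-memory stand-in for a single index (illustration only)."""

    def __init__(self):
        self._docs = {}      # (doc_type, doc_id) -> document source
        self._versions = {}  # (doc_type, doc_id) -> last version number

    def put(self, doc_type, doc_id, source):
        """Store a document; a repeated PUT to the same key bumps the version."""
        key = (doc_type, doc_id)
        version = self._versions.get(key, 0) + 1
        self._docs[key] = dict(source)
        self._versions[key] = version
        return {"_type": doc_type, "_id": doc_id, "_version": version}


index = ToyIndex()
first = index.put("test", "grit-2.5.1", {"name": "grit", "platform": "jruby"})
second = index.put("test", "grit-2.5.1", {"name": "grit", "platform": "ruby"})
print(first["_version"], second["_version"])  # prints: 1 2
```

Keying the store on the pair of type and ID mirrors the idea that the same ID under a different type is a different document.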
  23. Searching Documents

    body.json:
    {
      "query": {
        "term": {"name": "pry"}
      }
    }

    $ curl -X POST http://localhost:9200/gems/_search -d @body.json | python -m json.tool

    {
      "_shards": {"failed": 0, "successful": 6, "total": 6},
      "hits": {
        "hits": [
          {
            "_id": "MWkKgzsMRgK",
            "_index": "gems",
            "_score": 1.4054651,
            "_source": {
              "description": "attach an irb-like session to any object at runtime",
              "email": "anurag@example.com",
              "has_rdoc": true,
              "homepage": "http://banisterfiend.wordpress.com",
              "name": "pry",
              "platform": "ruby",
              "rubygems_version": "1.5.2"
            },
            "_type": "test"
          }
        ],
        "max_score": 1.4054651,
        "total": 1
  24. Counting Documents

    body.json:
    {
      "term": {"name": "pry"}
    }

    $ curl -X GET http://localhost:9200/gems/test/_count -d @body.json

    {"_shards": {"failed": 0, "successful": 6, "total": 6}, "count": 1}
  25. Update a Document: The partial document is merged into the existing one using a simple recursive merge.

    body.json:
    {
      "doc": {"platform": "macruby"}
    }

    $ curl -X POST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json

    {"ok": true, "_index": "gems", "_type": "test", "_id": "grit-2.5.1", "_version": 4}
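
A minimal sketch of what a "simple recursive merge" of a partial document can look like, assuming that nested objects are merged key by key and every other value is overwritten. `recursive_merge` is a hypothetical helper written for illustration, and the nested `meta` field is invented to show the recursion; neither comes from the slides.

```python
def recursive_merge(existing: dict, partial: dict) -> dict:
    """Return a new dict: `partial` merged into `existing`, recursing into objects."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them key by key.
            merged[key] = recursive_merge(merged[key], value)
        else:
            # Scalars, lists and new keys simply replace the old value.
            merged[key] = value
    return merged


stored = {"name": "grit", "platform": "jruby", "meta": {"stars": 1, "forks": 2}}
partial = {"platform": "macruby", "meta": {"forks": 3}}
print(recursive_merge(stored, partial))
```

Fields that the partial document does not mention, such as `name` and `meta.stars` here, are left untouched.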
  26. Update via Script

    body.json:
    {
      "script": "ctx._source.platform = vm_name",
      "params": {"vm_name": "rubinius"}
    }

    $ curl -X POST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json

    {"ok": true, "_index": "gems", "_type": "test", "_id": "grit-2.5.1", "_version": 5}
  27. Delete Document

    $ curl -X DELETE 'http://localhost:9200/gems/test/grit-2.5.1'

    {"ok": true, "found": true, "_index": "gems", "_type": "test", "_id": "grit-2.5.1", "_version": 6}
  28. Put Mapping

    body.json:
    {
      "gem": {
        "properties": {
          "name": {"type": "string", "index": "not_analyzed"},
          "platform": {"type": "string", "index": "not_analyzed"},
          "rubygems_version": {"type": "string", "index": "not_analyzed"},
          "description": {"type": "string", "store": "yes"},
          "has_rdoc": {"type": "boolean"}
        }
      }
    }

    $ curl -X PUT 'http://localhost:9200/gems/gem/_mapping' -d @body.json
    $ curl -X GET 'http://localhost:9200/gems/_mapping' | python -m json.tool
  29. Index Document with Mapping

    body.json:
    {
      "name": "grit",
      "platform": "ruby",
      "rubygems_version": "2.5.1",
      "description": "Ruby library for extracting information from a git repository.",
      "email": "mojombo@github.com",
      "has_rdoc": false,
      "homepage": "http://github.com/mojombo/grit"
    }

    $ curl -X PUT 'http://localhost:9200/gems/gem/grit-2.5.1' -d @body.json

    {"ok": true, "_index": "gems", "_type": "gem", "_id": "grit-2.5.1", "_version": 1}
  30. Matching documents

    body.json:
    {
      "query": {
        "match": {"description": "git repository"}
      }
    }

    $ curl -X POST http://localhost:9200/gems/gem/_search -d @body.json
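
The difference between a `match` query on an analyzed field and a `term` lookup on a `not_analyzed` field can be sketched in a few lines. This is a simplification under stated assumptions: the `analyze` function below (lowercase, split on non-alphanumerics) only approximates a real analyzer, a match is assumed to succeed when any query term is present, and all three function names are hypothetical.

```python
import re


def analyze(text: str) -> list:
    """Very rough stand-in for an analyzer: lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_value: str, query_text: str) -> bool:
    """Analyzed field: the query text is analyzed too, any shared term matches."""
    indexed_terms = set(analyze(field_value))
    return any(term in indexed_terms for term in analyze(query_text))


def term(field_value: str, exact_value: str) -> bool:
    """not_analyzed field: the whole value is one term, compared exactly."""
    return field_value == exact_value


description = "Ruby library for extracting information from a git repository."
print(match(description, "git repository"))  # prints: True
print(term("grit", "Grit"))                  # prints: False
```

This also hints at why identifier-like fields such as `name` and `platform` are a natural fit for `not_analyzed`, while free text such as `description` is left analyzed.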
  31. Highlighting

    body.json:
    {
      "query": {
        "match": {"description": "git repository"}
      },
      "highlight": {
        "fields": {"description": {}}
      }
    }

    $ curl -X POST http://localhost:9200/gems/gem/_search -d @body.json

    "highlight": {
      "description": [
        "Ruby library for extracting information from a <em>git</em> <em>repository</em>."
      ]
    }
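
The highlighted fragment above can be reproduced with a small sketch that wraps every word of the text that also appears in the query in `<em>` tags. It assumes simple lowercase word matching and ignores fragment sizing and scoring; the `highlight` function is a hypothetical helper, not the server's highlighter.

```python
import re


def highlight(text: str, query_text: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap each word of `text` that occurs in `query_text` in pre/post tags."""
    query_terms = set(re.findall(r"[a-z0-9]+", query_text.lower()))

    def wrap(found):
        word = found.group(0)
        return pre + word + post if word.lower() in query_terms else word

    return re.sub(r"[A-Za-z0-9]+", wrap, text)


description = "Ruby library for extracting information from a git repository."
print(highlight(description, "git repository"))
# prints: Ruby library for extracting information from a <em>git</em> <em>repository</em>.
```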
  32. Search Facets

    body.json:
    {
      "query": {
        "match_all": {}
      },
      "facets": {
        "gem_names": {
          "terms": {"field": "name"}
        }
      }
    }

    $ curl -X POST http://localhost:9200/gems/_search -d @body.json

    ...
    "facets": {
      "gem_names": {
        "_type": "terms",
        "missing": 0,
        "other": 0,
        "terms": [
          {"count": 2, "term": "pry"},
          {"count": 2, "term": "grit"},
          {"count": 1, "term": "abc"}
        ],
        "total": 5
      }
    },
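
Conceptually, a terms facet is a count of documents per distinct field value. The sketch below computes the same shape of result over a hand-made list of documents, assuming one value per field. `terms_facet` is a hypothetical helper, the sample documents are invented to match the counts shown above, and the `other` entry of the real response is left out.

```python
from collections import Counter


def terms_facet(docs: list, field: str) -> dict:
    """Count documents per value of `field`, most frequent first."""
    values = [doc[field] for doc in docs if field in doc]
    counts = Counter(values)
    return {
        "_type": "terms",
        "missing": len(docs) - len(values),  # documents without the field
        "total": len(values),
        "terms": [{"term": t, "count": c} for t, c in counts.most_common()],
    }


docs = [{"name": "pry"}, {"name": "grit"}, {"name": "pry"},
        {"name": "grit"}, {"name": "abc"}]
print(terms_facet(docs, "name"))
```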
  33. Prepare Data & Configure

    # gem install yajl-ruby tire activesupport
    $ git clone https://github.com/gnurag/aadhaar
    $ cd aadhaar/data
    $ unzip UIDAI-ENR-DETAIL-20121001.zip
    $ cd ../bin
    $ vi aadhaar.rb
  34. Configuration

    AADHAAR_DATA_DIR = "/path/to/aadhaar/data"
    ES_URL = "http://localhost:9200"
    ES_INDEX = 'aadhaar'
    ES_TYPE = "UID"
    BATCH_SIZE = 1000
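
The `BATCH_SIZE` setting suggests that records are sent to ElasticSearch in groups rather than one at a time. The slides do not show how `aadhaar.rb` does this, so the following is only a sketch of the general batching idea, written in Python rather than the script's Ruby; the `batches` helper is hypothetical.

```python
from itertools import islice

BATCH_SIZE = 1000  # same value as in the configuration above


def batches(records, size=BATCH_SIZE):
    """Yield lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 2500 fake records end up in batches of 1000, 1000 and 500.
sizes = [len(chunk) for chunk in batches(range(2500))]
print(sizes)  # prints: [1000, 1000, 500]
```

Each yielded chunk would then be sent in one request, which keeps the number of HTTP round trips low for a large data set.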
  35. Index Aliases: Group multiple Indices under one alias and query them together.

    curl -X POST 'http://localhost:9200/_aliases' -d '
    {
      "actions": [
        {"add": {"index": "index1", "alias": "master-alias"}},
        {"add": {"index": "index2", "alias": "master-alias"}}
      ]
    }'

    curl -X POST 'http://localhost:9200/_aliases' -d '
    {
      "actions": [
        {"remove": {"index": "index2", "alias": "master-alias"}}
      ]
    }'
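
Hand-written alias bodies are easy to get wrong, so a script can build them instead. The sketch below only constructs the JSON body for the `_aliases` endpoint used above and does not send it; `alias_actions` is a hypothetical helper name.

```python
import json


def alias_actions(action: str, alias: str, indices: list) -> dict:
    """Build an _aliases request body that adds or removes `alias` on each index."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    return {
        "actions": [{action: {"index": name, "alias": alias}} for name in indices]
    }


body = alias_actions("add", "master-alias", ["index1", "index2"])
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` guarantees valid separators between the actions, whatever number of indices is passed in.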
  36. Parents & Children

    $ curl -X PUT 'http://localhost:9200/gems/gem/roxml?parent=rexml' -d '{"tag": "something"}'