Slide 1

Slide 1 text

@DerekB_WI [email protected] Taming Your Data with Elasticsearch

Slide 2

Slide 2 text

Hello!
I am Derek Binkley
Senior Engineer with TurnTo Networks
Volunteer with Community Justice
@DerekB_WI [email protected]

Slide 3

Slide 3 text

Customer Generated Content

Slide 4

Slide 4 text

Fast Searching
Scalability
Finding Value within a Sea of Data

Slide 5

Slide 5 text

What is it?
Elasticsearch: open-source, RESTful, distributed search and analytics engine built on Apache Lucene
Kibana: tool for querying and exploring data
Beats and Logstash: tools for ingesting data from specific sources

Slide 6

Slide 6 text

How is it stored?
Index: a grouping of JSON documents with similar structure
Mapping: defines what is contained in a document
Document: a JSON document stores each data element

Slide 7

Slide 7 text

Storing Data

Slide 8

Slide 8 text

POST
Store new document
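
The request on this slide can be sketched as follows, in Python with only the standard library. The index name `reviews`, the field names, and the `localhost:9200` address are assumptions made for illustration, and the request is built but not sent. The typeless `/_doc` path assumes a recent Elasticsearch version.

```python
import json
from urllib.request import Request

# Hypothetical document; the field names are illustrative only.
doc = {"product": "Blue Widget", "rating": 5, "text": "Works great"}

def index_request(base_url, index, document):
    """Build (but do not send) a POST that lets Elasticsearch assign the ID."""
    return Request(
        url=f"{base_url}/{index}/_doc",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = index_request("http://localhost:9200", "reviews", doc)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running cluster.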

Slide 9

Slide 9 text

PUT
Specify ID to update or insert
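
A minimal sketch of the PUT form, where the caller picks the ID. The helper name `put_document`, the index `reviews`, and the ID `42` are hypothetical; nothing is sent over the network.

```python
import json

def put_document(index, doc_id, document):
    """Return (method, path, body) for an insert-or-replace by explicit ID."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)

method, path, body = put_document(
    "reviews", 42, {"product": "Blue Widget", "rating": 4}
)
print(method, path)
print(body)
```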

Slide 10

Slide 10 text

Mapping
Created automatically or manually
Updated automatically

Slide 11

Slide 11 text

Mapping

Slide 12

Slide 12 text

Put Mapping
Define empty index
Set up document structure
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
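
As a rough illustration of defining an empty index with an explicit structure, the body below could accompany a `PUT /reviews` request. The index and field names are assumptions; the field types are standard Elasticsearch types. The typeless `mappings.properties` layout assumes Elasticsearch 7 or later (older versions nest the properties under a type name).

```python
import json

# Hypothetical fields for a customer-review document.
create_index_body = {
    "mappings": {
        "properties": {
            "product": {"type": "keyword"},     # exact-match values
            "text": {"type": "text"},           # analyzed full text
            "rating": {"type": "integer"},
            "created": {"type": "date"},
            "location": {"type": "geo_point"},  # lat/lon pair
        }
    }
}

# Would be sent as: PUT /reviews
print(json.dumps(create_index_body, indent=2))
```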

Slide 13

Slide 13 text

Storing Data with PHP

Slide 14

Slide 14 text

Put Mapping
Guzzle converts array to JSON body

Slide 15

Slide 15 text

Put Mapping
Guzzle converts array to JSON body

Slide 16

Slide 16 text

Post
Guzzle converts array to JSON body

Slide 17

Slide 17 text

Update Data

Slide 18

Slide 18 text

ID
Automatically assigned: POST
Manually assigned: PUT

Slide 19

Slide 19 text

PUT DOC
Replaces the entire document if it exists
Adds a new document if it does not exist

Slide 20

Slide 20 text

Update Fields
Only updates named fields
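
A sketch of a partial update, which wraps the changed fields in a `doc` object. The helper name, index, and ID are hypothetical, and the `/{index}/_update/{id}` path assumes Elasticsearch 7 or later.

```python
import json

def update_fields(index, doc_id, changes):
    """Return (method, path, body) for a partial update of the named fields."""
    return "POST", f"/{index}/_update/{doc_id}", json.dumps({"doc": changes})

method, path, body = update_fields("reviews", 42, {"rating": 3})
print(method, path)
print(body)
```

Fields not listed under `doc` are left as they were, unlike a PUT of the whole document.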

Slide 21

Slide 21 text

Script Update
Painless scripting language
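
One possible shape for a scripted update body, sent to the same update endpoint as a partial update. The field name `helpful_votes` and the helper `increment_script` are assumptions for illustration; `ctx._source` and `params` are the standard Painless bindings for update scripts.

```python
import json

def increment_script(field, amount):
    """Build an update body that adds `amount` to a numeric field."""
    return {
        "script": {
            "lang": "painless",
            "source": f"ctx._source.{field} += params.amount",
            "params": {"amount": amount},
        }
    }

body = increment_script("helpful_votes", 1)
print(json.dumps(body, indent=2))
```

Passing the amount through `params` rather than embedding it in the source keeps the script text constant across calls.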

Slide 22

Slide 22 text

Searching Data

Slide 23

Slide 23 text

Query Keyword
Define query in JSON body
match_all finds everything

Slide 24

Slide 24 text

Find a Match
Looking for best results

Slide 25

Slide 25 text

Find a Match
Results are scored
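
To make the scoring concrete, here is a sketch of a `match` query body and of reading `_score` from a response. The field name `text` is an assumption, and `sample_response` is hand-written to mirror the shape of a search response; it is not real output.

```python
import json

# Would be sent as the body of: POST /reviews/_search
search_body = {"query": {"match": {"text": "great battery life"}}}

# Hand-written sample; hits come back ordered by descending _score.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "7", "_score": 2.1, "_source": {"text": "great battery life"}},
            {"_id": "3", "_score": 0.4, "_source": {"text": "battery died"}},
        ]
    }
}

def ranked(response):
    """Return (id, score) pairs in the order Elasticsearch returned them."""
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]

print(json.dumps(search_body))
print(ranked(sample_response))
```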

Slide 26

Slide 26 text

Search Within Text
Results are scored

Slide 27

Slide 27 text

Search Within Text
Results are scored

Slide 28

Slide 28 text

Fuzziness
Damerau-Levenshtein Distance
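
To show what the distance measures, the sketch below implements the restricted Damerau-Levenshtein variant (often called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent characters. It is an illustration of the idea, not Lucene's implementation. The query body at the end uses the `fuzziness` option of a `match` query; the field name `text` is an assumption.

```python
def osa_distance(a, b):
    """Edit distance allowing insert, delete, substitute, adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

print(osa_distance("great", "graet"))  # one adjacent swap
print(osa_distance("kitten", "sitting"))

# A fuzzy match query; "AUTO" picks the allowed distance from the term length.
fuzzy_body = {"query": {"match": {"text": {"query": "graet", "fuzziness": "AUTO"}}}}
```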

Slide 29

Slide 29 text

Similar Documents
more_like_this query
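
A sketch of a `more_like_this` body. The field name `text`, the sample text, and the threshold values are assumptions chosen for illustration; `fields`, `like`, `min_term_freq`, and `max_query_terms` are parameters of the query itself.

```python
import json

def similar_to(text, fields, max_terms=12):
    """Build a more_like_this query that looks for documents resembling `text`."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": text,
                "min_term_freq": 1,          # count terms that appear only once
                "max_query_terms": max_terms,
            }
        }
    }

body = similar_to("battery lasts all day", ["text"])
print(json.dumps(body, indent=2))
```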

Slide 30

Slide 30 text

Word Suggestions
Suggest

Slide 31

Slide 31 text

Word Suggestions
Suggest
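
The slides do not show which suggester was used, so the following is only a sketch assuming the term suggester. The suggestion name `spelling`, the field `text`, and the hand-written `sample_response` are all hypothetical.

```python
import json

# Would be sent as the body of: POST /reviews/_search
suggest_body = {
    "suggest": {
        "spelling": {                      # arbitrary name for this suggestion
            "text": "batery",
            "term": {"field": "text"},
        }
    }
}

# Hand-written sample shaped like a term-suggester response; not real output.
sample_response = {
    "suggest": {
        "spelling": [
            {"text": "batery", "offset": 0, "length": 6,
             "options": [{"text": "battery", "score": 0.83, "freq": 12}]}
        ]
    }
}

def suggestions(response, name):
    """Map each input token to the list of suggested replacements."""
    return {
        entry["text"]: [option["text"] for option in entry["options"]]
        for entry in response["suggest"][name]
    }

print(json.dumps(suggest_body))
print(suggestions(sample_response, "spelling"))
```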

Slide 32

Slide 32 text

Paginating Data

Slide 33

Slide 33 text

Skip Results
Skip 100 and limit results to 100.

Slide 34

Slide 34 text

Skip Results
Only for first 10,000 hits
Organized into shards
Each shard is a Lucene index
Move data around clusters
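
Skipping results maps to the `from` and `size` search parameters. The sketch below turns a page number into those values and refuses pages past the 10,000-hit window, which is the default `index.max_result_window`. The helper name `page_body` is hypothetical.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window

def page_body(query, page, per_page=100):
    """Build a search body for a 1-based page number."""
    start = (page - 1) * per_page
    if start + per_page > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; use scroll instead")
    return {"from": start, "size": per_page, "query": query}

# Page 2 skips the first 100 hits and returns the next 100.
print(page_body({"match_all": {}}, page=2))
```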

Slide 35

Slide 35 text

Scroll Through Results
Only stays open for specified time

Slide 36

Slide 36 text

Scroll Through Results
Keep track with _scroll_id

Slide 37

Slide 37 text

Scroll Through Results
POST to scroll endpoint for next results.

Slide 38

Slide 38 text

Aggregating

Slide 39

Slide 39 text

What’s In a Field
Query unique results or keywords

Slide 40

Slide 40 text

What’s In a Field
Query unique results or keywords that get sorted into “buckets”
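
A sketch of a `terms` aggregation, which returns one bucket per distinct value of a field. The aggregation name `by_product`, the keyword field `product`, and the hand-written `sample_response` are assumptions for illustration.

```python
# "size": 0 asks for the aggregation only, without the matching documents.
agg_body = {
    "size": 0,
    "aggs": {"by_product": {"terms": {"field": "product"}}},
}

# Hand-written sample shaped like an aggregation response; not real output.
sample_response = {
    "aggregations": {
        "by_product": {
            "buckets": [
                {"key": "Blue Widget", "doc_count": 14},
                {"key": "Red Widget", "doc_count": 9},
            ]
        }
    }
}

def bucket_counts(response, name):
    """Map each bucket key to its document count."""
    buckets = response["aggregations"][name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}

print(bucket_counts(sample_response, "by_product"))
```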

Slide 41

Slide 41 text

Metrics
Calculate summary values such as max, min, average

Slide 42

Slide 42 text

Metrics
Calculate summary values such as max, min, average
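
Metric aggregations could be requested as below, assuming a numeric field named `rating`. The helper `metrics_body` and the aggregation names it generates are hypothetical; `max`, `min`, and `avg` are the aggregation types.

```python
import json

def metrics_body(field, kinds=("max", "min", "avg")):
    """Build one metric aggregation per kind over a single numeric field."""
    return {
        "size": 0,  # summary values only, no hits
        "aggs": {f"{kind}_{field}": {kind: {"field": field}} for kind in kinds},
    }

body = metrics_body("rating")
print(json.dumps(body, indent=2))
```

Each metric comes back under its name in `aggregations`, with the result in a `value` key.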

Slide 43

Slide 43 text

Buckets with Metrics
Group documents into buckets

Slide 44

Slide 44 text

Buckets with Metrics
Group documents into buckets
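
Combining the two ideas means nesting a metric inside a bucket aggregation, so the metric is computed once per bucket. A sketch, with the field names `product` and `rating` and the sample response assumed for illustration:

```python
nested_body = {
    "size": 0,
    "aggs": {
        "by_product": {
            "terms": {"field": "product"},
            # Sub-aggregation: evaluated separately inside every bucket.
            "aggs": {"avg_rating": {"avg": {"field": "rating"}}},
        }
    },
}

# Hand-written sample shaped like the response; not real output.
sample_response = {
    "aggregations": {"by_product": {"buckets": [
        {"key": "Blue Widget", "doc_count": 14, "avg_rating": {"value": 4.5}},
        {"key": "Red Widget", "doc_count": 9, "avg_rating": {"value": 3.0}},
    ]}}
}

averages = {
    bucket["key"]: bucket["avg_rating"]["value"]
    for bucket in sample_response["aggregations"]["by_product"]["buckets"]
}
print(averages)
```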

Slide 45

Slide 45 text

Geo Points

Slide 46

Slide 46 text

Geo searches
Complex mapping applications can be created by using four types of queries
GeoShape: uses GeoJSON to define shape
Geo Bounding Box: define top_left and bottom_right
Geo Distance: previous example
Geo Polygon: define points to create a polygon

Slide 47

Slide 47 text

Distance Search
Find results within a distance of a point
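
A sketch of a `geo_distance` filter. It assumes a `geo_point` field named `location`; the coordinates and the 10 km radius are arbitrary example values.

```python
import json

def within_distance(field, lat, lon, distance):
    """Build a query that keeps only documents within `distance` of a point."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }

body = within_distance("location", 43.07, -89.40, "10km")
print(json.dumps(body, indent=2))
```

Placing the geo clause under `filter` means it narrows the results without affecting their scores.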

Slide 48

Slide 48 text

Distance Aggregation
Filter by geo, aggregate by term

Slide 49

Slide 49 text

Distance Aggregation
Filter by geo, aggregate by term

Slide 50

Slide 50 text

Distance Sort
Sort by distance

Slide 51

Slide 51 text

Distance Sort
Sort by distance
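
Sorting by distance uses the `_geo_distance` sort key. In this sketch the `geo_point` field name `location` and the coordinates are assumptions.

```python
import json

def nearest_first(field, lat, lon, unit="km"):
    """Build a search body that orders hits by distance from a point."""
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "_geo_distance": {
                    field: {"lat": lat, "lon": lon},
                    "order": "asc",   # closest documents first
                    "unit": unit,     # unit of the distance reported per hit
                }
            }
        ],
    }

body = nearest_first("location", 43.07, -89.40)
print(json.dumps(body, indent=2))
```

Each hit then carries the computed distance in its `sort` array.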

Slide 52

Slide 52 text

Keeping in Sync

Slide 53

Slide 53 text

Sync with database
Elasticsearch is optimized for reads and searches, at the cost of expensive writes
Batches: use the batch API to insert many records
Message Queues: strategy for queuing up data for batching
Ranges of data: batch by range
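
Batched inserts go through the `_bulk` endpoint, whose body is newline-delimited JSON: an action line followed by the document, repeated, with a trailing newline. The sketch below splits rows into fixed-size batches and builds one body per batch. The helper names, the index `reviews`, and the sample rows are assumptions; the rows stand in for a range read from the database.

```python
import json

def bulk_body(index, rows):
    """Build an NDJSON _bulk body from (id, document) pairs."""
    lines = []
    for doc_id, source in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline

def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

rows = [(str(i), {"rating": i}) for i in range(1, 6)]
bodies = [bulk_body("reviews", batch) for batch in batches(rows, 2)]

# Each body would be sent as: POST /_bulk  (Content-Type: application/x-ndjson)
print(len(bodies))
print(bodies[0], end="")
```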

Slide 54

Slide 54 text

Reindex mapping
Cannot change the mapping of existing fields in place
Must set up a destination index

Slide 55

Slide 55 text

Reindex mapping
Can use alias to help with cutover
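
One way the cutover could look: copy documents into a new index with the `_reindex` endpoint, then move an alias from the old index to the new one in a single `_aliases` call so that clients querying the alias never see a gap. The index names `reviews_v1` and `reviews_v2` and the alias `reviews` are hypothetical.

```python
import json

def reindex_body(source_index, dest_index):
    """Body for POST /_reindex; the destination index must already exist
    with the new mapping."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}

def swap_alias_body(alias, old_index, new_index):
    """Body for POST /_aliases; both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

print(json.dumps(reindex_body("reviews_v1", "reviews_v2")))
print(json.dumps(swap_alias_body("reviews", "reviews_v1", "reviews_v2")))
```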

Slide 56

Slide 56 text

Thanks!
Any questions?
You can find me at:
@DerekB_WI
[email protected]
derekb-wi.com

Slide 57

Slide 57 text

THANKS!
https://joind.in/talk/5cced

Slide 58

Slide 58 text

Resources
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
https://en.wikipedia.org/wiki/Damerau-Levenshtein_distance
https://lucene.apache.org/
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-getting-started.html
http://geojson.org/
https://www.elastic.co/blog/found-keeping-elasticsearch-in-sync
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html