Taming Your Data with Elasticsearch - PHP Benelux

derek-b
January 25, 2019

Are you searching unstructured data or text fields? Do you need to aggregate and summarize your geo, financial, or other numeric data? Do you want to query your structured data in new and exciting ways? If so, Elasticsearch may be right for you. Let’s explore the many ways you can ask questions about your data and have it make sense to you and your users. We’ll sort through millions of rows in milliseconds and give you tools to take your data analysis to the next level. You will learn how to use PHP libraries and basic RESTful API calls to store, search, and aggregate your data.

Transcript

  1. @DerekB_WI [email protected]
    Taming Your Data with Elasticsearch

  2. Hello! I Am Derek Binkley
    Senior Engineer with TurnTo Networks
    Volunteer with Community Justice
    @DerekB_WI [email protected]

  3. Customer Generated Content

  4. Finding Value within a Sea of Data
    Fast Searching
    Scalability

  5. What is it?
    Elasticsearch: open-source, RESTful, distributed search and analytics engine built on Apache Lucene
    Kibana: tool for querying and exploring data
    Beats and Logstash: tools for ingesting data from specific sources

  6. How is it stored?
    Index: a grouping of JSON documents with similar structure
    Mapping: defines what is contained in a document
    Document: a JSON document that stores each data element

  7. POST
    Store a new document

  8. PUT
    Specify an ID to update or insert

  9. Mapping
    Created automatically or manually
    Updated automatically

  10. Put Mapping
    Define an empty index
    Set up the document structure
    https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html

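A minimal sketch of "define an empty index, then set up the document structure", written in Python rather than the talk's PHP so it runs standalone. It only builds the requests; the index name `reviews` and the fields `title`, `body`, `rating`, `product_id` and `location` are hypothetical, and the bodies assume Elasticsearch 7-style typeless mappings (6.x indices nest `properties` under a type name).

```python
import json

# Hypothetical field layout for a review document; adjust to your own data.
REVIEW_PROPERTIES = {
    "title": {"type": "text"},
    "body": {"type": "text"},
    "rating": {"type": "integer"},
    "product_id": {"type": "keyword"},
    "location": {"type": "geo_point"},
}


def create_index_request(index, properties):
    """PUT /<index> creates an empty index with the given mapping (7.x, typeless)."""
    body = {"mappings": {"properties": properties}}
    return "PUT", f"/{index}", json.dumps(body)


def put_mapping_request(index, properties):
    """PUT /<index>/_mapping adds new fields to an existing index's mapping."""
    return "PUT", f"/{index}/_mapping", json.dumps({"properties": properties})


method, path, body = create_index_request("reviews", REVIEW_PROPERTIES)
print(method, path)  # PUT /reviews
```

Declaring fields up front, instead of relying on dynamic mapping, is what lets later slides use `keyword` fields for buckets and `geo_point` fields for distance queries.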
  11. Storing Data with PHP

  12. Put Mapping
    Guzzle converts the array to a JSON body

  13. Put Mapping
    Guzzle converts the array to a JSON body

  14. Post
    Guzzle converts the array to a JSON body

  15. ID
    Automatically assigned: POST
    Manually assigned: PUT

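The POST-versus-PUT rule can be sketched with Python's standard library standing in for Guzzle: the dict is JSON-encoded into the request body, much as Guzzle's `json` request option does for a PHP array. This builds the request without sending it; `http://localhost:9200` and the `reviews` index are assumptions, and the `/_doc` endpoint assumes Elasticsearch 7.x.

```python
import json
import urllib.parse
import urllib.request


def index_document_request(base_url, index, doc, doc_id=None):
    """Build (but do not send) a request that stores `doc` in `index`.

    No ID:   POST /<index>/_doc       -> Elasticsearch assigns the ID.
    With ID: PUT  /<index>/_doc/<id>  -> inserts, or replaces the whole
             document if that ID already exists.
    """
    url = f"{base_url}/{index}/_doc"
    method = "POST"
    if doc_id is not None:
        url += "/" + urllib.parse.quote(str(doc_id), safe="")
        method = "PUT"
    # Equivalent of Guzzle's "json" option: encode the dict as the JSON body.
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


req = index_document_request("http://localhost:9200", "reviews", {"title": "Great"}, doc_id=42)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/reviews/_doc/42
# urllib.request.urlopen(req) would send it to a running cluster.
```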
  16. PUT DOC
    Replaces the entire document if it exists
    Adds a new one if it does not exist

  17. Update Fields
    Only updates the named fields

  18. Script Update
    Painless scripting language

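A sketch of the two update bodies, assuming Elasticsearch 7.x, where both are sent to `POST /<index>/_update/<id>` (6.x uses `POST /<index>/_doc/<id>/_update`). The field names `rating` and `helpful_votes` are hypothetical.

```python
import json


def partial_update_body(fields):
    """Only the named fields are merged into the stored _source."""
    return {"doc": fields}


def script_update_body(source, params=None):
    """A Painless script update; passing values via params (rather than
    hard-coding them in the source) lets Elasticsearch reuse the compiled script."""
    return {"script": {"source": source, "lang": "painless", "params": params or {}}}


print(json.dumps(partial_update_body({"rating": 4})))
print(json.dumps(script_update_body("ctx._source.helpful_votes += params.count", {"count": 1})))
```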
  19. Searching Data

  20. Query Keyword
    Define the query in the JSON body
    match_all finds everything

  21. Find a Match
    Looking for the best results

  22. Find a Match
    Results are scored

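The query keyword lives in the JSON body sent to `GET` or `POST /<index>/_search`. A minimal sketch of the two bodies from these slides, with a hypothetical `body` field:

```python
def match_all_query():
    """Every document matches, and every hit gets the same constant score."""
    return {"query": {"match_all": {}}}


def match_query(field, text):
    """Full-text match: the text is analyzed and hits are ranked by _score."""
    return {"query": {"match": {field: text}}}


print(match_query("body", "battery life"))
```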
  23. Search Within Text
    Results are scored

  24. Search Within Text
    Results are scored

  25. Fuzziness
    Damerau-Levenshtein distance

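Fuzziness counts single-character edits: insertions, deletions, substitutions, and swaps of two adjacent characters. The sketch below implements the common restricted form of Damerau-Levenshtein distance (optimal string alignment), which is enough to show the idea; it is not Lucene's automaton-based implementation. The `fuzziness` parameter on a `match` query is real; `"AUTO"` picks the allowed edits from the term length.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Insertion, deletion, substitution and adjacent transposition each cost 1;
    no substring is edited more than once.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def fuzzy_match_query(field, text, fuzziness="AUTO"):
    """A match query that tolerates typos up to the given edit distance."""
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}


print(osa_distance("ab", "ba"))  # 1: one transposition, not two substitutions
```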
  26. Similar Documents
    more_like_this query

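A sketch of a `more_like_this` body. `like` can be free text or a list of references to indexed documents; the `reviews` index, ID `42` and field names are hypothetical. `min_term_freq` is lowered to 1 (its default is 2) so that short reviews still yield query terms.

```python
def more_like_this_query(fields, like, min_term_freq=1, max_query_terms=12):
    """Find documents whose terms in `fields` resemble `like`."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": like,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# "More like review 42": point at an existing document instead of pasting its text.
body = more_like_this_query(["title", "body"], [{"_index": "reviews", "_id": "42"}])
```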
  27. Word Suggestions
    Suggest

  28. Word Suggestions
    Suggest

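The slides don't say which suggester is used. This sketch assumes the term suggester, which proposes corrections for each token in the input text. The suggestion name `spellcheck` and the `body` field are hypothetical. The response lists one entry per input token, each with its own `options`.

```python
def term_suggest_body(text, field, name="spellcheck"):
    """Ask for spelling suggestions for each token of `text`."""
    return {"suggest": {name: {"text": text, "term": {"field": field}}}}


def first_suggestions(response, name="spellcheck"):
    """Map each misspelled input token to its top-ranked suggestion."""
    return {
        entry["text"]: entry["options"][0]["text"]
        for entry in response["suggest"][name]
        if entry["options"]  # tokens with no suggestions are skipped
    }
```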
  29. Paginating Data

  30. Skip Results
    Skip 100 and limit results to 100

  31. Skip Results
    Only works for the first 10,000 hits
    Data is organized into shards
    Each shard is a Lucene index
    Shards can be moved around the cluster

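A sketch of `from`/`size` paging with 1-based page numbers. The 10,000 limit is the default `index.max_result_window` setting: every shard has to collect `from + size` hits before the results are merged, so deep pages get expensive. The helper name and the 100-per-page default are illustrative.

```python
MAX_RESULT_WINDOW = 10_000  # default index.max_result_window


def page_body(query, page, per_page=100):
    """Page 2 with 100 per page means: skip 100 hits, return the next 100."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    start = (page - 1) * per_page
    if start + per_page > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window; scroll instead")
    return {"query": query, "from": start, "size": per_page}


print(page_body({"match_all": {}}, 2))
```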
  32. Scroll Through Results
    Only stays open for the specified time

  33. Scroll Through Results
    Keep track with _scroll_id

  34. Scroll Through Results
    POST to the scroll endpoint for the next results

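A sketch of the scroll loop from these three slides. To stay testable without a cluster, it takes a hypothetical `send(method, path, body)` callable that performs the HTTP request and returns the decoded JSON. The endpoints and body keys (`scroll`, `scroll_id`, `DELETE /_search/scroll`) are the real scroll API. Each request renews the keep-alive, and the context is cleared at the end rather than left to expire.

```python
def scroll_all(send, index, query, keep_alive="1m", size=500):
    """Yield every hit for `query`, one scroll page at a time."""
    page = send("POST", f"/{index}/_search?scroll={keep_alive}", {"query": query, "size": size})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Ask for the next page; this also extends the context's lifetime.
            page = send("POST", "/_search/scroll", {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Free the search context instead of waiting for it to time out.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```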
  35. What’s In a Field
    Query unique results or keywords

  36. What’s In a Field
    Query unique results or keywords that get sorted into “buckets”

  37. Metrics
    Calculate summary values such as max, min, and average

  38. Metrics
    Calculate summary values such as max, min, and average

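A sketch of a metrics-only search body. `size: 0` skips the hits and returns just the aggregations. Each single-value result comes back as `{"value": ...}` under its name in `aggregations`, while `stats` bundles count, min, max, avg and sum. The aggregation names and the `rating` field are hypothetical.

```python
def metrics_body(field, query=None):
    """Summary values for a numeric field over the documents matching `query`."""
    return {
        "size": 0,
        "query": query or {"match_all": {}},
        "aggs": {
            "min_value": {"min": {"field": field}},
            "max_value": {"max": {"field": field}},
            "avg_value": {"avg": {"field": field}},
            # One aggregation that returns count, min, max, avg and sum together.
            "all_stats": {"stats": {"field": field}},
        },
    }
```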
  39. Buckets with Metrics
    Group documents into buckets

  40. Buckets with Metrics
    Group documents into buckets

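A sketch that combines the bucket and metric slides. A `terms` aggregation groups documents by the unique values of a field (typically a `keyword` field), and a nested `avg` runs separately inside every bucket. The names `groups` and `avg_metric`, and the `product_id`/`rating` fields, are hypothetical.

```python
def buckets_with_avg(group_field, metric_field, size=10):
    """One bucket per distinct value of group_field, each with its own average."""
    return {
        "size": 0,
        "aggs": {
            "groups": {
                "terms": {"field": group_field, "size": size},
                # Sub-aggregation: computed once per bucket.
                "aggs": {"avg_metric": {"avg": {"field": metric_field}}},
            }
        },
    }


def bucket_averages(response):
    """Map each bucket key to (doc_count, average) from the search response."""
    return {
        b["key"]: (b["doc_count"], b["avg_metric"]["value"])
        for b in response["aggregations"]["groups"]["buckets"]
    }
```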
  41. Geo searches
    Complex mapping applications can be created by using four types of queries
    GeoShape: uses GeoJSON to define a shape
    Geo Bounding Box: define top_left and bottom_right
    Geo Distance: previous example
    Geo Polygon: define points to create a polygon

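A sketch of the bounding box and polygon queries, assuming a hypothetical `location` field mapped as `geo_point` and points passed as `(lat, lon)` tuples. Both are wrapped in a `bool` filter because a location test is a yes/no condition, not a relevance score. Later Elasticsearch releases deprecate `geo_polygon` in favor of `geo_shape`.

```python
def geo_bounding_box_query(field, top_left, bottom_right):
    """Match points inside the rectangle spanned by two (lat, lon) corners."""
    box = {
        "top_left": {"lat": top_left[0], "lon": top_left[1]},
        "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
    }
    return {"query": {"bool": {"filter": {"geo_bounding_box": {field: box}}}}}


def geo_polygon_query(field, points):
    """Match points inside the polygon outlined by a list of (lat, lon) tuples."""
    outline = [{"lat": lat, "lon": lon} for lat, lon in points]
    return {"query": {"bool": {"filter": {"geo_polygon": {field: {"points": outline}}}}}}
```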
  42. Distance Search
    Find results within a distance of a point

  43. Distance Aggregation
    Filter by geo, aggregate by term

  44. Distance Aggregation
    Filter by geo, aggregate by term

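A sketch of "filter by geo, aggregate by term": a `geo_distance` filter keeps only documents within `distance` (a string such as `"25km"`) of a point, and a `terms` aggregation then counts the survivors per value. The field names are hypothetical; `geo_field` must be a `geo_point` and `term_field` should be a `keyword`.

```python
def nearby_by_term(geo_field, lat, lon, distance, term_field):
    """Count documents within `distance` of (lat, lon), grouped by term_field."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        geo_field: {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {"by_term": {"terms": {"field": term_field}}},
    }
```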
  45. Distance Sort
    Sort by distance

  46. Distance Sort
    Sort by distance

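A sketch of sorting by distance with the `_geo_distance` sort. With this sort in place, each hit's `sort` array holds its distance from the origin in the requested `unit`, so the response already carries a "how far away" value. The `location` field is hypothetical.

```python
def distance_sort_body(geo_field, lat, lon, query=None, unit="km"):
    """Nearest documents first."""
    return {
        "query": query or {"match_all": {}},
        "sort": [
            {
                "_geo_distance": {
                    geo_field: {"lat": lat, "lon": lon},
                    "order": "asc",
                    "unit": unit,
                }
            }
        ],
    }


def hits_with_distance(response):
    """Pair each hit's _id with its distance (the first sort value)."""
    return [(h["_id"], h["sort"][0]) for h in response["hits"]["hits"]]
```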
  47. Keeping in Sync

  48. Elasticsearch is optimized for reads and searches; writes are comparatively expensive
    Batches: use the batch (_bulk) API to insert many records
    Message Queues: a strategy for queuing up data for batching
    Ranges of data: sync with the database, batch by range

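A sketch of batching by range into the `_bulk` API. The bulk body is newline-delimited JSON: for each document, an action line and then the source line, with a trailing newline at the end. It is sent as `POST /_bulk` with `Content-Type: application/x-ndjson`. The `id_ranges` helper is hypothetical: each range would be loaded from the database (for example with `WHERE id BETWEEN start AND end`) and sent as one bulk request. The action lines assume 7.x, since 6.x also expects `_type`.

```python
import json


def bulk_body(index, docs):
    """NDJSON for POST /_bulk from (doc_id, source) pairs; must end with a newline."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def id_ranges(min_id, max_id, batch_size):
    """Split [min_id, max_id] into inclusive (start, end) ranges of batch_size IDs."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


print(list(id_ranges(1, 10, 4)))  # [(1, 4), (5, 8), (9, 10)]
```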
  49. Reindex mapping
    Cannot change an existing field's mapping manually
    Must set up the destination index first

  50. Reindex mapping
    Can use an alias to help with the cutover

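A sketch of the reindex-and-cutover sequence: create the new index with the corrected mapping, copy the documents with `POST /_reindex`, then move the alias in one `POST /_aliases` call. The actions in that call are applied atomically, so searches against the alias never find it missing. The names `reviews`, `reviews_v1` and `reviews_v2` are hypothetical.

```python
def reindex_body(source_index, dest_index):
    """POST /_reindex: copy documents into an index created beforehand."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_body(alias, old_index, new_index):
    """POST /_aliases: repoint the alias from old_index to new_index atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Applications query "reviews" (the alias), never reviews_v1 or reviews_v2 directly.
steps = [reindex_body("reviews_v1", "reviews_v2"), alias_swap_body("reviews", "reviews_v1", "reviews_v2")]
```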
  51. ANY QUESTIONS?
    You can find me at
    @DerekB_WI
    [email protected]
    derekb-wi.com
    Thanks!

  52. THANKS!
    https://joind.in/talk/5cced

  53. Resources
    https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
    https://en.wikipedia.org/wiki/Damerau-Levenshtein_distance
    https://lucene.apache.org/
    https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-getting-started.html
    http://geojson.org/
    https://www.elastic.co/blog/found-keeping-elasticsearch-in-sync
    https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html