Elasticsearch Always Pays Its Debts

These are the slides used as support by Quentin in our last technical workshop at Reputation VIP.

If you want to go further, you can find the giant and awesome associated article here: http://reputationvip.io/elasticsearch-always-pays-its-debts, on our technical blog.

Reputation VIP

October 05, 2015
Transcript

  1. Pre-requisites (slide 3)

    The original article can be found here: http://reputationvip.io/elasticsearch-always-pays-its-debts/

    The GitHub repository with the corresponding examples can be found here: https://github.com/quentinfayet/elasticsearch/tree/v2.0
  2. Mapping – Reminder (slide 4)

    Mapping is the way data structures are described. Mapping describes the way your data are stored.
  3. Mapping – Automated Index (slide 5)

    Indices can be created on the fly. That's not necessarily a good thing: what if a typo slips into the index name in your code? Automatic index creation can be disabled in the configuration file (action.auto_create_index: false).
  4. Mapping – Type Detection (slide 6)

    Data types can be guessed on the fly, for example with numeric or date detection. No Boolean detection yet. Automated type detection can be turned off when configuring the index, independently for each field.
  5. Mapping – Type Detection – Numeric & Date (slide 7)

    { "character": { "numeric_detection": true, "dynamic_date_formats": [ "YYYY-MM-DD", "YYYY-MM-DD at hh:mm:ss" ] } }

    Numeric detection: boolean. Date detection formats (dynamic_date_formats): array of date format patterns.
  6. Mapping – Type Detection – Turning Off (slide 8)

    { "character": { "dynamic": false, "properties": { ... } } }
  7. Mapping – Analyzers – Reminder (slide 9)

    An analyzer is composed of a tokenizer and token filters. Tokenizers split a string into several tokens. Token filters process the tokens (uppercasing, removing some tokens, …).
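To make the tokenizer / token filter split concrete, here is a minimal sketch in plain Python. It is not Elasticsearch code: `whitespace_tokenizer`, `lowercase_filter`, `stop_filter` and `analyze` are hypothetical names, and the stop-word list is an assumption picked for the example.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "is"}


def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the string into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens: List[str]) -> List[str]:
    """Token filter: drop tokens that are in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text: str, tokenizer: Tokenizer, filters: List[TokenFilter]) -> List[str]:
    """Analyzer: run one tokenizer, then each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("The King of the North", whitespace_tokenizer,
                 [lowercase_filter, stop_filter])
print(tokens)
```

The order of the filters matters: here the stop filter only works because the lowercase filter ran first.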
  8. Mapping – Ready-to-use Analyzers: Standard Analyzer (slide 10)

    Standard Analyzer: standard tokenizer + standard token filter + lower case filter + stop token filter. The stop token filter removes "stop words".
  9. Mapping – Ready-to-use Analyzers: Simple Analyzer (slide 12)

    Simple Analyzer: splits the text on every non-letter character and lowercases the tokens. Example: "Jon1111Snow" ➔ "jon" + "snow". The token type is "word".
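The behaviour of the simple analyzer can be approximated with a regular expression. This is only a sketch under assumptions: `simple_analyze` is a hypothetical helper, and the pattern is an approximation of "letters" rather than the exact rule Elasticsearch applies. Note that the real simple analyzer also lowercases its tokens, which the sketch reproduces.

```python
import re
from typing import List


def simple_analyze(text: str) -> List[str]:
    """Keep runs of letters, drop everything else, lowercase the result."""
    # [^\W\d_]+ matches word characters that are neither digits nor "_",
    # which is roughly "one or more letters".
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


print(simple_analyze("Jon1111Snow"))
```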
  10. Mapping – Ready-to-use Analyzers: Snowball Analyzer (slide 13)

    Snowball Analyzer: uses a stemming algorithm, so the token stream is made of root words. Example: "King's Landing" ➔ "King" + "Land".
  11. Mapping – Ready-to-use Analyzers: Mapping it to a field (slide 14)

    { "character": { ... "properties": { "biography": { "type": "string", "analyzer": "standard" } } } }

    Example: mapping the character's "biography" field to the "standard" analyzer.
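  12. Mapping – Making your own analyzer (slide 16)

    { "settings": { "index": { "analysis": { "analyzer": { "analyzerName": { "type": "custom", "tokenizer": "tokenizerName", "filter": [ "firstFilterName", ... ] } } } } } }

    analyzerName, tokenizerName and firstFilterName are placeholders.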
  13. Batch Indexing (slide 17)

    Allows batch operations on an index: index, delete, create. Requests are limited to 100 MB by default. Set http.max_content_length to change it.
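A batch request body is newline-delimited JSON: one action line, then (for index and create) one line with the document itself, and a final trailing newline. The sketch below only builds that body as a string and sends nothing. `build_bulk_body` is a hypothetical helper and the two documents are invented for the example; the index and type names are the ones used in the search examples of this deck.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def build_bulk_body(index: str, doc_type: str,
                    docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build a newline-delimited bulk body made of 'index' actions."""
    lines = []
    for doc_id, source in docs:
        # Action line: what to do, and on which document.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document to index.
        lines.append(json.dumps(source))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("game_of_thrones", "character", [
    ("1", {"name": "Jon Snow", "house": "Stark"}),        # invented document
    ("2", {"name": "Tyrion", "house": "Lannister"}),      # invented document
])
print(body)
print(len(body.encode("utf-8")), "bytes")  # to compare with http.max_content_length
```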
  14. Batch Indexing (slide 18)

    Batch operations can use either the HTTP or the UDP protocol. UDP has been deprecated in Elasticsearch 2.0. UDP requires more configuration.
  15. Searching – A first dive into full-text search: Inline Query (slide 20)

    An inline query uses the API's _search action. The q parameter describes the query, with the format field:value. The response is JSON.

    $> curl -XGET 'http://localhost:9200/game_of_thrones/character/_search?q=house:Stark'
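When the inline query is built from code rather than typed in a shell, the q parameter should be URL-encoded. A minimal sketch using only the Python standard library; `inline_search_url` is a hypothetical helper, and the host is the local one used in the slide.

```python
from urllib.parse import urlencode


def inline_search_url(host: str, index: str, doc_type: str,
                      field: str, value: str) -> str:
    """Build the _search URL for an inline 'field:value' query."""
    # urlencode percent-encodes the colon and any other special character.
    query_string = urlencode({"q": f"{field}:{value}"})
    return f"{host}/{index}/{doc_type}/_search?{query_string}"


url = inline_search_url("http://localhost:9200", "game_of_thrones",
                        "character", "house", "Stark")
print(url)
```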
  16. Searching – A first dive into full-text search: DSL Query (slide 21)

    A DSL (Domain Specific Language) query is described with JSON. The response is JSON. DSL queries are customizable and scriptable.
  17. Searching – A first dive into full-text search: The Term Query (slide 22)

    Searches for a specific term. The term is not analyzed.
  18. Searching – A first dive into full-text search: The Terms Query (slide 23)

    Searches for several terms, specifying the minimum number of terms that must match. The terms are not analyzed.
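The two queries above can be written as DSL bodies, here built as Python dictionaries. This is a sketch, not the deck's own code: `term_query`, `terms_query` and `term_hits` are hypothetical helpers, and the minimum-match option is deliberately left out because its parameter name depends on the Elasticsearch version. The toy `indexed_tokens` set is an assumption that shows what "not analyzed" means in practice: the searched term is compared as-is with the tokens stored in the index.

```python
from typing import Any, Dict, List


def term_query(field: str, value: str) -> Dict[str, Any]:
    """DSL body for a term query on a single exact term."""
    return {"query": {"term": {field: value}}}


def terms_query(field: str, values: List[str]) -> Dict[str, Any]:
    """DSL body for a terms query on several exact terms."""
    return {"query": {"terms": {field: list(values)}}}


# Toy model: a lowercasing analyzer stored "Stark" as the token "stark".
indexed_tokens = {"stark"}


def term_hits(term: str) -> bool:
    """The term is not analyzed, so it must equal a stored token exactly."""
    return term in indexed_tokens


print(term_query("house", "stark"))
print(terms_query("house", ["stark", "lannister"]))
print(term_hits("Stark"), term_hits("stark"))
```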
  19. Searching – A first dive into full-text search: The Match Query – Operator (slide 25)

    Data are aggregated by the cluster once every single term request is finished. The way data are aggregated can be changed with "operator".
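A short sketch of what "operator" changes. `match_query` builds a match query body with the operator set, and `toy_match` is a plain-Python model of the combination step: with "or" (the default) one matching term is enough, with "and" every term must match. Both helpers and the "biography" sample are assumptions made for the example.

```python
from typing import Any, Dict, List


def match_query(field: str, text: str, operator: str = "or") -> Dict[str, Any]:
    """DSL body for a match query with an explicit operator."""
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


def toy_match(doc_tokens: List[str], query_tokens: List[str],
              operator: str = "or") -> bool:
    """Combine the per-term results the way the operator asks for."""
    hits = [token in doc_tokens for token in query_tokens]
    return all(hits) if operator == "and" else any(hits)


biography = ["king", "in", "the", "north"]  # invented, already analyzed
print(match_query("biography", "king south", operator="and"))
print(toy_match(biography, ["king", "south"], "or"))
print(toy_match(biography, ["king", "south"], "and"))
```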
  20. Searching – A first dive into full-text search: The Match Query – Fuzziness (slide 26)

    Fuzziness is like allowing typos. On IP addresses, dates and numeric fields, it acts like a range.
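"Allowing typos" on strings comes down to an edit distance between the searched term and the indexed term. The sketch below uses the plain Levenshtein distance (insertions, deletions, substitutions) as a stand-in; it is an illustration of the idea, not the algorithm Elasticsearch runs internally, and `levenshtein` and `fuzzy_match` are hypothetical helpers.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term: str, candidate: str, fuzziness: int = 1) -> bool:
    """True when candidate is within 'fuzziness' edits of term."""
    return levenshtein(term, candidate) <= fuzziness


print(levenshtein("lannister", "lanister"))
print(fuzzy_match("stark", "stork"))
print(fuzzy_match("stark", "targaryen"))
```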
  21. Searching – A first dive into full-text search: Scripting (slide 27)

    Virtual fields are calculated on the fly by the cluster, using Groovy.
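A scripted field is declared in the search body under script_fields. The sketch below only builds such a body as a Python dictionary. `script_field_body` is a hypothetical helper, the numeric field "age" and the Groovy expression are invented for the example, and the exact way a script is declared varies between Elasticsearch versions, so treat the shape of the "script" entry as an assumption to check against your version.

```python
from typing import Any, Dict


def script_field_body(field_name: str, script: str) -> Dict[str, Any]:
    """Search body that returns one virtual field computed by a script."""
    return {
        "query": {"match_all": {}},
        "script_fields": {
            # The virtual field is computed per document at search time.
            field_name: {"script": script},
        },
    }


# Hypothetical example: derive a value from an assumed numeric "age" field.
body = script_field_body("age_in_months", "doc['age'].value * 12")
print(body)
```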