Slide 1

Slide 1 text

Honza Král @honzakral Elasticsearch DSL

Slide 2

Slide 2 text

Elasticsearch

Slide 3

Slide 3 text

Distributed Search Engine
 Open Source
 Distributed
 Document-based
 Based on Lucene
 JSON over HTTP

Slide 4

Slide 4 text

Document based
 JSON
 Dynamic schema
 Some relationships: nested, parent/child

Slide 5

Slide 5 text

StackOverflow Question
{
  "id": 7635,
  "accepted_answer_id": 7641,
  "answer_count": 9,
  "title": "Are you able to close your eyes and focus/think just on your code?",
  "body": "How do I ......?",
  "creation_date": "2010-09-27T19:16:57.757",
  "closed_date": "2011-11-13T12:12:05.937",
  "comment_count": 2,
  "comments": [{
    "creation_date": "2010-09-27T19:31:27.200",
    "id": 9372,
    "owner": { "display_name": "sange", "id": 3092 },
    "post_id": 7635,
    "text": "I sometimes close my eyes or stare at something ....."
  }, {......}],
  "favorite_count": 2,
  "last_activity_date": "2010-09-28T00:28:08.393",
  "owner": { "display_name": "flow", "id": 3761 },
  "rating": 6,
  "tags": [ "focus", "concentration" ],
  "view_count": 368
}
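With a dynamic schema Elasticsearch would guess a mapping for this document, but to query each comment as its own object (the "nested" relationship from the previous slide) the `comments` field has to be declared explicitly. A minimal sketch, assuming Elasticsearch 1.x mapping syntax (the `string` type) and a hypothetical `question` document type; the field list and the `comments_matching` helper are illustrative only.

```python
import json

# Hypothetical ES 1.x mapping for the question document above. Mapping
# "comments" as nested indexes every comment as a hidden sub-document,
# so a comment's text and its creation_date stay correlated.
QUESTION_MAPPING = {
    "question": {
        "properties": {
            "title": {"type": "string"},
            "tags": {"type": "string", "index": "not_analyzed"},
            "creation_date": {"type": "date"},
            "comments": {
                "type": "nested",
                "properties": {
                    "text": {"type": "string"},
                    "creation_date": {"type": "date"},
                },
            },
        }
    }
}


def comments_matching(text):
    """Build a nested query: questions having a comment that matches `text`."""
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {"match": {"comments.text": text}},
            }
        }
    }


print(json.dumps(comments_matching("close my eyes"), indent=2))
```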

Slide 6

Slide 6 text

Query DSL

Slide 7

Slide 7 text

Queries & filters
 Abstract Syntax Tree

Slide 8

Slide 8 text

Queries (unstructured)
 Core queries: match, multi_match, phrase, fuzzy, regexp, wildcard
 Compound queries: filtered, bool, function_score
 Rely on analysis, produce a score (relevancy)
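"Relies on analysis" means the query text goes through the same analyzer as the indexed field, while a structured filter such as `term` compares its value as-is. A toy model of that difference, assuming a crude lowercase-and-split analyzer; it only approximates Elasticsearch's standard analyzer and does not model scoring at all. The function names are hypothetical.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on runs of non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def match(field_text, query_text):
    """Match-style: analyze both sides, succeed if any query token was indexed."""
    indexed = set(analyze(field_text))
    return any(token in indexed for token in analyze(query_text))


def term(indexed_tokens, value):
    """Term-style: the value is NOT analyzed, it must equal an indexed token."""
    return value in indexed_tokens


title = "Are you able to close your eyes and focus/think just on your code?"
print(analyze(title)[:4])             # ['are', 'you', 'able', 'to']
print(match(title, "FOCUS"))          # True: the query text is lowercased too
print(term(analyze(title), "FOCUS"))  # False: the term value is compared raw
```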

Slide 9

Slide 9 text

Filters (structured)
 Core filters: term, range, exists, geo_distance, geo_bbox, script
 Compound filters: bool, and/or/not
 Fast, cacheable
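Queries and filters meet in the `filtered` compound query of Elasticsearch 1.x (later versions moved this role to the `filter` clause of `bool`). A minimal sketch of a hypothetical `filtered` helper that wraps a scoring query with any number of filters, ANDing several filters through a `bool` filter's `must` clause.

```python
def filtered(query, *filters):
    """Wrap `query` and zero or more filters into an ES 1.x filtered query."""
    if not filters:
        return query
    if len(filters) == 1:
        combined = filters[0]
    else:
        # All filters must match; the bool filter ANDs its `must` clauses.
        combined = {"bool": {"must": list(filters)}}
    return {"filtered": {"query": query, "filter": combined}}


body = {
    "query": filtered(
        {"match": {"title": "python"}},                       # scored, analyzed
        {"range": {"creation_date": {"from": "2012-01-01"}}},  # yes/no, cacheable
        {"exists": {"field": "accepted_answer_id"}},
    )
}
```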

Slide 10

Slide 10 text

Example
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {"multi_match": {"fields": ["title^10", "body"], "query": "php"}},
            {"has_child": {"child_type": "answer", "query": {"match": {"body": "python"}}}}
          ],
          "must_not": {"multi_match": {"fields": ["title", "body"], "query": "python"}}
        }
      },
      "filter": {"range": {"creation_date": {"from": "2012-01-01"}}}
    }
  },
  "aggs": {
    "tags": {
      "terms": {"field": "tags"},
      "aggs": {
        "comment_avg": {"avg": {"field": "comment_count"}}
      }
    },
    "frequency": {"date_histogram": {"field": "creation_date", "interval": "month"}}
  }
}
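The example request returns aggregation buckets alongside the hits. A minimal sketch of reading the `tags` buckets together with their `comment_avg` sub-aggregation from the decoded JSON; the response is a hand-made sample in the general shape Elasticsearch uses for `terms`, `avg` and `date_histogram` results (the numbers are invented), and `summarize_tags` is a hypothetical helper.

```python
def summarize_tags(response):
    """Return (tag, question_count, average_comment_count) for each tag bucket."""
    rows = []
    for bucket in response["aggregations"]["tags"]["buckets"]:
        rows.append((bucket["key"], bucket["doc_count"], bucket["comment_avg"]["value"]))
    return rows


# Invented sample data, shaped like a terms agg with an avg sub-agg
# plus a monthly date_histogram.
sample_response = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "tags": {
            "buckets": [
                {"key": "focus", "doc_count": 2, "comment_avg": {"value": 1.5}},
                {"key": "concentration", "doc_count": 1, "comment_avg": {"value": 2.0}},
            ]
        },
        "frequency": {
            "buckets": [
                {"key_as_string": "2012-01-01T00:00:00.000Z", "key": 1325376000000, "doc_count": 3}
            ]
        },
    },
}

for tag, count, avg in summarize_tags(sample_response):
    print(tag, count, avg)
```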

Slide 11

Slide 11 text

Python

Slide 12

Slide 12 text

HTTP
 how hard can it be?

Slide 13

Slide 13 text

Elasticsearch
 Distributed: load balancing, node failure, node discovery
 Different deployment environments: nginx, thrift, PaaS
 REST API: 96 API endpoints, 672 parameters, escaping, encoding
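"Escaping, encoding" hides several small rules: path segments such as document IDs must be percent-escaped, several indices are joined with commas, and booleans have to be sent as `true`/`false` rather than Python's `True`. A minimal sketch using only `urllib.parse`; `build_url` and `_param` are hypothetical names, not elasticsearch-py internals.

```python
from urllib.parse import quote, urlencode


def _param(value):
    """Encode a single path part or query-string value."""
    if isinstance(value, bool):  # check bool before anything numeric
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return ",".join(_param(v) for v in value)
    return str(value)


def build_url(*parts, **params):
    """Escape each path part, skip None parts, append encoded parameters."""
    path = "/" + "/".join(quote(_param(p), safe=",") for p in parts if p is not None)
    query = {k: _param(v) for k, v in params.items() if v is not None}
    return path + ("?" + urlencode(query) if query else "")


print(build_url(["today", "yesterday"], "question", "_search", size=10, explain=True))
# /today,yesterday/question/_search?size=10&explain=true
```

Skipping `None` parts lets optional index or type names be passed straight through, which is how one helper can serve endpoints that accept `/_search`, `/index/_search` and `/index/type/_search`.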

Slide 14

Slide 14 text

elasticsearch-py
 Low level
 1-to-1 mapping to REST
 No excuses
 No opinions
 Extensible/modular
 All APIs
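Low level does not mean naive about the cluster: the client still has to spread requests across nodes and route around failed ones. A minimal sketch of round-robin node selection with a dead-node timeout, kept as its own small component in the spirit of the modular design; `RoundRobinPool` and its parameters are hypothetical, and elasticsearch-py's real `ConnectionPool` (with pluggable selectors) is more elaborate. The injectable `clock` makes the timeout testable.

```python
import itertools
import time


class RoundRobinPool:
    """Rotate over hosts, skipping any host marked dead until its timeout ends."""

    def __init__(self, hosts, dead_timeout=60.0, clock=time.monotonic):
        self.hosts = list(hosts)
        self.dead_timeout = dead_timeout
        self.clock = clock
        self._dead_until = {}  # host -> time after which it may be retried
        self._cycle = itertools.cycle(self.hosts)

    def mark_dead(self, host):
        """Call after a connection error; the host is skipped for a while."""
        self._dead_until[host] = self.clock() + self.dead_timeout

    def get_connection(self):
        now = self.clock()
        # Look at each host at most once per call.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if host not in self._dead_until or self._dead_until[host] <= now:
                self._dead_until.pop(host, None)  # resurrect if the timeout passed
                return host
        raise RuntimeError("all nodes are marked dead")


pool = RoundRobinPool(["es1:9200", "es2:9200"])
print(pool.get_connection(), pool.get_connection())
```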

Slide 15

Slide 15 text

dict -> JSON
from elasticsearch import Elasticsearch

es = Elasticsearch()
result = es.search(body={
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [{"match": {"title": "python"}}],
                    "must_not": [{"match": {"title": "ruby"}}]
                }
            },
            "filter": {
                "range": {"creation_date": {"from": "2012-01-01"}}
            }
        }
    }
})
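Here the client's job is mostly serialization: the dict is encoded as JSON before it goes over HTTP. The stdlib `json` module rejects `datetime.date`, which the elasticsearch-dsl examples in this deck put straight into the body (`date(2012, 1, 1)`), so a serializer needs a fallback. A minimal sketch; elasticsearch-py ships its own `JSONSerializer`, and `dumps`/`_default` here are illustrative names, not its code.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def _default(obj):
    """Fallback for types the stdlib json module cannot encode by itself."""
    if isinstance(obj, (date, datetime)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError("Unable to serialize %r" % (obj,))


def dumps(body):
    """Serialize a request body; already-encoded strings pass through."""
    if isinstance(body, str):
        return body
    return json.dumps(body, default=_default)


print(dumps({"range": {"creation_date": {"from": date(2012, 1, 1)}}}))
# {"range": {"creation_date": {"from": "2012-01-01"}}}
```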

Slide 16

Slide 16 text

elasticsearch-dsl
from datetime import date

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

es = Elasticsearch()

# create Search, bind it to client
s = Search(using=es)

# querying twice will combine queries
# or combine manually: s.query(Q() & ~Q())
s = s.query('match', title='python').query(~Q('match', title='ruby'))

# filter will turn it into a filtered query
s = s.filter('range', creation_date={"from": date(2012, 1, 1)})

# get a fancy result object!
result = s.execute()

Slide 17

Slide 17 text

Design

Slide 18

Slide 18 text

Say no to { [ ] }!

Slide 19

Slide 19 text

Automatic composition

Slide 20

Slide 20 text

We are not SQL
 Stay true to Query DSL

Slide 21

Slide 21 text

elasticsearch-dsl
from datetime import date

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

es = Elasticsearch()

# create Search, bind it to client
s = Search(using=es)

# querying twice will combine queries
# or combine manually: s.query(Q() & ~Q())
s = s.query('match', title='python').query(~Q('match', title='ruby'))

# filter will turn it into a filtered query
s = s.filter('range', creation_date={"from": date(2012, 1, 1)})

# get a fancy result object!
result = s.execute()

Slide 22

Slide 22 text

Q/F/A
 Shortcut for creating Queries/Filters/Aggregations
 Can use dicts or name + params
 Has elementary boolean logic

Q('match', title='python') == Q({'match': {'title': 'python'}})
Q('match', title='python') == Match(title='python')

Q(1) & Q(2) == Q('bool', must=[Q(1), Q(2)])
Q(1) | Q(2) | Q(3) == Q('bool', should=[Q(1), Q(2), Q(3)])
~F(1) == F('bool', must_not=[F(1)])
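The boolean rules above fit in a few operator overloads. A minimal sketch with a hypothetical `ToyQ` class, much simpler than elasticsearch-dsl's `Q`: it accepts a raw dict or a name plus params, maps `&`, `|` and `~` to `must`, `should` and `must_not`, and flattens chains like `a | b | c` into one clause list, as the slide shows.

```python
class ToyQ:
    """Toy query node with Q-style boolean operators (not the real library)."""

    def __init__(self, name_or_dict, **params):
        if isinstance(name_or_dict, dict):
            # ToyQ({'match': {...}}): exactly one key, the query type.
            [(name, params)] = name_or_dict.items()
        else:
            name = name_or_dict
        self.name, self.params = name, dict(params)

    def to_dict(self):
        def convert(value):
            if isinstance(value, ToyQ):
                return value.to_dict()
            if isinstance(value, list):
                return [convert(v) for v in value]
            return value
        return {self.name: {k: convert(v) for k, v in self.params.items()}}

    def __eq__(self, other):
        return isinstance(other, ToyQ) and self.to_dict() == other.to_dict()

    def _clauses(self, kind):
        # Reuse a bool's clause list when that is its only clause, so that
        # a | b | c becomes one `should` list instead of nested bools.
        if self.name == "bool" and set(self.params) == {kind}:
            return list(self.params[kind])
        return [self]

    def __and__(self, other):
        return ToyQ("bool", must=self._clauses("must") + other._clauses("must"))

    def __or__(self, other):
        return ToyQ("bool", should=self._clauses("should") + other._clauses("should"))

    def __invert__(self):
        return ToyQ("bool", must_not=[self])


python = ToyQ("match", title="python")
print(python == ToyQ({"match": {"title": "python"}}))
print((python & ~ToyQ("match", title="ruby")).to_dict())
```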

Slide 23

Slide 23 text

.query().filter() chaining
 Copy is made on each change
 Same for other methods
 Except Aggs

s2 = s1.query(Q(1))
s1 != s2

s[0:10], s.using(es), s.index('today', 'yesterday'), ...

s.aggs.bucket('per_tag').metric('avg').metric('max')
s.aggs.bucket('per_country').bucket('per_tag').metric('avg')
s.aggs['per_country'].metric(...)
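"Copy is made on each change" is what keeps `s1` untouched when `s2` is derived from it. A minimal sketch of that pattern with a hypothetical `ToySearch`, not elasticsearch-dsl's `Search`: each method deep-copies the object and modifies only the copy; several queries are ANDed in a `bool`, and any filter turns the body into an ES 1.x `filtered` query. Aggregations, which the slide lists as the in-place exception, are left out.

```python
import copy


class ToySearch:
    """Toy immutable search: every method returns a modified deep copy."""

    def __init__(self):
        self._queries, self._filters, self._slice = [], [], None

    def _clone(self):
        return copy.deepcopy(self)

    def query(self, q):
        s = self._clone()
        s._queries.append(q)
        return s

    def filter(self, f):
        s = self._clone()
        s._filters.append(f)
        return s

    def __getitem__(self, sl):
        # Only slices are supported: s[0:10] -> from=0, size=10.
        s = self._clone()
        s._slice = (sl.start or 0, sl.stop)
        return s

    @staticmethod
    def _and(clauses):
        return clauses[0] if len(clauses) == 1 else {"bool": {"must": list(clauses)}}

    def to_dict(self):
        query = self._and(self._queries) if self._queries else {"match_all": {}}
        if self._filters:
            query = {"filtered": {"query": query, "filter": self._and(self._filters)}}
        body = {"query": query}
        if self._slice is not None:
            start, stop = self._slice
            body["from"] = start
            if stop is not None:
                body["size"] = stop - start
        return body


s1 = ToySearch().query({"match": {"title": "python"}})
s2 = s1.filter({"range": {"creation_date": {"from": "2012-01-01"}}})[0:10]
print(s1.to_dict())  # unchanged by the calls that produced s2
print(s2.to_dict())
```

Deep-copying on every call costs a little, but search definitions are small and the payoff is that a base search can be shared and specialized without surprises.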

Slide 24

Slide 24 text

Response
 Response object is returned:
response = s.execute()
if not response.success(): print("Partial results!")
 You can iterate over it and get hits:
for h in response:
    print(h._meta.id, h.title)
 Aggregations can be accessed:
top_tag = response.aggregations.per_tag.buckets[0]
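Attribute-style access like `response.aggregations.per_tag.buckets[0]` can be layered over the raw JSON response with a small wrapper. A minimal sketch with hypothetical `AttrView`, `Hit` and `ToyResponse` classes; treating "success" as "every shard answered and the search did not time out" is an assumption about what the slide's `success()` checks. The sample response is invented.

```python
class AttrView:
    """Read-only attribute access over a decoded JSON dict (toy helper)."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails; private names are never keys.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return _wrap(self._data[name])
        except KeyError:
            raise AttributeError(name) from None


def _wrap(value):
    """Wrap dicts (also inside lists) so attribute access keeps working."""
    if isinstance(value, dict):
        return AttrView(value)
    if isinstance(value, list):
        return [_wrap(v) for v in value]
    return value


class Hit(AttrView):
    """A search hit: _source fields as attributes, metadata under _meta."""

    def __init__(self, hit):
        super().__init__(hit["_source"])
        # "_id" -> id, "_index" -> index, "_score" -> score, ...
        self._meta = AttrView({k.lstrip("_"): v for k, v in hit.items() if k != "_source"})


class ToyResponse(AttrView):
    def __iter__(self):
        return (Hit(h) for h in self._data["hits"]["hits"])

    def success(self):
        # Assumed definition: all shards answered and no timeout occurred.
        shards = self._data["_shards"]
        return shards["total"] == shards["successful"] and not self._data.get("timed_out", False)


raw = {  # invented sample response
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {"total": 1, "hits": [{
        "_index": "questions", "_type": "question", "_id": "7635", "_score": 1.0,
        "_source": {"title": "Are you able to close your eyes and focus/think just on your code?"},
    }]},
    "aggregations": {"per_tag": {"buckets": [{"key": "focus", "doc_count": 1}]}},
}
response = ToyResponse(raw)
for h in response:
    print(h._meta.id, h.title)
top_tag = response.aggregations.per_tag.buckets[0]
```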

Slide 25

Slide 25 text

Migration Path
from datetime import date

from elasticsearch_dsl import Search

query = {
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [{"match": {"title": "python"}}],
                    "must_not": [{"match": {"title": "ruby"}}]
                }
            },
            "filter": {
                "range": {"creation_date": {"from": date(2012, 1, 1)}}
            }
        }
    }
}

s = Search.from_dict(query)
s = ...
query = s.to_dict()

Slide 26

Slide 26 text

Demo time! STOP

Slide 27

Slide 27 text

Django

Slide 28

Slide 28 text

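Put your data in...

Sync after change - signals
from django.conf import settings
from django.db.models.signals import post_save
from elasticsearch import Elasticsearch

es = Elasticsearch()

def sync_to_es(instance, **kwargs):
    es.index(
        index=settings.ES_INDEX,
        doc_type=str(instance._meta),
        id=instance.pk,
        body=instance.to_json())

post_save.connect(sync_to_es, sender=Model)

Bulk load - mgmt command
from operator import methodcaller
from elasticsearch.helpers import bulk

es.indices.put_mapping(index=settings.ES_INDEX, doc_type=str(Model._meta), body={...})

bulk(es,
     map(methodcaller('to_dict'), Model.objects.iterator()),
     index=settings.ES_INDEX,
     doc_type=str(Model._meta))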
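In the bulk load on this slide the documents carry no `_id` (unless `to_dict()` happens to include one), so Elasticsearch would generate random IDs while `sync_to_es` indexes under `instance.pk`; the two paths would then write different documents. A minimal sketch of building explicit bulk actions instead, assuming the model's `to_dict()` returns the document body; `bulk_actions` and `FakeQuestion` are hypothetical, and the action keys (`_op_type`, `_index`, `_type`, `_id`, `_source`) are the ones `elasticsearch.helpers.bulk` accepts.

```python
def bulk_actions(instances, index, doc_type):
    """Yield one bulk "index" action per model instance, keyed by primary key."""
    for instance in instances:
        yield {
            "_op_type": "index",
            "_index": index,
            "_type": doc_type,
            "_id": instance.pk,  # same ID the post_save handler uses
            "_source": instance.to_dict(),
        }


class FakeQuestion:
    """Hypothetical stand-in for a Django model instance."""

    def __init__(self, pk, title):
        self.pk, self.title = pk, title

    def to_dict(self):
        return {"title": self.title}


actions = list(bulk_actions([FakeQuestion(7635, "Are you able to close your eyes?")],
                            "questions", "qa.question"))
# With a real client and model this would become something like:
# bulk(es, bulk_actions(Model.objects.iterator(), settings.ES_INDEX, str(Model._meta)))
```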

Slide 29

Slide 29 text

...query as usual

Slide 30

Slide 30 text

Future

Slide 31

Slide 31 text

Mapping DSL

Slide 32

Slide 32 text

Persistence layer

Slide 33

Slide 33 text

Django integration

Slide 34

Slide 34 text

@robhudson @willcage Thanks!

Slide 35

Slide 35 text

Thank You!
 Honza Král
 twitter: @honzakral
 Support: http://elasticsearch.com/support
 Training: http://training.elasticsearch.com/
 We are hiring: http://elasticsearch.com/about/jobs/