Slide 1

Slide 1 text

Building a Search Engine with Python + Elasticsearch Julie Qiu @jqiu25 Jim Grandpre @jimtla #PyCon2018

Slide 2

Slide 2 text

Julie Qiu Engineering Lead, Spring @jqiu25 Jim Grandpre Director of Engineering, Spring @jimtla

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Tutorial Goals and Structure

Slide 9

Slide 9 text

Presentations: Develop an intuition for Elasticsearch. Understand the most important pieces.

Slide 10

Slide 10 text

Problem Sets: Build comfort with the ES docs and the Python ES libraries. Have a starting point for Python + ES projects. Use ES and Python to make search work.

Slide 11

Slide 11 text

Agenda
15m Introduction to Elasticsearch & Indexing
45m Problem Set: Indexing
10m Break
5m Introduction to Searching
45m Problem Set: Searching
5m Break
5m Introduction to Analysis
45m Problem Set: Analysis

Slide 12

Slide 12 text

Problem Set Advice: Work in groups and help each other learn. Take your time. Read the docs.

Slide 13

Slide 13 text

Spoilers: Many questions on the problem set include "spoilers." Try to figure things out on your own, but if you feel stuck, don't hesitate to read them.

Slide 14

Slide 14 text

Intro to Elasticsearch

Slide 15

Slide 15 text

What is Elasticsearch? A search-optimized, distributed database built on top of Lucene.

Slide 16

Slide 16 text

Database: Stores structured documents. Supports insert, update, delete, and query operations. Provides a friendly interface (JSON over HTTP).
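As a rough illustration of the JSON-over-HTTP interface, the sketch below maps each basic operation to an HTTP method, a path, and a JSON body. It assumes the Elasticsearch 6.x URL layout, in which paths still include the document type (later versions changed this when mapping types were removed). The `es_request` helper is hypothetical and only builds the tuple; it does not contact a server.

```python
from typing import Any, Dict, Optional, Tuple

Body = Optional[Dict[str, Any]]

def es_request(operation: str, index: str, doc_type: str,
               doc_id: Optional[str] = None,
               payload: Body = None) -> Tuple[str, str, Body]:
    """Return (HTTP method, path, JSON body) for one basic operation."""
    base = f"/{index}/{doc_type}"
    if operation == "insert":
        # Index (create or fully replace) a document under an explicit id.
        return "PUT", f"{base}/{doc_id}", payload
    if operation == "update":
        # Partial update: the given fields are merged into the stored document.
        return "POST", f"{base}/{doc_id}/_update", {"doc": payload}
    if operation == "delete":
        return "DELETE", f"{base}/{doc_id}", None
    if operation == "query":
        # Search with a query written in the JSON query DSL.
        return "GET", f"{base}/_search", payload
    raise ValueError(f"unknown operation: {operation}")

print(es_request("insert", "products_index", "product", "1", {"name": "shirt"}))
```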

Slide 17

Slide 17 text

Lucene: A Java library used by Elasticsearch for core data storage and core query execution.

Slide 18

Slide 18 text

Distributed: Elasticsearch is typically run as a cluster. Instances can be added or removed at any time. Data is split across instances, and queries are executed across the cluster.
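A simplified sketch of how data is split across a cluster: each document is routed to a primary shard by hashing a routing value (by default, the document's `_id`) and taking the remainder modulo the number of primary shards. Elasticsearch uses a Murmur3 hash for this; the sketch substitutes `zlib.crc32` purely as a deterministic stand-in, and `pick_shard` is a hypothetical name.

```python
import zlib
from collections import Counter

def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Return the primary shard a document would land on (simplified)."""
    # Stand-in hash: real Elasticsearch uses Murmur3, not CRC32.
    h = zlib.crc32(routing_value.encode("utf-8"))
    return h % num_primary_shards

# Distribute 1,000 hypothetical product ids across 5 primary shards.
counts = Counter(pick_shard(f"product-{i}", 5) for i in range(1000))
for shard in sorted(counts):
    print(f"shard {shard}: {counts[shard]} documents")
```

Because the target shard depends on the number of primary shards, that number is fixed when the index is created; changing it would send existing documents to different shards.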

Slide 19

Slide 19 text

Search Optimized: Data is indexed to allow fast searching. A query language supports complex searches. Built-in support for text analysis.
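"Indexed to allow fast searching" refers to Lucene's inverted index: a map from each term to the documents that contain it, so a search looks up terms instead of scanning every document. The toy in-memory version below uses the example product names from this tutorial; the lowercase-and-split tokenizer is a crude stand-in for real text analysis, and all function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    # Crude stand-in for analysis: lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents containing any query term (OR semantics)."""
    hits: Set[int] = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits

docs = {1: "A Great Product", 2: "This Cool Shirt", 3: "Neat Pants", 4: "Kittens"}
index = build_inverted_index(docs)
print(search(index, "neat shirt"))
```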

Slide 20

Slide 20 text

Database Structure

Slide 21

Slide 21 text

Database Structure: Index (~ Table)

Slide 22

Slide 22 text

Database Structure: an Index (~ Table) contains Documents (~ Rows).

Slide 23

Slide 23 text

Database Structure: an Index (~ Table) contains Documents (~ Rows), whose structure is described by a Document Type (~ Schema).

Slide 24

Slide 24 text

Database Structure (example): the index products_index holds documents of type product, such as "A Great Product", "This Cool Shirt", "Neat Pants", and "Kittens".

Slide 25

Slide 25 text

Getting data into Elasticsearch

Slide 26

Slide 26 text

??? Getting data into Elasticsearch

Slide 27

Slide 27 text

??? Data Getting data into Elasticsearch

Slide 28

Slide 28 text

Getting data into Elasticsearch: JSON {"name": "shirt", "image": "www.shirtphoto.com"} → ???

Slide 29

Slide 29 text

Getting data into Elasticsearch: JSON {"name": "shirt", "image": "www.shirtphoto.com"} → Indexing API
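A sketch of what one call to the Indexing API looks like on the wire, using only the standard library: a `PUT` to `/{index}/{type}/{id}` with the document as a JSON body. It assumes a local node at `http://localhost:9200`, the 6.x URL layout with the `product` type, and a hypothetical id of `1`. The request is only built here, not sent.

```python
import json
import urllib.request

def build_index_request(doc, doc_id, host="http://localhost:9200",
                        index="products_index", doc_type="product"):
    """Build (but do not send) a PUT request for the Indexing API."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    # json.dumps always emits straight double quotes, so the body is valid JSON.
    body = json.dumps(doc).encode("utf-8")
    # Elasticsearch 6.x requires an explicit JSON Content-Type on request bodies.
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )

doc = {"name": "shirt", "image": "www.shirtphoto.com"}
req = build_index_request(doc, doc_id=1)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. With elasticsearch-py 6.x the equivalent call is roughly `Elasticsearch().index(index="products_index", doc_type="product", id=1, body=doc)`.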

Slide 30

Slide 30 text

Database Structure: the document {"name": "shirt", "image": "www.shirtphoto.com"} is stored in products_index with type product.

Slide 31

Slide 31 text

localhost:9200 Elasticsearch Index

Slide 32

Slide 32 text

Kibana Dev Tools Console (localhost:5601) bit.ly/pycon-es-devtools1

Slide 33

Slide 33 text

App Walkthrough

Slide 34

Slide 34 text

Indexing with ES and Python

Slide 35

Slide 35 text

Our Repository
.git/
.gitignore
README.md
lessons/
requirements.txt
searchapp/
searchapp.egg-info/
setup.py
venv/

Slide 36

Slide 36 text

Search App: Indexing
searchapp/
  __init__.py
  app/
  constants.py
  data.py
  index_products.py
  products.json
  run.py
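The layout above suggests `index_products.py` loads `products.json` into the index. One common way to do that with elasticsearch-py is its bulk helper, which consumes action dicts like the ones sketched below. The file layout (a JSON array of objects, each with an `id` key), the `bulk_actions` name, and the index and type names are assumptions for illustration, not the tutorial's actual code.

```python
import json
from typing import Any, Dict, Iterator

def bulk_actions(products_json: str, index: str = "products_index",
                 doc_type: str = "product") -> Iterator[Dict[str, Any]]:
    """Yield one bulk 'index' action per product (hypothetical file layout)."""
    # Assumption: the file holds a JSON array of objects, each with an "id".
    for product in json.loads(products_json):
        yield {
            "_op_type": "index",   # create or replace the document
            "_index": index,
            "_type": doc_type,     # 6.x still uses document types
            "_id": product["id"],
            "_source": product,    # the document body itself
        }

sample = '[{"id": 1, "name": "shirt"}, {"id": 2, "name": "necklace"}]'
actions = list(bulk_actions(sample))
print(actions[0])
```

Against a running node, something like `helpers.bulk(Elasticsearch(), bulk_actions(open("searchapp/products.json").read()))` (with `helpers` and `Elasticsearch` imported from the `elasticsearch` package) would send the actions in batches.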

Slide 37

Slide 37 text

elasticsearch-py: the official Python client for Elasticsearch. elasticsearch-py.readthedocs.io/en/master/

Slide 38

Slide 38 text

Problem Sets: Work in groups and help each other learn. Take your time. Read the docs.

Slide 39

Slide 39 text

Problem Set: Indexing
bit.ly/pycon-es-lesson1

Slide 40

Slide 40 text

Searching data in Elasticsearch

Slide 41

Slide 41 text

??? Searching data

Slide 42

Slide 42 text

necklace Searching data ???

Slide 43

Slide 43 text

Searching data: "necklace" → Application (Flask App) → ???

Slide 44

Slide 44 text

Searching data: "necklace" → Application (Flask App) → Search API
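On the application side, the Flask app has two jobs: turn the user's text (here, "necklace") into a Search API request, and pull results out of the response. A minimal sketch, assuming a `match` query on a `name` field, a local node with the 6.x URL layout, and hypothetical helper names. It builds the request without sending it, and parses a hand-written sample response shaped like Elasticsearch's (`hits.hits[]._source`).

```python
import json
import urllib.request
from typing import Any, Dict, List

def build_search_request(user_query: str,
                         host: str = "http://localhost:9200",
                         index: str = "products_index",
                         doc_type: str = "product") -> urllib.request.Request:
    body = {"query": {"match": {"name": user_query}}}
    # Elasticsearch accepts a search body with GET or POST; POST is used
    # here because some HTTP clients do not send a body with GET.
    return urllib.request.Request(
        f"{host}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )

def product_names(response: Dict[str, Any]) -> List[str]:
    # Each hit carries the original document under "_source".
    return [hit["_source"]["name"] for hit in response["hits"]["hits"]]

req = build_search_request("necklace")
# Hand-written sample response, not real output from a cluster.
sample_response = {"hits": {"total": 1, "hits": [
    {"_id": "7", "_score": 1.3, "_source": {"name": "Silver necklace"}}]}}
print(product_names(sample_response))
```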

Slide 45

Slide 45 text

Kibana Dev Tools Console (localhost:5601) bit.ly/pycon-es-devtools2

Slide 46

Slide 46 text

Searching with Python and ES

Slide 47

Slide 47 text

elasticsearch-dsl: a Python library that helps with writing and running Elasticsearch queries. elasticsearch-dsl.readthedocs.io/en/latest/

Slide 48

Slide 48 text

Example Search: Raw query
GET request to localhost:9200/products_index/product/_search
{ "query": { "match": { "name": "necklace" } } }

Slide 49

Slide 49 text

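Example Search: Comparison
Raw query, sent as a GET request to localhost:9200/products_index/product/_search:
{ "query": { "match": { "name": "necklace" } } }
The same search in Python with elasticsearch-dsl:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

# Connects to localhost:9200 by default.
client = Elasticsearch()
s = Search(using=client, index="products_index", doc_type="product")
# Search objects are immutable: query() returns a new Search, which execute() runs.
response = s.query("match", name="necklace").execute()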

Slide 52

Slide 52 text

Problem Set: Search
bit.ly/pycon-es-lesson2
Continuing from Part 1 (also on the GitHub repo):
git commit -am "session1 work"
git fetch
git checkout session2
source venv/bin/activate
python searchapp/index_products.py

Slide 53

Slide 53 text

Analysis

Slide 54

Slide 54 text

Analyzers: improve search accuracy by adding structure to text.

Slide 55

Slide 55 text

Analyzers: raw text "Walking The: Dog" → Tokenizer → ["Walking", "The", "Dog"] → Token Filters → ["walk", "dog"]

Slide 56

Slide 56 text

Tokenizers: break a string into components.
Input: "Walking The: Dog"
Standard: ["Walking", "The", "Dog"]
Whitespace: ["Walking", "The:", "Dog"]
Edge N-grams: ["W", "Wa", "Wal", "Walk", …]
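The three tokenizers above can be approximated in a few lines to make their differences concrete. This is a rough sketch, not Elasticsearch's implementation: the real `standard` tokenizer follows Unicode text segmentation (UAX #29), and the real `edge_ngram` tokenizer defaults to `min_gram` 1 and `max_gram` 2, so `max_gram=4` here is an assumption chosen to mirror the slide. The function names are hypothetical.

```python
import re
from typing import List

def standard_like(text: str) -> List[str]:
    # Rough approximation of the standard tokenizer: keep runs of word
    # characters and drop punctuation.
    return re.findall(r"\w+", text)

def whitespace(text: str) -> List[str]:
    # Split on whitespace only, so punctuation stays attached ("The:").
    return text.split()

def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 4) -> List[str]:
    # Prefixes of one token, from min_gram up to max_gram characters long.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

text = "Walking The: Dog"
print(standard_like(text))
print(whitespace(text))
print(edge_ngrams("Walking"))
```

Edge n-grams are typically used for search-as-you-type: indexing prefixes lets a partial query such as "Wal" match "Walking".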

Slide 57

Slide 57 text

Token Filters: each filter is applied to the output of the previous one.
Input: ["Walking", "The", "Dog"]
Lowercase: ["walking", "the", "dog"]
Stop Words: ["walking", "dog"]
Stemming: ["walk", "dog"]
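A toy sketch of that filter chain, run after a simple tokenizer so that `analyze` reproduces the whole pipeline from "Walking The: Dog" to ["walk", "dog"]. It assumes a tiny hand-picked stop-word list and a crude suffix-stripping rule in place of a real stemmer (Elasticsearch ships proper ones, such as `porter_stem`); all names are hypothetical.

```python
import re
from typing import Callable, List

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # tiny illustrative list

def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

def stop(tokens: List[str]) -> List[str]:
    # Compares against a lowercase list, so it must run after lowercase().
    return [t for t in tokens if t not in STOP_WORDS]

def toy_stem(tokens: List[str]) -> List[str]:
    # Crude suffix stripping for illustration; NOT a real stemming algorithm.
    return [t[:-3] if t.endswith("ing") and len(t) > 5 else t for t in tokens]

FILTERS: List[Callable[[List[str]], List[str]]] = [lowercase, stop, toy_stem]

def analyze(text: str) -> List[str]:
    tokens = re.findall(r"\w+", text)  # tokenizer step
    for token_filter in FILTERS:       # each filter feeds the next
        tokens = token_filter(tokens)
    return tokens

print(analyze("Walking The: Dog"))  # ['walk', 'dog']
```

Order matters: because the stop list is lowercase, running `stop` before `lowercase` would keep "The".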

Slide 58

Slide 58 text

Analyzing your Index: Analyzers are configured in the mapping of your index. Custom analyzers are defined in the settings of your index.
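A sketch of an index-creation body that does both, written as a Python dict: a custom analyzer defined under `settings.analysis.analyzer` and referenced by name from a field in the mapping. It assumes the 6.x format (mappings keyed by the `product` type), a `name` text field, and a hypothetical analyzer name `product_text`. The `standard` tokenizer and the `lowercase`, `stop` (English stop words by default), and `porter_stem` filters are built-in components.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Hypothetical custom analyzer built from built-in parts.
                "product_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "porter_stem"],
                }
            }
        }
    },
    "mappings": {
        "product": {  # 6.x: mappings are keyed by document type
            "properties": {
                # The mapping points the field at the analyzer by name.
                "name": {"type": "text", "analyzer": "product_text"}
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With elasticsearch-py 6.x this could be passed as `Elasticsearch().indices.create(index="products_index", body=index_body)`. Because existing documents were analyzed with the old configuration, changing a field's analyzer later generally requires reindexing.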

Slide 59

Slide 59 text

Analyzing your Index: Analyzers are applied both to the fields of your documents (at index time) and to the queries against those fields (at search time).
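Applying the same analysis on both sides is what lets a query like "Dogs walking" match a document indexed as "Walking The: Dog". In the toy sketch below (a hypothetical `analyze` with a made-up stop list and suffix rules, not an Elasticsearch analyzer), skipping query-time analysis makes the same word miss. This is roughly the difference between a `match` query, which analyzes its input, and a `term` query, which does not.

```python
import re
from typing import List, Set

STOP_WORDS = {"a", "an", "the"}  # tiny illustrative list

def analyze(text: str) -> List[str]:
    """Toy analyzer: tokenize, lowercase, drop stop words, strip suffixes."""
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token in STOP_WORDS:
            continue
        if token.endswith("ing") and len(token) > 5:
            token = token[:-3]  # walking -> walk
        elif token.endswith("s") and len(token) > 3:
            token = token[:-1]  # dogs -> dog
        terms.append(token)
    return terms

# Index time: the document field is analyzed once, when it is stored.
indexed_terms: Set[str] = set(analyze("Walking The: Dog"))

def matches(query: str, analyze_query: bool = True) -> bool:
    """Search time: analyze the query the same way, then look up its terms."""
    terms = analyze(query) if analyze_query else query.split()
    return any(term in indexed_terms for term in terms)

print(matches("Dogs walking"))                  # True
print(matches("Walking", analyze_query=False))  # False
```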

Slide 60

Slide 60 text

Problem Set: Work in groups and help each other learn. Take your time. Read the docs.

Slide 61

Slide 61 text

Problem Set and Survey
Problem Set: bit.ly/pycon-es-lesson3
Survey: bit.ly/pycon-es-survey

Slide 62

Slide 62 text

Thanks! Survey: bit.ly/pycon-es-survey Julie Qiu @jqiu25 Jim Grandpre @jimtla