Building a Search Engine
with Python + Elasticsearch
Julie Qiu
@jqiu25
Jim Grandpre
@jimtla
#PyCon2018
Slide 2
Slide 2 text
Julie Qiu
Engineering Lead, Spring
@jqiu25
Jim Grandpre
Director of Engineering, Spring
@jimtla
Slide 8
Slide 8 text
Tutorial Goals and Structure
Slide 9
Slide 9 text
Presentations
Develop an intuition for Elasticsearch.
Understand the most important pieces.
Slide 10
Slide 10 text
Problem Sets
Build comfort with the ES docs and pyES.
Have a starting point for Python + ES projects.
Use ES & Python to make search work.
Slide 11
Slide 11 text
Agenda
15m Introduction to Elasticsearch & Indexing
45m Problem Set: Indexing
10m Break
5m Introduction to Searching
45m Problem Set: Searching
5m Break
5m Introduction to Analysis
45m Problem Set: Analysis
Slide 12
Slide 12 text
Problem Set Advice
Work in groups and help each other learn.
Take your time.
Read the docs.
Slide 13
Slide 13 text
Spoilers
Many questions on the problem set include “spoilers.”
Try to figure things out on your own, but if you feel stuck, don’t hesitate to read them.
Slide 14
Slide 14 text
Intro to Elasticsearch
Slide 15
Slide 15 text
What is Elasticsearch?
A search-optimized distributed database,
built on top of Lucene.
Slide 16
Slide 16 text
Database
Stores structured documents.
Insert, Update, Delete, and Query.
Provides a friendly interface (JSON & HTTP).
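To make the JSON & HTTP interface concrete, here is a minimal sketch that describes the four operations as REST calls without sending them. The index name `products`, the sample document, and the `es_request` helper are hypothetical. The paths follow the typeless REST API of recent Elasticsearch versions; older versions include a mapping type in the path.

```python
import json


def es_request(method, path, body=None):
    """Describe an Elasticsearch REST call as a dict; nothing is sent."""
    payload = json.dumps(body, sort_keys=True) if body is not None else None
    return {"method": method, "path": path, "body": payload}


INDEX = "products"  # hypothetical index name

calls = [
    # Insert: store a JSON document under id 1.
    es_request("PUT", f"/{INDEX}/_doc/1", {"name": "red shoes", "price": 40}),
    # Update: change one field of the stored document.
    es_request("POST", f"/{INDEX}/_update/1", {"doc": {"price": 35}}),
    # Query: full-text match on the "name" field.
    es_request("GET", f"/{INDEX}/_search", {"query": {"match": {"name": "shoes"}}}),
    # Delete: remove the document by id.
    es_request("DELETE", f"/{INDEX}/_doc/1"),
]

for call in calls:
    print(call["method"], call["path"], call["body"] or "")
```

Any HTTP client, or the official Python client, could send the same methods, paths, and JSON bodies to a running cluster.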
Slide 17
Slide 17 text
Lucene
A Java library used by Elasticsearch.
Core data storage.
Core query execution.
Slide 18
Slide 18 text
Distributed
Elasticsearch is typically run in a cluster.
Instances can be added and removed at any time.
Data is split across instances, and queries
are executed across the cluster.
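The idea of splitting data and running queries across the cluster can be sketched in a few lines. This is a toy model, not how Elasticsearch is implemented: the shard count, the `shard_for`, `index_doc`, and `search` helpers are hypothetical, and `zlib.crc32` stands in for the real routing hash.

```python
import zlib

NUM_SHARDS = 3  # hypothetical shard count

# Each "shard" is just a dict of doc_id -> document in this toy model.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from a hash of the document id (crc32 is a stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_doc(doc_id, doc):
    """Store the document on exactly one shard."""
    shards[shard_for(doc_id)][doc_id] = doc


def search(term):
    """Ask every shard for matches, then merge the partial results."""
    hits = []
    for shard in shards:
        hits.extend(
            doc_id for doc_id, doc in shard.items()
            if term in doc["name"].split()
        )
    return sorted(hits)


index_doc("1", {"name": "red shoes"})
index_doc("2", {"name": "blue jacket"})
index_doc("3", {"name": "green shoes"})
print(search("shoes"))
```

Each document lives on one shard, while a search fans out to all shards and merges what comes back.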
Slide 19
Slide 19 text
Search Optimized
Data is indexed to allow for fast searching.
Query language for complex searches.
Built-in support for text analysis.
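As a rough illustration of why indexing makes search fast, the sketch below builds a tiny inverted index: a map from each term to the documents that contain it. The `analyze`, `build_index`, and `search` functions and the sample documents are hypothetical, and the analysis step here (lowercasing and splitting on non-alphanumerics) is far simpler than what Elasticsearch and Lucene provide.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy text analysis: lowercase, then split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {1: "Red Running Shoes", 2: "Blue running jacket", 3: "Red scarf"}
index = build_index(docs)
print(sorted(search(index, "RED shoes")))
```

A lookup touches only the entries for the query terms instead of scanning every document, and because the query goes through the same analysis as the documents, "RED" still matches "Red".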