Slide 1

Slide 1 text

What is this "search" that you speak of?? @honzakral

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

"unstructured"

Slide 4

Slide 4 text

Looking for content

Slide 5

Slide 5 text

grep -i -r 'web.*framework'

Slide 6

Slide 6 text

WHERE text ILIKE '%python%'

Slide 7

Slide 7 text

long, long time ago...

Slide 8

Slide 8 text

long, long time ago... Bible concordance, finished 1230

Slide 9

Slide 9 text

1230

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Demo Time!

Slide 12

Slide 12 text

{
    'description': {
        ...
        'programming': {1},
        'python': {0, 1},
        'quick': {0, 1},
        'reinvent': {0},
        ...
    },
    'title': {
        ...
    }
}

Slide 13

Slide 13 text

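from collections import defaultdict

def index_docs(docs, *fields):
    # field -> token -> set of ids of the documents containing that token
    index = defaultdict(lambda: defaultdict(set))
    for id, doc in enumerate(docs):
        for field in fields:
            for token in analyze(doc[field]):
                index[field][token].add(id)
    return index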

Slide 14

Slide 14 text

import re

SPLIT_RE = re.compile(r'[^a-zA-Z0-9]')

def tokenize(text):
    # drop the empty strings re.split() yields between adjacent separators
    yield from (t for t in SPLIT_RE.split(text) if t)

def lowercase(tokens):
    for t in tokens:
        yield t.lower()

SYNONYMS = {
    'rapid': 'quick',
}

def synonyms(tokens):
    for t in tokens:
        yield SYNONYMS.get(t, t)

def analyze(text):
    # tokenize, then run every token through each filter in order
    tokens = tokenize(text)
    for token_filter in (lowercase, synonyms):
        tokens = token_filter(tokens)
    yield from tokens

Slide 15

Slide 15 text

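COMBINE = {
    'OR': set.union,
    'AND': set.intersection,
}

def search_in_fields(index, query, fields):
    # one set of matching ids per query token, merged across all fields
    for t in analyze(query):
        # get() avoids inserting empty entries into the defaultdict
        yield COMBINE['OR'](*(index[f].get(t, set()) for f in fields))

def search(index, query, operator='AND', fields=None):
    fields = fields or index.keys()
    combine = COMBINE[operator]
    results = list(search_in_fields(index, query, fields))
    # a query with no tokens matches nothing; combine() needs at least one set
    return combine(*results) if results else set()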

Slide 16

Slide 16 text

Real world

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Dictionary dict -> list
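
One possible reading of this slide: the hash-based dict becomes a sorted list of terms, and a term is found by binary search. The sketch below assumes the terms fit in memory. The `lookup` helper and the sample terms are made up for illustration and are not from the talk.

```python
from bisect import bisect_left

# Hypothetical term dictionary: sorted terms plus a parallel list of postings.
terms = ['programming', 'python', 'quick', 'reinvent']
postings = [[1], [0, 1], [0, 1], [0]]

def lookup(term):
    """Return the postings list for term, or an empty list if it is absent."""
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[i]
    return []
```

Keeping the terms sorted is what later makes range-style lookups (such as prefixes) possible, which a plain hash lookup cannot do.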

Slide 19

Slide 19 text

Postings List set -> list
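
A rough sketch of what "set -> list" could look like: because document ids are handed out in increasing order, appending each id keeps every postings list sorted without an explicit sort. `index_tokens` and its whitespace tokenizer are simplifications assumed here, not the talk's code.

```python
from collections import defaultdict

def index_tokens(docs):
    """Map each token to a sorted list of ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() removes repeats within one document;
        # sorted() only makes the iteration order deterministic
        for token in sorted(set(text.lower().split())):
            # ids arrive in increasing order, so append keeps the list sorted
            index[token].append(doc_id)
    return index
```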

Slide 20

Slide 20 text

Combine set union/intersect -> merge lists
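
With sorted postings lists, AND and OR can be computed by walking two lists in step instead of calling set operations. A minimal sketch, assuming both inputs are sorted and free of duplicates. The function names `intersect` and `union` are chosen here for illustration.

```python
def intersect(a, b):
    """AND: ids present in both sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """OR: ids present in either sorted list, without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # at most one of the two lists still has elements left
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```

Both functions make a single pass over their inputs, so the cost grows with the combined length of the two lists.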

Slide 21

Slide 21 text

Complex Queries

Slide 22

Slide 22 text

Prefix py*
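
A prefix query such as py* can be served from a sorted term list: binary search finds the first candidate, and matching terms follow it contiguously. This is a sketch under that assumption. `prefix_terms` is a hypothetical helper, not part of the talk's code.

```python
from bisect import bisect_left

def prefix_terms(sorted_terms, prefix):
    """Return all terms in sorted_terms that start with prefix."""
    start = bisect_left(sorted_terms, prefix)
    end = start
    # terms sharing a prefix sit next to each other in sorted order
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]
```

The postings lists of the returned terms would then be combined with OR to get the matching documents.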

Slide 23

Slide 23 text

Phrase "monty python"
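
Phrase queries need to know where each token occurs, not just in which document. One common approach is a positional index. The sketch below assumes whitespace tokenization and lowercasing only, and both function names are invented for this example.

```python
from collections import defaultdict

def index_positions(docs):
    """Map token -> {doc id -> list of positions of the token in that doc}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return ids of docs in which the phrase's tokens appear consecutively."""
    tokens = phrase.lower().split()
    if not tokens or any(t not in index for t in tokens):
        return set()
    hits = set()
    for doc_id, starts in index[tokens[0]].items():
        for start in starts:
            # the k-th following token must sit exactly k positions later
            if all(start + k in index[t].get(doc_id, ())
                   for k, t in enumerate(tokens[1:], 1)):
                hits.add(doc_id)
                break
    return hits
```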

Slide 24

Slide 24 text

http://bit.ly/searchpy

Slide 25

Slide 25 text

Thank you! @honzakral http://bit.ly/searchpy