Slide 1

Slide 1 text

A Flask of full-text search with PostgreSQL Dan Scott, Laurentian University http://coffeecode.net PyCon Canada 2013

Slide 2

Slide 2 text

Perl by day Python at night!

Slide 3

Slide 3 text

Demo application: http://gitorious.org/~dbs “PostgreSQL full text search engine”

Slide 4

Slide 4 text

Problem: Digitized student newspapers need to be found

Slide 5

Slide 5 text

Solution: Python! ... and PostgreSQL, a rock-solid relational database

Slide 6

Slide 6 text

CREATE TABLE fulltext_search ( id SERIAL, doc TEXT, tsv TSVECTOR); CREATE TRIGGER tsvupdate BEFORE INSERT OR UPDATE ON fulltext_search FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger (tsv, 'pg_catalog.english', doc); CREATE INDEX fts_idx ON fulltext_search USING GIN (tsv); INSERT INTO fulltext_search (doc) VALUES ('Sketching the trees'), ('Found in schema.org'); PostgreSQL search schema

Slide 7

Slide 7 text

● In PostgreSQL, each TEXT document ○ is parsed into tokens (words, numbers, paths, email addresses, white space) ○ each token is then normalized with dictionaries (case-folding, stemming, thesauri, stop words) ○ to create a text-search vector (TSVECTOR) SELECT doc, tsv FROM items; doc | tsv ---------------------+--------------------- Sketching the trees | 'sketch':1 'tree':3 Full-text normalization

Slide 8

Slide 8 text

Full-text search as “just another SQL query”

Slide 9

Slide 9 text

WITH q AS ( SELECT to_tsquery('sketch & trees') AS query), ranked AS ( SELECT id, doc, ts_rank_cd(tsv, query) AS rank FROM fulltext_search, query WHERE q.query @@ tsv ORDER BY rank DESC LIMIT 10 ) SELECT id, ts_headline(doc, q.query) FROM ranked, q ORDER BY ranked DESC; id | ts_headline -----+------------------------------------ 5 | Sketching the trees 2 | sketches, water colours, and 19 | tree. What the man did was

Slide 10

Slide 10 text

Search app architecture PostgreSQL REST - Flask Harvester Search requests Web - Flask JSON

Slide 11

Slide 11 text

Why Flask? ● Routes via decorators! @app.route("/search//") @app.route("/search//") @app.route("/search///") def search(query, page=0, limit=10): "Return JSON formatted search results" ● Great documentation ● And solid Unicode support ○ So requests are as simple as: curl http://localhost/search/pierre+trudeau curl http://localhost/search/pierre+trudeau/1/20 curl http://localhost/search/Montréal

Slide 12

Slide 12 text

resj = json.dumps({ 'query': query, 'results': results, 'meta': { 'total': count, 'page': page, 'limit': limit, 'results': len(results) } }) return flask.Response( response=str(resj), mimetype='application/json' ) Down the rabbit hole: Returning JSON

Slide 13

Slide 13 text

@app.route("/search") def search(): """Simple search for terms, with limit and paging""" query = flask.request.args.get('query', '') page = flask.request.args.get('page', '') jsonu = u"%s/search/%s/" % (JSON_HOST, urllib.parse.quote_plus(query.encode('utf-8')) ) if page: jsonu = u"%s%d" % (jsonu, int(page)) res = json.loads( urllib.request.urlopen(jsonu).read().decode('utf- 8') ) Through the looking glass: Reading JSON

Slide 14

Slide 14 text

@app.route("/search") def search(): # ... # Read JSON … now render it template = env.get_template('results.html') return(template.render( terms=res['query'].replace('+', ' '), results=res, request=flask.request )) And back again: rendering

Slide 15

Slide 15 text

And the payoff...

Slide 16

Slide 16 text

Further information ● Demonstration application: ○ Gitorious repository / Github ● Internet Archive harvester project: ○ https://gitorious.org/ia-harvester ● PostgreSQL full-text search: ○ PostgreSQL official documentation ○ Seek and ye shall find: PostgreSQL full-tex search (presentation at PostgresOpen 2012) ● Flask official docs ● Jinja2 official docs