Upgrade to Pro — share decks privately, control downloads, hide ads and more …

A Flask of full-text search with PostgreSQL [Dan Scott]

A Flask of full-text search with PostgreSQL [Dan Scott]

Building a web application does not have to be hard! This talk walks through a real web application, in under 200 lines of code, that uses Flask for the web and PostgreSQL for its fast, built in text search support.

PyCon Canada

August 10, 2013
Tweet

More Decks by PyCon Canada

Other Decks in Programming

Transcript

  1. A Flask of full-text search with PostgreSQL Dan Scott, Laurentian

    University http://coffeecode.net PyCon Canada 2013
  2. CREATE TABLE fulltext_search ( id SERIAL, doc TEXT, tsv TSVECTOR);

    CREATE TRIGGER tsvupdate BEFORE INSERT OR UPDATE ON fulltext_search FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger (tsv, 'pg_catalog.english', doc); CREATE INDEX fts_idx ON fulltext_search USING GIN (tsv); INSERT INTO fulltext_search (doc) VALUES ('Sketching the trees'), ('Found in schema.org'); PostgreSQL search schema
  3. • In PostgreSQL, each TEXT document ◦ is parsed into

    tokens (words, numbers, paths, email addresses, white space) ◦ each token is then normalized with dictionaries (case-folding, stemming, thesauri, stop words) ◦ to create a text-search vector (TSVECTOR) SELECT doc, tsv FROM items; doc | tsv ---------------------+--------------------- Sketching the trees | 'sketch':1 'tree':3 Full-text normalization
  4. WITH q AS ( SELECT to_tsquery('sketch & trees') AS query),

    ranked AS ( SELECT id, doc, ts_rank_cd(tsv, query) AS rank FROM fulltext_search, query WHERE q.query @@ tsv ORDER BY rank DESC LIMIT 10 ) SELECT id, ts_headline(doc, q.query) FROM ranked, q ORDER BY ranked DESC; id | ts_headline -----+------------------------------------ 5 | <b>Sketching</b> the <b>trees</b> 2 | <b>sketches</b>, water colours, and 19 | <b>tree</b>. What the man did was
  5. Why Flask? • Routes via decorators! @app.route("/search/<query>/") @app.route("/search/<query>/<int:page>") @app.route("/search/<query>/<int:page>/<int:limit>") def

    search(query, page=0, limit=10): "Return JSON formatted search results" • Great documentation • And solid Unicode support ◦ So requests are as simple as: curl http://localhost/search/pierre+trudeau curl http://localhost/search/pierre+trudeau/1/20 curl http://localhost/search/Montréal
  6. resj = json.dumps({ 'query': query, 'results': results, 'meta': { 'total':

    count, 'page': page, 'limit': limit, 'results': len(results) } }) return flask.Response( response=str(resj), mimetype='application/json' ) Down the rabbit hole: Returning JSON
  7. @app.route("/search") def search(): """Simple search for terms, with limit and

    paging""" query = flask.request.args.get('query', '') page = flask.request.args.get('page', '') jsonu = u"%s/search/%s/" % (JSON_HOST, urllib.parse.quote_plus(query.encode('utf-8')) ) if page: jsonu = u"%s%d" % (jsonu, int(page)) res = json.loads( urllib.request.urlopen(jsonu).read().decode('utf- 8') ) Through the looking glass: Reading JSON
  8. @app.route("/search") def search(): # ... # Read JSON … now

    render it template = env.get_template('results.html') return(template.render( terms=res['query'].replace('+', ' '), results=res, request=flask.request )) And back again: rendering
  9. Further information • Demonstration application: ◦ Gitorious repository / Github

    • Internet Archive harvester project: ◦ https://gitorious.org/ia-harvester • PostgreSQL full-text search: ◦ PostgreSQL official documentation ◦ Seek and ye shall find: PostgreSQL full-tex search (presentation at PostgresOpen 2012) • Flask official docs • Jinja2 official docs