Save 37% off PRO during our Black Friday Sale! »

A Flask of full-text search with PostgreSQL [Dan Scott]

A Flask of full-text search with PostgreSQL [Dan Scott]

Building a web application does not have to be hard! This talk walks through a real web application, in under 200 lines of code, that uses Flask for the web and PostgreSQL for its fast, built in text search support.


PyCon Canada

August 10, 2013


  1. A Flask of full-text search with PostgreSQL Dan Scott, Laurentian

    University PyCon Canada 2013
  2. Perl by day Python at night!

  3. Demo application: “PostgreSQL full text search engine”

  4. Problem: Digitized student newspapers need to be found

  5. Solution: Python! ... and PostgreSQL, a rock-solid relational database

  6. CREATE TABLE fulltext_search ( id SERIAL, doc TEXT, tsv TSVECTOR);

    CREATE TRIGGER tsvupdate BEFORE INSERT OR UPDATE ON fulltext_search FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger (tsv, 'pg_catalog.english', doc); CREATE INDEX fts_idx ON fulltext_search USING GIN (tsv); INSERT INTO fulltext_search (doc) VALUES ('Sketching the trees'), ('Found in'); PostgreSQL search schema
  7. • In PostgreSQL, each TEXT document ◦ is parsed into

    tokens (words, numbers, paths, email addresses, white space) ◦ each token is then normalized with dictionaries (case-folding, stemming, thesauri, stop words) ◦ to create a text-search vector (TSVECTOR) SELECT doc, tsv FROM items; doc | tsv ---------------------+--------------------- Sketching the trees | 'sketch':1 'tree':3 Full-text normalization
  8. Full-text search as “just another SQL query”

  9. WITH q AS ( SELECT to_tsquery('sketch & trees') AS query),

    ranked AS ( SELECT id, doc, ts_rank_cd(tsv, query) AS rank FROM fulltext_search, query WHERE q.query @@ tsv ORDER BY rank DESC LIMIT 10 ) SELECT id, ts_headline(doc, q.query) FROM ranked, q ORDER BY ranked DESC; id | ts_headline -----+------------------------------------ 5 | <b>Sketching</b> the <b>trees</b> 2 | <b>sketches</b>, water colours, and 19 | <b>tree</b>. What the man did was
  10. Search app architecture PostgreSQL REST - Flask Harvester Search requests

    Web - Flask JSON
  11. Why Flask? • Routes via decorators! @app.route("/search/<query>/") @app.route("/search/<query>/<int:page>") @app.route("/search/<query>/<int:page>/<int:limit>") def

    search(query, page=0, limit=10): "Return JSON formatted search results" • Great documentation • And solid Unicode support ◦ So requests are as simple as: curl http://localhost/search/pierre+trudeau curl http://localhost/search/pierre+trudeau/1/20 curl http://localhost/search/Montréal
  12. resj = json.dumps({ 'query': query, 'results': results, 'meta': { 'total':

    count, 'page': page, 'limit': limit, 'results': len(results) } }) return flask.Response( response=str(resj), mimetype='application/json' ) Down the rabbit hole: Returning JSON
  13. @app.route("/search") def search(): """Simple search for terms, with limit and

    paging""" query = flask.request.args.get('query', '') page = flask.request.args.get('page', '') jsonu = u"%s/search/%s/" % (JSON_HOST, urllib.parse.quote_plus(query.encode('utf-8')) ) if page: jsonu = u"%s%d" % (jsonu, int(page)) res = json.loads( urllib.request.urlopen(jsonu).read().decode('utf- 8') ) Through the looking glass: Reading JSON
  14. @app.route("/search") def search(): # ... # Read JSON … now

    render it template = env.get_template('results.html') return(template.render( terms=res['query'].replace('+', ' '), results=res, request=flask.request )) And back again: rendering
  15. And the payoff...

  16. Further information • Demonstration application: ◦ Gitorious repository / Github

    • Internet Archive harvester project: ◦ • PostgreSQL full-text search: ◦ PostgreSQL official documentation ◦ Seek and ye shall find: PostgreSQL full-tex search (presentation at PostgresOpen 2012) • Flask official docs • Jinja2 official docs