Save 37% off PRO during our Black Friday Sale! »

A Flask of full-text search with PostgreSQL [Dan Scott]

A Flask of full-text search with PostgreSQL [Dan Scott]

Building a web application does not have to be hard! This talk walks through a real web application, in under 200 lines of code, that uses Flask for the web and PostgreSQL for its fast, built in text search support.

3b085ba94fee217d7656971b0cb4cf00?s=128

PyCon Canada

August 10, 2013
Tweet

Transcript

  1. A Flask of full-text search with PostgreSQL Dan Scott, Laurentian

    University http://coffeecode.net PyCon Canada 2013
  2. Perl by day Python at night!

  3. Demo application: http://gitorious.org/~dbs “PostgreSQL full text search engine”

  4. Problem: Digitized student newspapers need to be found

  5. Solution: Python! ... and PostgreSQL, a rock-solid relational database

  6. CREATE TABLE fulltext_search ( id SERIAL, doc TEXT, tsv TSVECTOR);

    CREATE TRIGGER tsvupdate BEFORE INSERT OR UPDATE ON fulltext_search FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger (tsv, 'pg_catalog.english', doc); CREATE INDEX fts_idx ON fulltext_search USING GIN (tsv); INSERT INTO fulltext_search (doc) VALUES ('Sketching the trees'), ('Found in schema.org'); PostgreSQL search schema
  7. • In PostgreSQL, each TEXT document ◦ is parsed into

    tokens (words, numbers, paths, email addresses, white space) ◦ each token is then normalized with dictionaries (case-folding, stemming, thesauri, stop words) ◦ to create a text-search vector (TSVECTOR) SELECT doc, tsv FROM items; doc | tsv ---------------------+--------------------- Sketching the trees | 'sketch':1 'tree':3 Full-text normalization
  8. Full-text search as “just another SQL query”

  9. WITH q AS ( SELECT to_tsquery('sketch & trees') AS query),

    ranked AS ( SELECT id, doc, ts_rank_cd(tsv, query) AS rank FROM fulltext_search, query WHERE q.query @@ tsv ORDER BY rank DESC LIMIT 10 ) SELECT id, ts_headline(doc, q.query) FROM ranked, q ORDER BY ranked DESC; id | ts_headline -----+------------------------------------ 5 | <b>Sketching</b> the <b>trees</b> 2 | <b>sketches</b>, water colours, and 19 | <b>tree</b>. What the man did was
  10. Search app architecture PostgreSQL REST - Flask Harvester Search requests

    Web - Flask JSON
  11. Why Flask? • Routes via decorators! @app.route("/search/<query>/") @app.route("/search/<query>/<int:page>") @app.route("/search/<query>/<int:page>/<int:limit>") def

    search(query, page=0, limit=10): "Return JSON formatted search results" • Great documentation • And solid Unicode support ◦ So requests are as simple as: curl http://localhost/search/pierre+trudeau curl http://localhost/search/pierre+trudeau/1/20 curl http://localhost/search/Montréal
  12. resj = json.dumps({ 'query': query, 'results': results, 'meta': { 'total':

    count, 'page': page, 'limit': limit, 'results': len(results) } }) return flask.Response( response=str(resj), mimetype='application/json' ) Down the rabbit hole: Returning JSON
  13. @app.route("/search") def search(): """Simple search for terms, with limit and

    paging""" query = flask.request.args.get('query', '') page = flask.request.args.get('page', '') jsonu = u"%s/search/%s/" % (JSON_HOST, urllib.parse.quote_plus(query.encode('utf-8')) ) if page: jsonu = u"%s%d" % (jsonu, int(page)) res = json.loads( urllib.request.urlopen(jsonu).read().decode('utf- 8') ) Through the looking glass: Reading JSON
  14. @app.route("/search") def search(): # ... # Read JSON … now

    render it template = env.get_template('results.html') return(template.render( terms=res['query'].replace('+', ' '), results=res, request=flask.request )) And back again: rendering
  15. And the payoff...

  16. Further information • Demonstration application: ◦ Gitorious repository / Github

    • Internet Archive harvester project: ◦ https://gitorious.org/ia-harvester • PostgreSQL full-text search: ◦ PostgreSQL official documentation ◦ Seek and ye shall find: PostgreSQL full-tex search (presentation at PostgresOpen 2012) • Flask official docs • Jinja2 official docs