How Lanyrd Works

Simon Willison
November 02, 2011

Presented at DJUGL, 2nd November 2011.

Transcript

  1. Lanyrd.com
     Social event recommendation
     Comprehensive speaker profiles
     Archive of slides, notes and video
     Definitive database of professional events and speakers
  2. • Aug 31st, 11:22: Launch! (1 linode)
     • Aug 31st, 12:41: Unlaunch
     • Aug 31st, 12:54: Read only mode
     • Aug 31st, 14:15: DB server (2 linodes)
     • Sep 4th: TechCrunched (read only :( )
     • Sep 5th: 3 large EC2 + 1 RDS
  3. • Dec 8: Calacanis + Scoble at the same time!
     • Upgrade to next size of RDS
     • (Sometimes scaling vertically does the job)
  4. Original implementation:

         twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter
         attendees = Attendee.objects.filter(
             user__t_id__in=twitter_ids
         ).filter(
             conference__start_date__gte=datetime.date.today()
         )
  5. Current implementation:

         twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter
         sqs = SearchQuerySet()
         sqs = sqs.models(Conference)
         # The IDs are integers, so convert them to strings before joining
         or_string = ' OR '.join(str(t_id) for t_id in twitter_ids)
         sqs = sqs.narrow('attendees:(%s)' % or_string)
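The narrow clause above can be tried in isolation. This is a minimal sketch, assuming integer Twitter IDs; `build_attendee_narrow` is a hypothetical helper name used only for illustration, not part of Haystack or of Lanyrd's code.

```python
def build_attendee_narrow(twitter_ids):
    """Build a Solr filter string such as 'attendees:(1 OR 2 OR 3)'."""
    if not twitter_ids:
        # An empty group would produce an invalid query, so refuse early.
        raise ValueError("need at least one Twitter ID")
    or_string = " OR ".join(str(t_id) for t_id in twitter_ids)
    return "attendees:(%s)" % or_string


narrow = build_attendee_narrow([11134, 223455, 33221])
print(narrow)
```

The resulting string is what would be passed to `sqs.narrow(...)`, so the ID matching is done by the search index rather than by a large SQL `IN` clause.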
  6. [Architecture diagram]
     • Load balancer (nginx)
     • HTTP cache (varnish)
     • lanyrd.com, badges.lanyrd.net
     • 3 × app server (django/mod_wsgi)
     • 1 × search master (solr), 2 × search slave (solr)
     • Database (MySQL RDS)
     • Redis (data structures + message queue)
     • 2 × worker (celery)
     • logging (MongoDB)
  7. [Screenshot: Apache Solr home page ("Welcome to Solr"), with a news list running up to the May 2011 Solr 3.2 release]
  8. [Screenshot: Haystack home page] "Find the needle you're looking for. Search doesn't have to be hard. Haystack lets you write your search code once and choose the search engine you want it to run on." Haystack is BSD licensed and supports Solr, Whoosh and Xapian. Get started: (1) get the most recent source; (2) add haystack to your INSTALLED_APPS; (3) create search_indexes.py files for your models; (4) set up the main SearchIndex via autodiscover; (5) include haystack.urls in your URLconf; (6) search!
  9. [Screenshot: Lanyrd faceted search for “django”, filtered to TYPE: Sessions, TOPIC: NoSQL, PLACE: United States. Three results: "NoSQL and Django Panel" (DjangoCon US 2010), "Step Away From That Database" (DjangoCon US 2010) and "Apache Cassandra in Action" (Strata 2011), with facet counts by type, topic and place]
  10. [Screenshot: Lanyrd signed-in home page showing "Your contacts' calendar": "We've found 182 conferences your Twitter contacts are interested in."]
  11. Dirty re-indexing trick

          class Article(models.Model):
              needs_indexing = models.BooleanField(
                  default=True, db_index=True
              )
              ...

              def save(self, *args, **kwargs):
                  self.needs_indexing = True
                  super(Article, self).save(*args, **kwargs)
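The slide shows only how records get flagged. A periodic job that consumes the flag might look like the sketch below. It is an assumption about the other half of the trick, written against plain dictionaries so it runs without Django; `reindex_dirty` and `index_fn` are hypothetical names.

```python
def reindex_dirty(records, index_fn, batch_size=100):
    """Index up to batch_size flagged records, then clear their flags.

    In Django the selection would be something like
    Article.objects.filter(needs_indexing=True)[:batch_size], and the flag
    would need clearing with a queryset .update(needs_indexing=False),
    because calling save() would set it straight back to True.
    """
    dirty = [r for r in records if r["needs_indexing"]][:batch_size]
    for record in dirty:
        index_fn(record)                  # push the record to the search index
        record["needs_indexing"] = False  # mark it clean
    return len(dirty)


articles = [
    {"id": 1, "needs_indexing": True},
    {"id": 2, "needs_indexing": False},
    {"id": 3, "needs_indexing": True},
]
indexed = []
count = reindex_dirty(articles, indexed.append)
```

The `db_index=True` on the flag is what keeps the "find dirty rows" query cheap when it runs frequently.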
  12. nginx + Solr replication trick

          upstream solrmaster {
              server 10.68.43.214:8080;
          }
          upstream solrslaves {
              server 10.68.43.214:8080;
              server 10.193.138.80:8080;
              server 10.204.143.106:8080;
          }
          server {
              listen 8983;
              location /solr/update {
                  proxy_pass http://solrmaster;
              }
              location /solr/select {
                  proxy_pass http://solrslaves;
              }
          }
  13. [Screenshot: Redis home page] "Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets." Redis 2.2.10 is shown as the latest stable version.
  14. [Screenshot: Lanyrd event page for EuroPython 2011 ("The European Python Conference"), Florence, Italy, 19–26 June 2011, showing the speaker list, attendee and tracker counts, topics (Django, Plone, Pyramid, Python, Twisted), the short URL lanyrd.com/ccdpc and schedule/calendar links]
  15. [Screenshot: Celery home page, "Distributed Task Queue"] "Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well." "The recommended message broker is RabbitMQ, but limited support for Redis, Beanstalk, MongoDB, CouchDB, and databases (using SQLAlchemy or the Django ORM) is also available." "Celery is easy to integrate with Django, Pylons and Flask, using the django-celery, celery-pylons and Flask-Celery add-on packages."
  16. Tasks?
      • Anything that takes more than about 200ms
      • Updating a search index
      • Resizing images
      • Hitting external APIs
      • Generating reports
  17. [Screenshot: Lanyrd session page for "Python and MongoDB tutorial", a session by Andreas Jung at EuroPython 2011 (Florence, Italy, 20th June 2011, 14:30–18:30 CET), with a SlideShare URL in the "Add coverage to this session" field]
  18. [Screenshot: Lanyrd "Add coverage" form for "Python and MongoDB tutorial", showing the SlideShare link, its title, a coverage preview from SlideShare, and the coverage types: Link, Write-up, Slides, Video, Audio, Sketch notes, Transcript, Handout, Liveblog, Photos, Notes]
  19. The task itself...
      • Checks for special cases
      • Tries http://embed.ly/ to find a preview
      • Fetches the HTTP headers and first 2048 bytes
      • If HTML, attempts to extract the <title>
      • If other, gets the file type and size from headers
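The last two steps of the task can be sketched without any network access. This is an illustration only, not Lanyrd's code: `describe_link` is a hypothetical name, the headers are assumed to arrive as a plain dict with canonical key casing, and the special-case and embed.ly steps are left out.

```python
import re

# Matches the first <title> element in a chunk of raw HTML bytes.
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def describe_link(headers, first_bytes):
    """Summarise a link from its HTTP headers and the start of its body."""
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type == "text/html":
        # Only the first 2048 bytes are inspected, as on the slide.
        match = TITLE_RE.search(first_bytes[:2048])
        title = match.group(1).decode("utf-8", "replace").strip() if match else None
        return {"kind": "html", "title": title}
    # Anything else: report the file type and size taken from the headers.
    size = headers.get("Content-Length")
    return {
        "kind": content_type or "unknown",
        "size": int(size) if size and size.isdigit() else None,
    }
```

Reading only the start of the body keeps the task cheap even when the link points at a large video or slide file.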
  20. [Screenshot: Lanyrd topic page for Django, listing current and upcoming conferences (EuroPython 2011, DjangoCon US 2011, PyCON FR 2011, PyCon DE 2011) and Django coverage counts: 52 videos, 52 slide decks, 3 audio clips, 27 write-ups, 11 handouts, 3 notes]
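  21.

          from django.db import models
          from django.db.models import F

          class Conference(models.Model):
              ...
              cache_version = models.IntegerField(default=0)

              def save(self, *args, **kwargs):
                  # Every save invalidates cached fragments for this conference
                  self.cache_version += 1
                  super(Conference, self).save(*args, **kwargs)

              def touch(self):
                  # Bump the version in the database without a full save()
                  Conference.objects.filter(pk=self.pk).update(
                      cache_version=F('cache_version') + 1
                  )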
  22.

          {% cache 36000 conf-topics conference.pk conference.cache_version %}
          <ul class="tags inline-tags meta">
            {% for topic in conference.topics.all %}
              <li><a href="{{ topic.get_absolute_url }}">{{ topic }}</a></li>
            {% endfor %}
          </ul>
          {% endcache %}
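Why a version number in the cache key is enough to invalidate a fragment can be shown with a plain dictionary standing in for the cache. This is a sketch of the idea only; `fragment_key` and `cached_fragment` are hypothetical names, and Django's template tag builds its real keys differently.

```python
cache = {}  # stand-in for the real cache backend


def fragment_key(name, pk, cache_version):
    """Build a key that changes whenever the object's version changes."""
    return "%s:%s:%s" % (name, pk, cache_version)


def cached_fragment(name, conference, render):
    """Return the cached fragment, rendering it only on a cache miss."""
    key = fragment_key(name, conference["pk"], conference["cache_version"])
    if key not in cache:
        cache[key] = render(conference)
    return cache[key]


def render_topics(conference):
    return ", ".join(conference["topics"])


conf = {"pk": 42, "cache_version": 0, "topics": ["Django"]}
first = cached_fragment("conf-topics", conf, render_topics)
conf["topics"].append("Python")
stale = cached_fragment("conf-topics", conf, render_topics)  # same key, old value
conf["cache_version"] += 1                                   # what save()/touch() do
fresh = cached_fragment("conf-topics", conf, render_topics)  # new key, re-rendered
```

Nothing is ever deleted explicitly: the old entry simply stops being asked for and is left to expire.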
  23. UserBasedExceptionMiddleware

          from django.views.debug import technical_500_response
          import sys

          class UserBasedExceptionMiddleware(object):
              def process_exception(self, request, exception):
                  if request.user.is_superuser:
                      return technical_500_response(request, *sys.exc_info())
  24. mysql-proxy
      • Very handy lua-customisable proxy for all of your MySQL traffic
      • Worst documented software ever
      • log.lua - logs out ALL queries
      • https://gist.github.com/1039751
  25. django_instrumented
      • (Unreleased) code I wrote for Lanyrd
      • Collects various runtime stats about the current request, stashes a profile JSON in memcached
      • Writes out the profile UUID as part of the HTML
      • A bookmarklet to view the profile
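Since django_instrumented is unreleased, the following is only a guess at the shape of the idea, not its actual code. A dictionary stands in for memcached, the request is a plain dict, and `run_with_profile` is a hypothetical name.

```python
import json
import time
import uuid

profile_store = {}  # stand-in for memcached


def run_with_profile(view, request):
    """Run a view, stash timing stats as JSON, and tag the HTML with the profile ID."""
    profile_id = str(uuid.uuid4())
    started = time.time()
    html = view(request)
    stats = {"path": request["path"], "seconds": time.time() - started}
    profile_store[profile_id] = json.dumps(stats)
    # The ID travels with the page so a bookmarklet could look the profile up later.
    return html + "\n<!-- profile: %s -->" % profile_id, profile_id


page, profile_id = run_with_profile(lambda request: "<p>hi</p>", {"path": "/"})
```

Keeping only an ID in the page means the profile itself never bloats the response.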
  26. mongodb logging
      • Super-fast inserts, log everything!
      • Capped collections
      • Structured queries
      • Most useful query: show me Django views with slowest average response time
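The "slowest average response time" query amounts to a group-by on the view name. The sketch below does the same aggregation in plain Python over a list of log entries; the field names `view` and `ms` are assumptions, since the slides do not show the log schema.

```python
from collections import defaultdict


def slowest_views(log_entries, limit=5):
    """Return (view, average_ms) pairs, slowest average first."""
    totals = defaultdict(lambda: [0.0, 0])  # view -> [total_ms, request_count]
    for entry in log_entries:
        bucket = totals[entry["view"]]
        bucket[0] += entry["ms"]
        bucket[1] += 1
    averages = [(view, total / count) for view, (total, count) in totals.items()]
    averages.sort(key=lambda pair: pair[1], reverse=True)
    return averages[:limit]


entries = [
    {"view": "conference_detail", "ms": 100},
    {"view": "conference_detail", "ms": 300},
    {"view": "search", "ms": 50},
]
ranking = slowest_views(entries)
```

Against a real MongoDB collection the grouping would be pushed into the database rather than done in application code.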
  27. For the future...
      • Much better profiling, monitoring and alerts
      • Varnish in front of everything
      • Replicated MySQL for analytics + upgrades