your relational database, kept as normalized as possible • Denormalize all relevant data to a separate search index • Invest a lot of effort in synchronizing the two • Smartly route queries to database or search depending on tolerance for lag • Optional: query search engine for object IDs, then load directly from the database to display the content
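The optional last step above (ask search for IDs, then load the rows from the database) can be sketched roughly as follows. This is a minimal illustration, not the talk's implementation: `search_ids`, `load_events`, the `event` table and the in-memory `FAKE_INDEX` are all hypothetical stand-ins, with SQLite playing the relational database.

```python
import sqlite3

# Stand-in for the relational database (the source of truth).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO event (id, title) VALUES (?, ?)",
    [(1, "Python meetup"), (2, "Django workshop"), (3, "Python conference")],
)

# Stand-in for the search index: term -> IDs, already ordered by relevance.
# In a real system this would be a call to the search engine.
FAKE_INDEX = {"python": [3, 1], "django": [2]}


def search_ids(term):
    """Hypothetical search call that returns only object IDs."""
    return FAKE_INDEX.get(term.lower(), [])


def load_events(ids):
    """Load fresh rows from the database, preserving the search ranking."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, title FROM event WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Skip IDs the index still holds but the database no longer has.
    return [by_id[i] for i in ids if i in by_id]


results = load_events(search_ids("python"))
```

Because the displayed content comes from the database, a lagging index can affect which results appear but not what each result says.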
Interface is all JSON over HTTP - easy to use from any language • Claims to be “real-time” - it’s close enough • Insanely powerful query language (a JSON DSL) • Strong focus on analytics in addition to text search • Elastic means elastic: highly horizontally scalable
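To make "JSON over HTTP" concrete, here is a small sketch that builds a search request using only the Python standard library. The host `localhost:9200`, the index name `docs` and the `content` field are assumptions for illustration; the request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumed: a node on localhost:9200 and an index named "docs".
url = "http://localhost:9200/docs/_search"

# The query itself is plain JSON: a full-text match on one field.
query = {"query": {"match": {"content": "glossary"}}, "size": 5}

request = urllib.request.Request(
    url,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```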
"content": "=============================\nDjango documentation contents \n=============================\n\n.. toctree::\n :hidden:\n\n index\n\n.. toctree:: \n :maxdepth: 3\n\n intro/index\n topics/index\n howto/index\n faq/index\n ref/ index\n misc/index\n glossary\n releases/index\n internals/index\n\nIndices, glossary and tables\n============================\n\n* :ref:`genindex`\n* :ref:`modindex`\n* :doc:`glossary` \n", "top_folder": "contents.txt", "path": "contents.txt", "id": "de72ca631bca86f405aa301b9ee8590a4cf4e7c8"} {"index": {"_id": "2633212db84c83b86479856e6f34494b3433a66a"}} {"title": "Glossary", "url": "https://docs.djangoproject.com/en/1.10/glossary/", "content": "========\nGlossary\n========\n\n.. glossary::\n\n concrete model\n A non-abstract (:attr:`abstract=False\n <django.db.models.Options.abstract>`) model.\n\n field\n An attribute on a :term:`model`; a given field usually maps directly to\n a single database column.\n\n See :doc:`/topics/db/models`.\n\n generic view\n A higher- order :term:`view` function that provides an abstract/generic\n implementation of a common idiom or pattern found in view development.\n\n See :doc:`/topics/class-based-views/index`.\n \n model\n Models store your application's data.\n\n See :doc:`/topics/db/models`. \n\n MTV\n \"Model-template-view\"; a software pattern, similar in style to MVC, but\n a better description of the way Django does things.\n\n See :ref:`the FAQ entry <faq-mtv>`.\n \n MVC\n `Model-view-controller`__; a software pattern. Django :ref:`follows MVC\n to some extent <faq-mtv>`.\n\n __ https://en.wikipedia.org/wiki/Model-view-controller\n\n project\n A Python package -- i.e. a directory of code -- that contains all the\n settings for an instance of Django. This would include database\n configuration, Django- specific options and application-specific\n settings.\n\n property\n Also known as \"managed attributes\", and a feature of Python since\n version 2.2. 
This is a neat way to implement attributes whose usage\n resembles attribute access, but whose implementation uses method calls.\n\n See :class:`property`.\n\n queryset\n An object representing some set of rows to be fetched from the database.\n\n See :doc:`/topics/db/queries`.\n\n slug\n A short label for something, containing only letters, numbers,\n underscores or hyphens. They're generally used in URLs. For\n example, in a typical blog entry URL:\n\n .. parsed-literal::\n\n https://www.djangoproject.com/weblog/2008/apr/12/**spring**/\n\n the last bit (``spring``) is the slug.\n\n template\n A chunk of text that acts as formatting for representing data. A\n template helps to abstract the presentation of data from the data\n itself.\n\n See :doc:`/topics/templates`.\n\n view\n A function responsible for rendering a page.\n", "top_folder": "glossary.txt", "path": "glossary.txt", "id": "2633212db84c83b86479856e6f34494b3433a66a"}
once)
Package.init()

# Save a package to the index
Package(
    meta={'id': data['info']['name']},
    name=data['info']['name'],
    summary=data['info']['summary'],
    description=data['info']['description'],
    keywords=data['info']['keywords'],
    classifiers=data['info']['classifiers'],
).save()
…
needs_indexing = models.BooleanField(default=True, db_index=True)

# Reindex all conferences when associated guide is edited:
guide.conferences.all().update(needs_indexing=True)
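The other half of this pattern is a job that picks up the flagged rows. A rough sketch follows, using SQLite instead of the Django ORM so it runs on its own; the `conference` table, the `reindex_pending` function and the `indexed` list (standing in for the search index) are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE conference ("
    "id INTEGER PRIMARY KEY, name TEXT, needs_indexing INTEGER DEFAULT 1)"
)
db.executemany(
    "INSERT INTO conference (name) VALUES (?)", [("PyCon",), ("DjangoCon",)]
)

indexed = []  # stand-in for documents sent to the search index


def reindex_pending(batch_size=100):
    """Index flagged rows, then clear the flag on exactly those rows."""
    rows = db.execute(
        "SELECT id, name FROM conference WHERE needs_indexing = 1 LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, name in rows:
        indexed.append({"id": row_id, "name": name})
        db.execute(
            "UPDATE conference SET needs_indexing = 0 WHERE id = ?", (row_id,)
        )
    return len(rows)


first = reindex_pending()   # picks up both rows
second = reindex_pending()  # nothing left to do
```

One weakness of a plain boolean flag: an edit that lands between the read and the flag reset can be cleared without being indexed.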
…
last_touched = models.DateTimeField(
    db_index=True,
    default=datetime.datetime.utcnow,
)

# Reindex all conferences when associated guide is edited:
guide.conferences.all().update(last_touched=datetime.datetime.utcnow())

Indexing code needs to track most recently seen last_touched date time
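Tracking the most recently seen last_touched value might look something like the sketch below. It again uses SQLite rather than the Django ORM, stores timestamps as ISO 8601 strings (which sort correctly as text), and the `reindex_since` function and `indexed` list are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE conference (id INTEGER PRIMARY KEY, name TEXT, last_touched TEXT)"
)
db.executemany(
    "INSERT INTO conference (name, last_touched) VALUES (?, ?)",
    [("PyCon", "2017-01-01T10:00:00"), ("DjangoCon", "2017-01-02T09:30:00")],
)

indexed = []  # stand-in for documents sent to the search index


def reindex_since(last_seen):
    """Index rows touched after last_seen; return the new high-water mark."""
    rows = db.execute(
        "SELECT id, name, last_touched FROM conference "
        "WHERE last_touched > ? ORDER BY last_touched",
        (last_seen,),
    ).fetchall()
    for row_id, name, touched in rows:
        indexed.append({"id": row_id, "name": name})
        last_seen = touched
    return last_seen


mark = reindex_since("")  # first run: the empty string sorts before any timestamp

# Simulate an edit that touches one conference, then run the indexer again.
db.execute(
    "UPDATE conference SET last_touched = ? WHERE name = ?",
    ("2017-01-03T08:00:00", "PyCon"),
)
mark = reindex_since(mark)
```

The indexer only has to persist one value (`mark`) between runs, and it never writes to the rows it indexes.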
Dilithium, which subscribes to the MySQL replication log and writes interesting moments (e.g. order.updated) to Kafka • Re-indexing code subscribes to Kafka • github.com/noplay/python-mysql-replication
or my-friend-2 or my-friend-3 or … • Find events similar to my-last-10-saved-events • Search engines are great at scoring! Boost by in-same-city-as-me, boost more by saved-by-my-friends
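The boosting idea can be expressed in the Elasticsearch query DSL as a `bool` query whose `should` clauses carry different boosts. This is a sketch only: the field names `title`, `city` and `saved_by`, the boost values and the `build_event_query` helper are assumptions, not taken from the talk.

```python
import json


def build_event_query(text, my_city, friend_ids):
    """Build a query body: must match the text, score higher for boosts."""
    return {
        "query": {
            "bool": {
                # Only events matching the search text are returned.
                "must": [{"match": {"title": text}}],
                # "should" clauses do not filter; they only raise the score.
                "should": [
                    # Boost by in-same-city-as-me ...
                    {"term": {"city": {"value": my_city, "boost": 2.0}}},
                    # ... and boost more by saved-by-my-friends.
                    {"terms": {"saved_by": friend_ids, "boost": 5.0}},
                ],
            }
        }
    }


body = build_event_query("python", "san-francisco", ["my-friend-1", "my-friend-2"])
payload = json.dumps(body)  # ready to send to a _search endpoint
```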
for log analysis - can easily handle enormous amounts of traffic • Feed user actions into a custom index - search.executed, user.followed etc • Can then write application logic that varies depending on recent user activity
large datasets • Create “characteristics” for your users - e.g. uses_linkedin, signed_up_in_2015, referred_by_a_friend • Use Kibana to explore interesting relationships
set intersections: the set of documents containing “dogs” with the set of documents containing “skateboarding” • This is very distributable: query a dozen shards, then merge and return the results • Relevance is a first-class concept • map/reduce in real-time (unlike Hadoop)
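A toy version of the set-intersection and shard-merge idea, written in plain Python. It is a deliberately simplified sketch: real engines use compressed postings lists and relevance scoring, both omitted here, and `build_index`, `search_shard` and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index for one shard: term -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_shard(index, terms):
    """Intersect the document sets for every term in the query."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Two shards, each holding a different slice of the documents.
shards = [
    build_index({1: "dogs love skateboarding", 2: "cats love naps"}),
    build_index({3: "skateboarding dogs on tour", 4: "dogs at the park"}),
]

# Query every shard independently, then merge the partial results.
partials = [search_shard(shard, ["dogs", "skateboarding"]) for shard in shards]
matches = sorted(set().union(*partials))
```

Each shard answers from its own data alone, which is why the work can be spread across many machines before the merge step.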