The Artful Business of Data Mining: Distributed Schema-less Document-Based Databases

Data comes in all forms and shapes. Data also evolves as life and people adapt to new situations, and so should your database.

When working with data, traditional relational database systems come to mind because that is how most of us have been trained.

However, data is rarely homogeneous, and your database should not force you into a certain schema if your data is not relational.

During this talk we analyse the composition of "documents" in the context of a document-based database, and cover the basic principles of Map-Reduce and its potential use in the context of computational statistics.

What happens when your data no longer fits on one server? How easily can your favourite database expand and adapt to your growing requirements? What is your contingency plan if your server goes down?

We go over some of the features that CouchDB and Riak provide, alongside some of David's personal opinions.

This is an intermediate talk. Listeners should have a working knowledge of Bayesian statistics, standard internet protocols such as HTTP, and a basic understanding of programming languages such as JavaScript and Erlang.


David Coallier

March 27, 2013