Intro to Databases

thegrubbsian

March 01, 2012

Transcript

  1. database |ˈdatəˌbās, ˈdā-| noun: a structured set of data held in a computer, esp. one that is accessible in various ways. Thursday, March 1, 12
  2. Most databases fall into one of five categories: relational, graph, key-value, document, and column family.
  3. Relational Databases (PostgreSQL, MySQL, SQL Server, Oracle, SQLite): A database model built around relational theory (first-order predicate calculus), developed in 1969 by Edgar F. Codd. The schema is defined in tables of columns and rows. Data is related through matching keys across tables.
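
As a minimal sketch of "data is related through matching keys across tables", the following uses Python's built-in `sqlite3` module with an in-memory database. The `authors` and `books` tables and their rows are hypothetical, invented only for illustration.

```python
import sqlite3

# Hypothetical schema: each book row points at an author row through author_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO books (id, title, author_id) VALUES
        (1, 'Notes on Engines', 1),
        (2, 'Compilers 101', 2);
""")

# The join relates rows by matching books.author_id to authors.id.
rows = conn.execute("""
    SELECT authors.name, books.title
    FROM books
    JOIN authors ON books.author_id = authors.id
    ORDER BY books.id
""").fetchall()
```

The schema lives in the table definitions, and the relationship exists only because the key values match.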
  4. Graph Databases (Neo4j, Filament, InfiniteGraph, VertexDB): Based on graph theory. These databases use nodes, properties, and edges to describe a “web” of data. They can be particularly powerful when doing complex ancestral queries.
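
To make the "ancestral query" idea concrete, here is a small sketch that models a graph with plain Python dictionaries rather than a real graph database. The node names and the `parents` edge map are hypothetical.

```python
from collections import deque

# Hypothetical graph: each edge points from a child node to its parent nodes.
parents = {
    "carol": ["alice", "bob"],
    "dave": ["carol"],
    "erin": ["dave"],
}

def ancestors(node):
    """Return every node reachable by following parent edges from `node`."""
    seen = set()
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen
```

Calling `ancestors("erin")` walks the edges level by level. A relational store would typically need a self-join per level, or a recursive query, to answer the same question.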
  5. Key-Value Databases (Redis, Memcached, Tokyo Cabinet, Riak): A loose term for a group of databases whose primary way of retrieving objects is by a single key. What can be stored as a value varies widely.
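
The access pattern can be sketched with a tiny in-memory class. `KeyValueStore` is a hypothetical stand-in, not the API of Redis or any other product named above.

```python
class KeyValueStore:
    """Toy store: the only way to reach a value is through its key."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KeyValueStore()
store.set("session:42", {"user": "alice"})  # a structured value
store.set("page_views", 7)                  # a plain integer
```

There is no way to ask "which sessions belong to alice" without knowing the key, which is the trade-off this category makes for simple, fast lookups.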
  6. Document Databases (MongoDB, CouchDB, RavenDB): Databases designed for storing document-oriented, or semi-structured, data. They normally have a loose schema and the ability to store and retrieve nested structures.
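
A rough sketch of a loose schema with nested structures, using plain Python dictionaries instead of a real document database. The documents and the `get_path` and `find` helpers are hypothetical.

```python
# Hypothetical documents: the second one has no "author" field at all.
documents = [
    {"_id": 1, "title": "Intro", "author": {"name": "alice", "tags": ["db"]}},
    {"_id": 2, "title": "Notes"},
]

def get_path(doc, path, default=None):
    """Follow a dotted path such as 'author.name' into nested dictionaries."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

def find(docs, path, value):
    """Return the documents whose value at `path` equals `value`."""
    return [doc for doc in docs if get_path(doc, path) == value]
```

Documents that lack a field are simply skipped by the query, which is what a loose schema means in practice.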
  7. Column-Family Databases (LucidDB, FluidDB, Greenplum): A loose term for a group of databases that store data by column, or by groups of related columns (column families), rather than by row.
  8. The choice of data store for your application can be a complex one. Some factors you should consider:
  9. Indexes (Indices): Some databases will have a way to store a subset of data in an ordered, easily traversable format that serves as a more efficient lookup mechanism for the “actual” data.
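
One way to picture an index is a sorted list of (value, position) pairs searched with binary search. This is only a sketch of the idea; real databases typically use B-trees or similar structures, and the records below are hypothetical.

```python
import bisect

# The "actual" data, stored in arbitrary order.
records = [
    {"id": 1, "name": "mallory"},
    {"id": 2, "name": "alice"},
    {"id": 3, "name": "trent"},
    {"id": 4, "name": "bob"},
]

# The index holds only (name, position) pairs, kept in sorted order.
name_index = sorted((rec["name"], pos) for pos, rec in enumerate(records))

def find_by_name(name):
    """Binary-search the index, then fetch the full record by position."""
    i = bisect.bisect_left(name_index, (name, -1))
    if i < len(name_index) and name_index[i][0] == name:
        return records[name_index[i][1]]
    return None
```

The lookup touches a handful of index entries instead of scanning every record, at the cost of keeping the index up to date on every write.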
  10. Querying: How do we ask a particular database for some information? Do we use a template object, a query language like SQL, or a simple string key? How you’ll query the data is often a good indicator of what kind of database you need.
  11. Scaling: Generally, you can scale up or out. Scaling up means hosting your database on hardware with faster processors, more memory, and higher network throughput. Scaling out means distributing data, indices, and queries across many machines and aggregating the results. Thinking about scaling early can save huge pain down the line.
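
Scaling out can be sketched as hash-based sharding plus a scatter-gather query. Here each "machine" is just a Python dictionary, and `NUM_SHARDS`, the helper functions, and the sample data are all hypothetical.

```python
import zlib

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for a machine

def shard_for(key):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return shards[zlib.crc32(key.encode("utf-8")) % NUM_SHARDS]

def put(key, value):
    shard_for(key)[key] = value

def get(key):
    return shard_for(key).get(key)

def total():
    # Scatter the query to every shard, then aggregate the partial sums.
    return sum(sum(shard.values()) for shard in shards)

for name, amount in [("alice", 10), ("bob", 20), ("carol", 30), ("dave", 40)]:
    put(name, amount)
```

A single-key read goes to exactly one shard, while an aggregate such as `total()` has to visit all of them. That difference is worth thinking about before choosing how to partition data.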
  12. Modeling: How you model your data has a lot of implications in selecting a database. Is your schema fixed or flexible? Do you relate many disparate things, or are clear hierarchies present in your data? Do you have a lot of small things, or a few large things? Etc.
  13. Mapping (ORM): In almost all modern applications you’ll want to use some kind of wrapper to interact with your database. It could be ActiveRecord for a relational database, Mongoid for MongoDB, or the Redis Ruby driver. How you want to interact with the database plays into which ORM you’ll want to use.
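
To show what such a wrapper does at its simplest, here is a hand-rolled mapping layer over `sqlite3`. It is not ActiveRecord or Mongoid; the `User` class, the `UserMapper` class, and the `users` table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

class UserMapper:
    """Translates between rows in the users table and User objects."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def insert(self, name):
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return User(id=cur.lastrowid, name=name)

    def find(self, user_id):
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return User(*row) if row else None

mapper = UserMapper(sqlite3.connect(":memory:"))
alice = mapper.insert("alice")
```

Application code works with `User` objects and never sees SQL, which is the convenience a full ORM provides on a much larger scale.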
  14. Analyzing: At some point in every project, you need to perform a complex operation with your data. You need to aggregate, sum, map, or project your data, and often the facilities provided by your database are the first place to look for a solution.
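
As an example of leaning on the database's own facilities, the sketch below lets SQLite do the grouping and summing instead of looping in application code. The `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('alice', 10), ('bob', 5), ('alice', 7), ('bob', 1);
""")

# The database counts and sums per customer in a single pass.
totals = conn.execute("""
    SELECT customer, COUNT(*), SUM(amount)
    FROM orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()
```

Only the aggregated rows travel back to the application, which matters once the underlying table is large.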
  15. Recovering: The choice of database is kinda useless if you lose the data. It’s essential to understand the backup and recovery needs of your application and how different databases do, or do not, assist with that process.
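
As one illustration of a database assisting with backups, Python's `sqlite3` module exposes SQLite's online backup API through `Connection.backup` (Python 3.7 and later). The `notes` table is hypothetical, and both databases are in memory here only to keep the sketch self-contained; a real backup target would be a file on separate storage.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE notes (body TEXT)")
source.execute("INSERT INTO notes VALUES ('remember to test restores')")
source.commit()

# Copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate data loss on the source after the backup was taken.
source.execute("DELETE FROM notes")
source.commit()

remaining = source.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
recovered = backup.execute("SELECT body FROM notes").fetchall()
```

The backup only helps if it is taken regularly and the restore path is actually exercised.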
  16. Optimization: The database, in most modern web applications, is almost always the first bottleneck for performance. Universal: careful indexing, duplication. Relational: de-normalization, pivoting, materialized views. Non-relational: nested relationships, parallel queries.
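
To illustrate the "careful indexing" item, this sketch asks SQLite for its query plan before and after adding an index. The `users` table, the `idx_users_email` index name, and the generated rows are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"user{i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"

def query_plan():
    """Return SQLite's description of how it would run QUERY."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("user500@example.com",)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)

plan_before = query_plan()  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan()   # expected to mention idx_users_email
```

Checking the plan, rather than guessing, is a cheap way to confirm that an index is actually being used by the query it was added for.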