Data Carpentry SQL Introduction

Christina Koch

August 27, 2018


  1. Questions
    •  Look at the research question on your table.
    •  Looking at surveys.csv, plots.csv and species.csv, what would you need to do to answer your question?
    •  Report back in 4-5 minutes
  2. Answers (sort of)
    To answer our research questions, we need to:
    •  select subsets of the data (rows and columns)
    •  group subsets of data
    •  do math and other calculations
    •  combine data across spreadsheets
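As a rough sketch of what those four operations can look like in practice, the following uses Python's built-in sqlite3 module with a tiny in-memory database. The table and column names (surveys, species, species_id, weight, genus) are assumed to mirror the CSV files, and the rows are made up for illustration; they are not taken from the lesson data.

```python
import sqlite3

# In-memory database with made-up rows; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT);
    CREATE TABLE surveys (record_id INTEGER PRIMARY KEY, year INTEGER,
                          species_id TEXT, weight REAL);
    INSERT INTO species VALUES ('DM', 'Dipodomys'), ('PF', 'Perognathus');
    INSERT INTO surveys VALUES
        (1, 1990, 'DM', 40.0),
        (2, 1990, 'DM', 44.0),
        (3, 1991, 'PF', 8.0),
        (4, 1991, 'DM', NULL);
""")

# 1. Select a subset of rows (WHERE) and columns (the SELECT list).
subset = conn.execute(
    "SELECT record_id, weight FROM surveys WHERE year = 1990 ORDER BY record_id"
).fetchall()

# 2 + 3. Group rows and calculate on each group (AVG skips NULL weights).
mean_by_species = conn.execute(
    "SELECT species_id, AVG(weight) FROM surveys "
    "GROUP BY species_id ORDER BY species_id"
).fetchall()

# 4. Combine data across tables with a join on the shared species_id field.
combined = conn.execute(
    "SELECT surveys.record_id, species.genus FROM surveys "
    "JOIN species ON surveys.species_id = species.species_id "
    "ORDER BY surveys.record_id"
).fetchall()

print(subset)
print(mean_by_species)
print(combined)
```

Each query is a single statement that can be rerun unchanged as the data grows, which is the main contrast with editing a spreadsheet by hand.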
  3. Goals
    •  Extract and manipulate data to answer our research questions
    •  Once data grows beyond ~50 rows, it is challenging to manipulate “by hand” so we want to use tools that are:
        –  scalable (grow with our data)
        –  reproducible (we can repeat them)
        –  accuracy-enabling (reduce human error)
  4. Our Tool: Databases*
    A relational database stores data in relations made up of records with fields. The relations are usually represented as tables.
    *will use R tomorrow to do some of the same tasks
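To make the vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module: one relation (a table), its fields (columns), and two records (rows). The plots table, its column names, and the values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each field be read by name

# A relation (table) whose fields (columns) are declared up front.
conn.execute("CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, plot_type TEXT)")

# Each inserted row is one record; the values here are made up.
conn.executemany(
    "INSERT INTO plots (plot_id, plot_type) VALUES (?, ?)",
    [(1, "Control"), (2, "Rodent Exclosure")],
)

records = conn.execute("SELECT * FROM plots ORDER BY plot_id").fetchall()
field_names = records[0].keys()  # the field names of the relation

for record in records:
    print(dict(record))
```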
  5. To use a metaphor...
    From seeing what you get to “ordering” it with a query.
  6. Why SQL?
    •  For medium to large datasets, SQL can be the most efficient way to store and query data
    •  Good format for collaborative data gathering
    •  In some disciplines, data is commonly stored in databases; it is useful to be able to access it
  7. Why SQL?
    •  Demystify databases!
    •  Good introduction to:
        –  tabular data thinking
            •  select + filter
            •  split-apply-combine
        –  using a “language” to manipulate data
        –  what we learn today will be revisited in the R lesson tomorrow
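One way to see the split-apply-combine idea is to do it by hand and compare the result with SQL's GROUP BY. This is a minimal sketch using Python's built-in sqlite3 module; the surveys table, its column names, and the (species_id, weight) rows are assumptions made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Made-up (species_id, weight) observations for illustration only.
rows = [("DM", 40.0), ("DM", 44.0), ("PF", 8.0), ("PF", 10.0)]

# By hand: split rows by species, apply a mean to each group, combine results.
groups = defaultdict(list)
for species_id, weight in rows:
    groups[species_id].append(weight)            # split
by_hand = sorted(
    (species_id, sum(ws) / len(ws))              # apply
    for species_id, ws in groups.items()
)                                                # combine

# In SQL: WHERE filters rows, GROUP BY splits, AVG applies,
# and the result set is the combined output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (species_id TEXT, weight REAL)")
conn.executemany("INSERT INTO surveys VALUES (?, ?)", rows)
with_sql = conn.execute(
    "SELECT species_id, AVG(weight) FROM surveys "
    "WHERE weight IS NOT NULL "
    "GROUP BY species_id ORDER BY species_id"
).fetchall()

print(by_hand)
print(with_sql)
```

Both approaches give the same per-species means; the SQL version states what is wanted and leaves the looping to the database.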
  8. Like eating your vegetables...
    SQL provides a healthy foundation for using other tools, and in certain circumstances is delicious on its own.