FISH 6003: Week 1 - Introduction and the philosophy of statistics

FISH 6003 Week 1, Part 1

MI Fisheries Science

January 15, 2018

Transcript

  1. Week 1: Introduction, and the Philosophy of Statistics

    $CatchRate_{ij} \sim \mathrm{Poisson}(\mu_{ij})$
    $E(CatchRate_{ij}) = \mu_{ij}$
    $\log(\mu_{ij}) = GearType_{ij} + Temperature_{ij} + FleetDeployment_i$
    $FleetDeployment_i \sim N(0, \sigma^2)$

    Using lme4: m <- glmer(CatchRate ~ GearType + Temperature + (1 | FleetDeployment), family = poisson)

    FISH 6003. FISH 6000: Science Communication for Fisheries. Brett Favaro, 2017. This work is licensed under a Creative Commons Attribution 4.0 International License.
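To make the title-slide model concrete, here is a minimal sketch that simulates data with the same structure (Poisson catch rates, a gear effect, a temperature effect, and a normally distributed random intercept per fleet deployment) and then fits it with the `glmer` call shown on the slide. The intercept, coefficient values, gear labels, sample sizes, and the data frame name `sim` are all assumptions made up for illustration; none of them come from the course materials. The fit only runs if lme4 is installed.

```r
set.seed(6003)

# Assumed (illustrative) design: 20 deployments, 15 observations each
n_deployments <- 20
n_per_deployment <- 15
n <- n_deployments * n_per_deployment

FleetDeployment <- factor(rep(seq_len(n_deployments), each = n_per_deployment))
GearType <- factor(sample(c("A", "B"), n, replace = TRUE))  # hypothetical gear labels
Temperature <- rnorm(n, mean = 0, sd = 1)                   # centred temperature

# FleetDeployment_i ~ N(0, sigma^2), with an assumed sigma
sigma <- 0.5
deployment_effect <- rnorm(n_deployments, mean = 0, sd = sigma)

# log(mu_ij) = intercept + gear effect + temperature effect + random intercept
# (the intercept and slopes below are invented values)
log_mu <- 1 + 0.4 * (GearType == "B") - 0.2 * Temperature +
  deployment_effect[as.integer(FleetDeployment)]

# CatchRate_ij ~ Poisson(mu_ij)
CatchRate <- rpois(n, lambda = exp(log_mu))

sim <- data.frame(CatchRate, GearType, Temperature, FleetDeployment)

# Fit the slide's model only when lme4 is available
if (requireNamespace("lme4", quietly = TRUE)) {
  m <- lme4::glmer(CatchRate ~ GearType + Temperature + (1 | FleetDeployment),
                   data = sim, family = poisson)
  print(lme4::fixef(m))
}
```

With enough data, the estimated fixed effects should land near the values used in the simulation, which is a quick way to check that the model and the notation line up.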
  2. Land Acknowledgment

    We would like to respectfully acknowledge the territory in which we gather as the ancestral homelands of the Beothuk, and the island of Newfoundland as the ancestral homelands of the Mi’kmaq and Beothuk. We would also like to recognize the Inuit of Nunatsiavut and NunatuKavut and the Innu of Nitassinan, and their ancestors, as the original people of Labrador. We strive for respectful partnerships with all the peoples of this province as we search for collective healing and true reconciliation and honour this beautiful land together. http://www.mun.ca/aboriginal_affairs/
  3. This week:

    • Introduction to the course
    • Introduction to assignments
    • What are statistics?
    • Working with an R project – and boxplots
    • Defining terms
    • Statistical thinking
  4. Dr. Brett Favaro (me)

    My work:
    1: Improving fishing gear
    2: Science-based conservation advocacy
    3: Canadian conservation policy
    4: Reforming the Scientific Enterprise
  5. This is an inclusive classroom.

    Everyone has a right to be here. Math, stats, etc. are hard. If you have a question – ask it! If something doesn’t make sense, ask! This is the first iteration of this course – there will be bugs! Please report them.
  6. Each box involves many steps and skills!

    Boxes in the diagram: Design study; Apply for $; Plan study & prepare; Collect data; Analyze data; Write paper; Submit paper for publication; Revise, resubmit as needed; Revise scope or try again; Paper is released; Post-publication review; Disseminate & communicate results; Gaps identified
  7. Boxes in the diagram: Pose research question; Apply for $; Plan study & prepare; Collect data; Analyze data; Write paper; Submit paper for publication; Revise, resubmit as needed; Revise scope or try again; Paper is released; Post-publication review; Disseminate & communicate results; Gaps identified

    Steps in blue are mostly communication (FISH 6000)
    Steps in red are mostly data skills (FISH 6002)
    Steps in green are largely statistics (FISH 6003)
    Steps in gold depend on breadth of knowledge (FISH 6001)
  8. Boxes in the diagram: Design study; Apply for $; Plan study & prepare; Collect data; Analyze data; Write paper; Submit paper for publication; Revise, resubmit as needed; Revise scope or try again; Paper is released; Post-publication review; Disseminate & communicate results; Gaps identified

    Your goal: Get from here… …to here… …and do this both inside and outside of academia… …to
    1) make the world a better place
    2) get a degree, and then a job
  9. Unless you really need an answer in writing…

    For course-related questions:
    1. Check the syllabus
    2. Ask in class, or discuss with colleagues
    3. Ask on Teams (so everyone can benefit from an answer)
    4. Request a meeting with me by email (Subject: FISH 6003: Meeting Request)

    Use your MI Outlook calendar to see my availability, and propose 3 times. Thurs AM 9 AM – 12 PM preferred.
  10. Introduce Microsoft Teams

    All Marine Institute students have access to Teams by default. Non-MI students need to request access. It is essential that everyone have Teams access.
  11. Course structure

    - A one-hour and a two-hour class meeting every week. Roughly:
      - M (1 hr): Theory
      - T (2 hr): Practice, with worked examples
    - Mixture of lecture, activities, and open discussion
    - My promise: No busywork

    Course website: https://mifisheriesscience.github.io/courses/6003Stats
    Slides delivered via Speakerdeck, linked through course website
  12. Why FISH 6003?

    • Quantitative data analysis is essential in the life sciences
    • Ecology, fisheries, etc. are harder than many other fields because the natural world is complex
    • Ecology literature is rife with statistical error. Much of it doesn’t matter. Some of it does. We want to be reproducible so errors can be caught early.
    • You cannot design studies properly without knowing how the data will be analyzed. Studies may be underpowered, etc.

    And finally…
    • Ecological systems violate many assumptions of common statistical models. So our stats are hard!
  13. My training

    • I’m not a statistician. I’m a conservation biologist with statistical training
    • Alain Zuur and Elena Ieno’s Highland Statistics courses
    • Dolph Schluter BIOL 501 (UBC)
    • Stats Beerz / Stats Therapy
    • But mostly: Reading lots!
    • This is the third offering of FISH 6003. There will be errors. Please point them out to me! Let’s learn together
  14. My approach: Focus on applying fundamentals

    • This course will be application-focused. The goal is to learn enough to know when and how to use various statistical techniques common in experimental ecology
    • We will spend the vast majority of our time on regression-type analyses (i.e. evaluating association between X and Y) because they underpin most of experimental science
    • It is almost certain that during your degree you will need to go into more detail on each topic than we will have time to cover
    • The course will be broad, and less deep than some. The logic: if you’re aware of a technique, even without fully ‘getting’ it, you will be able to self-teach. Focus on making good decisions, and documenting them
  15. Normally, students will take FISH 6002 before this class

    But 6002 is not a formal prerequisite – 6002 goes further than is needed for 6003. And I like to be flexible.

    If you have not done 6002, I recommend reviewing: https://mifisheriesscience.github.io/courses/6002Data/
    Essential: Weeks 1-5
    Helpful: Weeks 7-9

    You should be familiar with R basics. I also won’t spend much time covering data management, how to code variables, etc.
  16. Complexity (figure axis: Basic, Moderate, Complex)

    While basic stats and graphs can be made in Excel, SPSS, etc., you will hit a wall. Most fisheries research occurs beyond this point. With R, the wall is your skillfulness, not the software environment.
  17. RStudio is an integrated development environment (IDE) for R

    Basic: It makes your code easier to read
    Advanced: It adds some advanced features (projects, R Markdown, and a few other things)
    https://www.rstudio.com/
    RStudio extends the functionality of R, and makes it easier to use
  18. R projects make sharing easier and increase reproducibility

    FISH6003_Week1_Intro
    |
    |- FISH6003_Week1_Intro.Rproj   # Rproj file – organizes other files
    |
    |- Description.txt              # metadata
    |
    |- data/                        # raw data, not changed once created
    |  +- Week1Data.csv             # List each file
    |
    |- R/                           # Folder for re-usable code
    |  +- FISH6003Functions.R       # Script containing any functions used
    |
    |- analysis/                    # Code pertaining to analysis
    |  +- 001_DataSetup.R
    |  +- 002_Week1Plots.R
    |
    |- plots/                       # Publication-quality plots are saved here (if any)
    |
    |- manuscript/                  # Manuscript files
    |  +- Example.Rmd
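The folder layout above can be scaffolded from R itself. The sketch below is one possible way to do it, not the course's own setup script: `create_project_skeleton` is a hypothetical helper name, and it builds the folders in a temporary directory so nothing on disk is overwritten. The `.Rproj` file is deliberately left out, on the assumption that RStudio creates it when you make a new project in the folder.

```r
# Hypothetical helper: builds the slide's folder layout under `root`
create_project_skeleton <- function(root, name = "FISH6003_Week1_Intro") {
  project <- file.path(root, name)

  # Top-level folders named on the slide
  folders <- c("data", "R", "analysis", "plots", "manuscript")
  for (folder in folders) {
    dir.create(file.path(project, folder), recursive = TRUE, showWarnings = FALSE)
  }

  # Empty placeholder files for the scripts and metadata listed on the slide
  files <- c(
    "Description.txt",
    file.path("R", "FISH6003Functions.R"),
    file.path("analysis", "001_DataSetup.R"),
    file.path("analysis", "002_Week1Plots.R")
  )
  file.create(file.path(project, files))

  invisible(project)
}

# Build the skeleton in a throwaway location and list what was created
project <- create_project_skeleton(tempdir())
print(list.files(project, recursive = TRUE))
```

Because every script then refers to files by paths relative to the project root (for example `data/Week1Data.csv`), the whole folder can be zipped and shared without editing any paths.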
  19. Course assignments

    • 10% - Participation, in-class activities
    • 30% - Minor assignments
    • 60% - Major assignment
  20. 30%: Minor Assignments

    A minor assignment should take less than one day to complete. Usually, there will be in-class time given. You will be given directions on these in class. Expect there to be three minor assignments, which together will be worth 30% of the course grade.
  21. 60%: Major Assignment

    In the Major Assignment, you will conduct a statistical analysis based on data you obtain. This will be a regression-type analysis, which we will cover extensively within this course (i.e. what is the impact of X on Y?).

    These data may be something you collected yourself for your research, obtained from a public database, or attached to another publication.

    Essentially, the major assignment is to do all the basic groundwork that would need to be done to prepare an analysis for publication.
  22. Part 1: Obtaining and Describing Data

    The goal of part 1 is to obtain and describe data. Key questions, which will be answered in two pages or less (single-spaced, 12-point Times New Roman):

    • Where are the data from?
    • What are the data? (i.e. how many points? What are they measuring? How are they measured?)
    • If you got the data from someone else, are there any constraints around using them?
    • Articulate three key research questions you might ask of these data. In other words: What are your biological hypotheses that you will be testing?

    The purpose of this section is to determine whether the data you obtained will be sufficient for the major project.

    Timeline and Submission: Part 1 is due by the end of week 4
  23. Download the R Project template from the course website

    LASTNAME_FISH6003_Major
    |
    |- LASTNAME_FISH6003_Major.Rproj
    |
    |- Description.txt  # You're reading it.
    |
    |- data/
    |  +- # Your data file goes here
    |
    |- part2/
    |  +- LASTNAME_MajorPart2_Fillable.rmd  # Template RMarkdown file for Part 2
    |
    |- part3/
    |  +- LASTNAME_Major_Assignment_Part3_fillable.rmd  # Template RMarkdown file for Part 3
    |
    |- R/
    |  +- 6003Functions.R  # Script containing a few convenience functions
    |
    |- analysis/
    |  +- 001_DataSetup.R  # Code to produce tidy dataset
    |  +- 002_Exploration.R  # Data exploration goes here
    |  +- 003_Analysis.R  # Analysis code goes here
    |
    |- source_paper/
    |  +- # If data are from a paper, paste the paper here.
  24. Part 2: Data Exploration

    In Part 2, you will conduct a data exploration, following a slightly modified version of the procedure explained in Zuur et al. 2009. You should report the results of each step. These steps include:

    A. State a biological hypothesis
    B. Visualize the experimental design, with a diagram
    C. Conduct the data exploration:
       1. Outliers Y and X
       2. Homogeneity Y
       3. Normality Y
       4. Zeroes Y
       5. Collinearity X
       6. Relationships Y and X
       7. Interactions
       8. Independence of Y

    This will all make sense later
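As a rough preview of what a few of the exploration steps can look like in code, the sketch below runs simple numeric checks on simulated data. It is not the procedure from Zuur et al. 2009 or from the course template: the data frame `explore`, its column names, and the specific summaries chosen (boxplot-rule outliers, group variances, proportion of zeroes, correlations) are stand-ins picked for illustration, and only steps 1, 2, 4, 5, and 6 are touched.

```r
set.seed(1)

# Simulated example data (all names and values are invented)
n <- 200
explore <- data.frame(
  x1 = rnorm(n),
  gear = factor(sample(c("trap", "net"), n, replace = TRUE))
)
explore$x2 <- 0.8 * explore$x1 + rnorm(n, sd = 0.3)  # deliberately collinear with x1
explore$y <- rpois(n, lambda = exp(0.2 + 0.5 * explore$x1))

# 1. Outliers in Y: values beyond the boxplot whiskers
outliers_y <- boxplot.stats(explore$y)$out

# 2. Homogeneity of Y: compare the spread of Y between groups
variance_by_gear <- tapply(explore$y, explore$gear, var)

# 4. Zeroes in Y: proportion of observations equal to zero
prop_zero <- mean(explore$y == 0)

# 5. Collinearity among X: correlation between covariates
cor_x <- cor(explore$x1, explore$x2)

# 6. Relationship between Y and X: rank correlation as a first look
cor_yx <- cor(explore$y, explore$x1, method = "spearman")

print(list(
  n_outliers = length(outliers_y),
  variance_by_gear = variance_by_gear,
  prop_zero = prop_zero,
  cor_x = cor_x,
  cor_yx = cor_yx
))
```

In practice each of these numbers would usually be paired with a plot (boxplots, dotplots, scatterplots), since a single summary can hide the pattern that matters.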
  25. To simplify the process and keep the focus on stats rather than coding, I have prepared Markdown templates to complete for Parts 2 and 3
  26. Part 3: Conducting and Reporting the Analysis

    In Part 3, you will conduct a regression-type data analysis, following the reporting procedure explained in Zuur and Ieno, 2016. The steps to this assignment include:

    1. Identify the dependency structure in the data
    2. Present the statistical model
    3. Fit the model
    4. Validate the model
    5. Interpret and present the numerical output of the model
    6. Create a visual representation of the model
    7. Simulate from the model

    Again, a Markdown template is provided
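The fitting, validation, output, visualization, and simulation steps listed above can be sketched with base R alone. This is only an illustration under simplifying assumptions: the data are simulated, the observations are treated as independent (so step 1 is trivially answered), and the variable names `temperature` and `count` and the coefficient values are invented. A real analysis would follow the course template and the cited reporting procedure.

```r
set.seed(2)

# Simulated data: counts that increase with temperature (values are invented)
n <- 150
dat <- data.frame(temperature = runif(n, min = 0, max = 10))
dat$count <- rpois(n, lambda = exp(0.5 + 0.15 * dat$temperature))

# 3. Fit the model: Poisson GLM with a log link
fit <- glm(count ~ temperature, family = poisson, data = dat)

# 4. Validate: Pearson dispersion statistic (values near 1 suggest no overdispersion)
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# 5. Numerical output: coefficient table
print(coef(summary(fit)))

# 6. Visual representation: predicted mean count along a temperature grid
grid <- data.frame(temperature = seq(0, 10, length.out = 50))
grid$predicted <- predict(fit, newdata = grid, type = "response")

# 7. Simulate from the fitted model: 100 new response vectors
sims <- simulate(fit, nsim = 100, seed = 3)

print(dispersion)
```

The `grid` data frame can be passed straight to a plotting call (for example `plot(count ~ temperature, data = dat)` followed by `lines(predicted ~ temperature, data = grid)`), and the columns of `sims` can be compared against the observed counts as a check on model fit.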
  27. Next:

    - Install R, RStudio, and something to open .zip files (e.g. WinZip)
    - Download the sample R project
    - Bring your laptop to class (please always do this)