Upgrade to Pro — share decks privately, control downloads, hide ads and more …

FISH 6003: Week 4 Part 1 - Multiple Linear Regression

MI Fisheries Science
January 29, 2018
60

FISH 6003: Week 4 Part 1 - Multiple Linear Regression

Revised a few slides on 29 Jan

MI Fisheries Science

January 29, 2018
Tweet

Transcript

  1. Chapter 4: Multiple linear regression CatchRate ~ Poisson (μ ij

    ) E(CatchRate) = μ ij Log(μ ij ) = GearType ij + Temperature ij + FleetDeployment i FleetDeployment i ~ N(0, σ2) Using lme4: m <- glmer(CatchRate ~ GearType + Temperature + (1 | FleetDeployment), family = poisson) FISH 6003 Part 1 FISH 6003: Statistics and Study Design for Fisheries Brett Favaro 2017 This work is licensed under a Creative Commons Attribution 4.0 International License
  2. Land Acknowledgment We would like to respectfully acknowledge the territory

    in which we gather as the ancestral homelands of the Beothuk, and the island of Newfoundland as the ancestral homelands of the Mi’kmaq and Beothuk. We would also like to recognize the Inuit of Nunatsiavut and NunatuKavut and the Innu of Nitassinan, and their ancestors, as the original people of Labrador. We strive for respectful partnerships with all the peoples of this province as we search for collective healing and true reconciliation and honour this beautiful land together. http://www.mun.ca/aboriginal_affairs/
  3. This week: • Multiple linear regression • A.K.A. “General linear

    model” • Today’s Focus: Interpreting and specifying models. Not focusing on assessing fit whether the models are working. Y= βo + β1 X1 + β2 X2 … + error
  4. Quiz: For every 1 unit increase in TD we see

    a ___ increase in GSI Bivariate linear regression; GSI ~ TD
  5. Quiz: As we go from SuPallosu to TavoloaraMPA we see

    a ___ increase in GSI Bivariate linear regression; GSI ~ Site Modelled relationship Raw data AKA One-way ANOVA or T-test
  6. Multiple comparisons problem • Our threshold for ‘significance’ is P

    < 0.05. • If we repeated the study infinite times, we would erroneously reject Ho 5% of the time (i.e. we would conclude there IS an effect when one doesn’t exist, one time out of 20) • Every time you run a model you risk making this error • If you run multiple models, the risk of an error increases If both Site and TD matter, we should model them together. Why? Interpretability. Goal is not just to ask “is there an effect” – we want to measure magnitude of effect of each parameter. We want to see each variable’s contribution to changes in Y. Interaction. Covariates can impact Y on their own, or they can work in concert.
  7. Disclaimer: εi ~ N(0, σi 2) Assume error is normally

    distributed with mean of zero, variance σi 2 Assume σi 2 is equal across the entire range of data - violated Assume replicates are truly independent. Y values at a given X should not influence Y values at other X positions - violated Assume fixed X GSI ~ TD So we wouldn’t use this type of model normally. But we will for now…
  8. Test Diameter and Site Y= βo + β1 X1 +

    β2 X2 … + error GSIi = βo + β1 TDi + Sitei + εi εi ~ N(0, σ2) “For every 1-unit increase in TD, GSI increases by 0.108” “As we go from Su Pallosu to TavolaraMPA, GSI increases by 0.03 units”
  9. GSIi = β0 + β1 TDi + Sitei + εi

    GSIi = -2.26 + 0.108*TDi + 0 * SuPallosu + 0.0382 * TavolaraMPA + εi • When including a categorical covariate, by default, R makes all comparisons against the first alphabetical level in the factor • Here, “SuPallosu” comes before “TavolaraMPA” – so “SuPallosu” becomes default level • Implication: Control group should always be first level GSIi = β0 + β1 *TDi + 0 * SuPallosu + β2 * TavolaraMPA + εi
  10. With Month instead Y= βo + β1 X1 + β2

    X2 … + error GSIi = βo + β1 TDi + Monthi + εi εi ~ N(0, σ2) “For every 1-unit increase in TD, GSI increases by 0.108” “As we go from January to February, GSI increases by 0.785” “As we go from January to April, GSI decreases by 0.232” And so on…
  11. Colour = adjustment due to month GSIi = β0 +

    β1 *TDi + 0 * Month1 + β2 * Month2 + β3 * Month3 + β4 * Month4 + β5 * Month5 + β6 * Month6 + β7 * Month7 + β8 * Month8 + β9 * Month9 + β10 * Month10 + β11 * Month12 + εi Note: By default, R will plot straight lines to edge of plot Best practice is to constrain lines to within x range
  12. GSI at Month 1, TD = 0 Beware – every

    P value reports a comparison of Month 1 vs. Month2, 3, 4, etc. Multiple comparisons!! (Don’t trust these P values)
  13. What is the F value when the model excludes this

    term? If F is large, reject Ho that this term has no impact on model fit. I.e. P<0.05 – this variable has a significant effect on Y
  14. Contrast with our Site model “Site” overall has a non-

    significant effect “Month” overall has a significant effect
  15. Interactions • So far, we’ve shown how changes in a

    categorical variable bumps the line up and down • But what if we think the line is not only adjusted, but also has a different slope? • Consider Month * TD: GSIi = β0 + β1 TDi + Monthi + Month*TDi + εi
  16. At Month 1, TD = 0, intercept is -5.37 In

    Month 1, every additional TD brings 0.179 units of increase in GSI As you go from Month 1 to Month 2, you see an increase of 3.54 in the intercept for GSI, when TD = 0 As you go from Month 1 to Month 2, subtract 0.054 from the base slope. In Month 2, for every additional unit of TD, you are getting 0.054 fewer units of increase in GSI 0.179 – 0.054 = 0.125 In Month 2, for every additional unit of TD, you get 0.125 units of GSI
  17. Recap • When categorical covariates are in your model: •

    Coefficients from summary() give you: • Intercept - Mean of the first level, all else being zero • Beta1 – difference between first level mean and second level • Beta2 – difference between first level mean and third level • Use drop1() or anova() to determine if the categorical covariate has an overall significant effect on Y • Both continuous and categorical covariates can be placed on X • Interactions allow slope to vary. No interaction = slope is constant across groups • Best to practice writing out all models that you run
  18. More complicated: Y= βo + β1 X1 + β2 X2

    … + error GSIi = βo + β1 TDi + Sitei + Monthi + Month*Site + Month*TD + Month*Site*TD + εi εi ~ N(0, σ2) Month, Site, TD mod6 <- lm(GSI ~ TD * Month * Site, data=workingdata) mod6 <- lm(GSI ~ TD + Month + Site + Month*Site + TD * Month + TD*Month*Site, data=workingdata)
  19. They used categorical size, not TD This is similar to

    the approach they used in the actual paper
  20. This is essentially what we get from drop1: each of

    these P-values are assessing whether the significantly different levels of variance are explained by keeping the variable in the model. Low P-value = reject Ho that this variable has no effect on variance explained by model (They also used a random factor. We’ll talk about this later)
  21. Also: Think about this model: Month, Zone, Size Class, Area

    Restated as: Month, Area (MPA/No MPA), Size Class, Sampling Location (A, B, C, D) We know for sure there is are at least two problems with this model. You know enough by now to diagnose one of them. What is the problem?
  22. But: • Our model would be significantly worse without the

    three-way interaction • This implies that each term should remain in the model, with interactions!
  23. Recap. Must know: • Can mix categorical and continuous covariates

    in a model, with or without interactions • How to interpret betas – for both continuous and categorical covariates, as well as for interactions • Can build a model in R with one Y, and more than one covariate • Understand what Drop1 is doing Y= βo + β1 X1 + β2 X2 … + error εi ~ N(0, σ2)
  24. Exercise: • Run 001_UrchinSetup and 002_UrchinExploration (same as before) •

    Review and run 003_UrchinStats. This is code from TODAY’S lecture. • Note that we have not yet assessed whether any of these models are working (and we know that they probably aren’t). Next week. Tomorrow: • A big example, using a different study
  25. Chapter 4: Multiple linear regression CatchRate ~ Poisson (μ ij

    ) E(CatchRate) = μ ij Log(μ ij ) = GearType ij + Temperature ij + FleetDeployment i FleetDeployment i ~ N(0, σ2) Using lme4: m <- glmer(CatchRate ~ GearType + Temperature + (1 | FleetDeployment), family = poisson) FISH 6003 Part 2 FISH 6003: Statistics and Study Design for Fisheries Brett Favaro
  26. • Consequences of invasion: • Decreased prey fish abundance and

    density • Coral can become overgrown with algae as a result • Lionfish outcompete native predators • Solutions • Targeted fishing • Culls (bounties, derbies, etc)
  27. https://peerj.com/articles/3818.pdf The invasion of Indo-Pacific lionfish is one of the

    most pressing concerns in the context of coral reef conservation throughout the Caribbean. … In Roatan (Honduras) a local non-governmental organisation (i.e. Roatan Marine Park) trains residents and tourists in the use of spears to remove invasive lionfish. Here, we assess the effectiveness of local removal efforts in reducing lionfish populations. We ask whether reefs subject to relatively frequent removals support more diverse and abundant native fish assemblages compared to sites were no removals take place. Lionfish biomass, as well as density and diversity of native prey species were quantified on reefs subject to regular and no removal efforts. Reefs subject to regular lionfish removals (two to three removals month-1 ) with a mean catch per unit effort of 2.76 ± 1.72 lionfish fisher-1 h-1 had 95% lower lionfish biomass compared to nonremoval sites. Sites subject to lionfish removals supported 30% higher densities of native prey-sized fishes compared to sites subject to no removal efforts. We found no evidence that species richness and diversity of native fish communities differ between removal and non-removal sites. We conclude that opportunistic voluntary removals are an effective management intervention to reduce lionfish populations locally and might alleviate negative impacts of lionfish predation. We recommend that local management and the diving industry cooperate to cost-effectively extend the spatial scale at which removal regimes are currently sustained.
  28. • Six sites • Treated as three scuba transects per

    site • HOWEVER each “transect” was actually two separate transects: 1. 50 m x 5 m for big fish 2. 25 x 2 m transect for small fish • Half the sites were culled • Half were not culled • The authors report some stats about the magnitude of culling effort. Let’s not get into that - just treat ‘culling’ as a yes/no
  29. Sketching the study: Site 1 2 3 4 5 6

    Treatment Cull No Cull Scuba transects 50x5 25x2 No Cull Cull 50x5 25x2 50x5 25x2 Fish data Visually Estimated Derived # Prey fish Size per fish Richness, Density, Diversity # Native predators Size per fish Biomass # Lionfish Size per fish Biomass http://coraltriangle.blogs.pan da.org/seareefs/wwf- university-of-south-pacific-at- work-in-the-great-sea-reef Prey fish defined as any fish 15 cm or smaller
  30. Fish data Visually Estimated Derived # Prey fish Size per

    fish Richness, Density, Diversity # Native predators Size per fish Biomass # Lionfish Size per fish Biomass Richness: Number of species Density: Number of individuals per unit area - In this study, small fish were measured on a 0.005 hectare transect - Big fish were measured on a 0.025 hectare transect Better divide by the correct number! Diversity: Shannon-Wiener index - A metric of biodiversity that incorporates both richness and abundance Recall: Assume fixed X Is X fixed for: - # Prey fish? (think: Detection Probability) - Equally likely to see all species? Could some be more cryptic? - Size per fish? - Species richness per transect? - Density of fish within a transect? - Shannon-Wiener index?
  31. Fish data Visually Estimated Derived # Prey fish Size per

    fish Richness, Density, Diversity # Native predators Size per fish Biomass # Lionfish Size per fish Biomass Biomass: Recall: Assume fixed X Is X fixed for: - # Native Predators? - Size per fish, for bigger fish? - Biomass? - Uncertainty around length-weight relationship? Is it consistent across body sizes? - Is uncertainty equivalent across species? Not given in the raw data, nor is it explained how they derived it for each fish Normally, you would use a published length-weight relationship: log(W) = log(a) + B*log(L) + error W is weight, L is length, and A and B are species-specific parameters Must disclose a and B! And beware; At extremes (especially low end) uncertainty increases with this translation
  32. Core questions: • Does culling lionfish increase: • Diversity of

    prey fish? • As defined by: • Species richness (# of species) • Shannon-Weiner index (calculated from # and abundance of species) • Abundance of prey fish? • Does culling lionfish decrease: • Density of lionfish? • But there are a lot of other variables • Site • Native predator density • Coral cover • Substratum complexity Q: What is truly driving the system?
  33. How they did it: Perform a bunch of bivariate linear

    regressions: Y= βo + β1 *X + error CoralCover = βo + Culling*X + error Substratum = βo + Culling*X + error NativePreds = βo + Culling*X + error LionfishBiomass = βo + Culling*X + error Result No effect No effect No effect Effect Conclude: From model: Culling effected both lionfish biomass and fish density. From raw data: Culled reefs had “30% higher density of prey-size fish” PreyFishDensity = βo + Culling*X + error PreyRichness = βo + Culling*X + error PreyDiversity = βo + Culling*X + error Effect No effect No effect
  34. The model was used to answer: Did culling have statistically

    significant effect? Yes/no. P-value The raw data were used to answer: How much higher was reef density on these culled reefs? Danger: Multiple comparisons. Uncertainty on X due to derived values Missed opportunity: Use the MODEL to infer impact of culling on reef fish density – including 95% C.I. to tell us the magnitude of effect that was explainable by culling. If the model fits the data, this is powerful information!
  35. Let’s look at the data Pre-R processing - Excel spreadsheet

    - Four tabs - First step – save as four CSV files:
  36. Pre-R processing One row = 1 species and size class

    Row may represent more than one observation There were 18 unique transects, 3 per site (labelled 1, 2, 3). We can tell which unique ‘transect’ this was from by looking at site AND transect together. Other sheets have Site x Transect too. Need to connect sheets
  37. Data preparation - Our first goal must be to produce

    one large spreadsheet containing all information - One row = one fish - Unique TransectID for each transect - (e.g. Site 1, transect 1 = TransectID 1. Site 2, transect 1 = TransectID 4). - This will require some fancy programming - Open R, begin