(μ ij ) E(CatchRate) = μ ij Log(μ ij ) = GearType ij + Temperature ij + FleetDeployment i FleetDeployment i ~ N(0, σ2) Using lme4: m <- glmer(CatchRate ~ GearType + Temperature + (1 | FleetDeployment), family = poisson) FISH 6003 FISH 6003: Statistics and Study Design for Fisheries Brett Favaro 2017 This work is licensed under a Creative Commons Attribution 4.0 International License
in which we gather as the ancestral homelands of the Beothuk, and the island of Newfoundland as the ancestral homelands of the Mi’kmaq and Beothuk. We would also like to recognize the Inuit of Nunatsiavut and NunatuKavut and the Innu of Nitassinan, and their ancestors, as the original people of Labrador. We strive for respectful partnerships with all the peoples of this province as we search for collective healing and true reconciliation and honour this beautiful land together. http://www.mun.ca/aboriginal_affairs/
expressed as • 0’s and 1’s • Proportion • Rate • Where Y is an unordered category • Where Y is an ordered category • Where Y is zero-inflated Insufficient time to cover these in detail
enhance fisheries catch via the “spillover” effect, where biomass is exported to adjacent exploited areas. Recent studies in spearfishing fisheries suggest that the spillover of gear-naïve individuals from protected to unprotected sites increases catch rates outside the boundaries of MPAs. Whether this is a widespread phenomenon that also holds for other gear types and species is unknown. In this study, we tested if the distance to a Mediterranean MPA predicted the degree of vulnerability to hook and line in four small-bodied coastal fish species. Overall, our results suggest that an MPA-induced naïveté effect may not be universal and may be confined to only intrinsically highly catchable fish species. This fact emphasizes the importance of considering the behavioural dimension when predicting the outcomes of MPAs, otherwise the effective contribution may be smaller than predicted for certain highly catchable species such as S. scriba.
hook? - How long did it take the fish to approach the hook? - What species was the fish? - How far was the fish from the MPA border (inside or outside?) - What day was it? (Day 1-4) Let’s use this as Y
) = πi Var(Yi ) = πi * (1 - πi ) Distribution of response Predictor function Link Logit → logarithm of the odds What is logit? ηi = β0 + β1 Lt_Si + β2 Distance_mi + Speciesi
happening (0 to 1) e.g. probability of 0.6 if you did something infinite times, it would happen 60% of the time 0.6 / (1 – 0.6) = 1.5 “Odds of this thing happening happening are 1.5 to 1” Logit → logarithm of the odds → Probability ------------------- 1 - Probability Log( )
) = πi Var(Yi ) = πi * (1 - πi ) Distribution of response Predictor function Link log ( π i / (1 – π i ) ) = ηi Odds are 1:1 Probability is 0.5 Odds ratio is 1 An event is as likely to occur as not if: ηi = β0 + β1 Lt_Si + β2 Distance_mi + Speciesi
has two outcomes: • Pass (1) • Fail (0) • The probability of success is the same for every trial, and each trial is independent (Zuur et al. 2009) Bernoulli: One observation constitutes a trial A special case of Binomial
) = πi Var(Yi ) = πi * (1 - πi ) Distribution of response Predictor function Link This is expressed in “log odds” ηi = β0 + β1 Lt_Si + β2 Distance_mi + Speciesi
event occurring by 8.365*10-5 So the odds are… e-8.365*10^-5 = 0.9999164 For every one unit increase in distance_m, the probability of the event occurring decreases by 0.00001. Odds are confusing. Recommend self-study if you use them. And the probability is… p / 1 – p = 0.9999164 p = 0.499997
reduction study… https://www.sciencedirect.com/science/article/pii/S0165783615300254 “Sharks are well known to be able to detect electric fields in the microvolt range and this sense has been proposed to provide a mechanism to detect the earth's magnetic field. As a result, the use of magnets has been proposed as a method to reduce shark interactions with fishing gear. We therefore tested two models of high field strength neodymium magnets to effect shark catch rates during commercial longline fishing operations.”
boundary One hook can only catch one fish They had 144 hooks with magnets, 144 without You can count the number of sharks caught… but there is an upper limit Shark pics: WWF Canada 1 0 0 1 1 CPUE = 3/5 1
upper limit to how many you could count If there IS an upper limit, you are dealing with proportional data ηi = β0 + HookType logit(πi ) = ηi Catchi ~ B(πi , ni ) Distribution of response Predictor function Link Here, ni is the number of hooks sampled. πi is the probability of catching a shark. Assume all hooks within a replicate have equal probability of catching a shark.
0 0 0 1 0 Model as a Bernoulli GLM. Each hook is a replicate. Violates assumption of independence Would need to add a grouping variable (stay tuned for mixed effects lecture) Alternate strategy
Imagine we were not told how many hooks were in the water. Imagine we were just given CPUE’s. Conduct a beta GLM https://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf Here, output is similar to a binomial GLM http://rcompanion.org/handbook/J_02.html
a given probability of success. Each event is independent Binomial GLM: - # of successes from a given number of trials. Each set of trials is independent Beta GLM: - Rates, other values that involve both numerator and denominator where 0 and 1 are not possible values. Each proportion is independent. E.g. Catch Per Unit Effort
common in the literature. Sometimes, rethinking Y can make things easier e.g. Instead of “Catch per 100 hooks” (assume each hook is independent) Y: 0/1, each hook is a replicate. Use Bernoulli GLM. e.g. 2: Recall the lionfish paper. Fish count was expressed as a density. Could have used beta regression. Or… turn it into a count (Poisson GLM )and make area an offset.
logistic regression E.g. Y: prey size (expressed in a category: Small, medium, large) X: predator body size E.g. 2 Y: Likert scale (1-5, where 1 is strongly disagree) X: Household income Use clm() from the ordinal package https://cran.r-project.org/web/packages/ordinal/vignettes/clm_intro.pdf
length, number of fish in a school Not ALWAYS a problem… e.g. consider a Poisson GLM When mean is high enough the model predicts few zeroes anyway. No problem When mean is low, zero is predicted frequently. Problem. See Chapter 11 of Solution: Zero-truncated Poisson or NB GLM Use VGAM package, vglm() with family = posnegbinomial or pospoisson
distribution NOT just “lots of zeroes” Poisson Negative binomial In both cases, distributions with low means can have lots of zeroes without being zero-inflated
Zeroes come from: 1. Reality. There really were no whales in most places because habitat is not suitable in most places. 2. Design error. My study design is flawed. I went whale- watching at the wrong time of year. I looked for whales on land. 3. Observer error. Whales were there, I just didn’t see them. 4. Subject error. The whales THEMSELVES made a mistake – they COULD have used other habitat, but didn’t. (Messes up my model’s ability to predict whale habitat) Imagine we are trying to count whales at sea from a boat. Most of the time we do not see a whale. Sometimes we see several whales at once. True zero False zeroes
Two-Part models Conduct modelling in two steps: A) Turn all data into 0’s and 1’s. Run Bernoulli GLM to model the probability that something is observed. B) Look only at values when Y >= 1. Run Zero-truncated Poisson or Negative Binomial. Use this when you think two separate processes drive presence/absence, and then abundance. E.g. Process 1: What predicts whether whales appear in this habitat? Process 2: What is the relationship between those predictors and the size of the pod of whales?
Negative Binomial model (that is NOT truncated) B) Use a binomial GLM to model the probability of the zeroes in your model being false zeroes or true zeroes Use this when you think there is just one process of interest, but due to some structural issue with your study design, you have a bunch of false zeroes that are causing zero-inflation. E.g. What habitat variables predict the number of whales in a part of the ocean? …But I know I will miss a lot of them due to low detectability… So in addition to the above question, what is the probability that the zeroes I have modelled are due to observer error?
vs. hurricanes). We described these data with a negative binomial GLM. If we simulated 10,000 datasets based on this model, how many Y=0 observations would we observe?
vs. hurricanes). We described these data with a negative binomial GLM. If we simulated 10,000 datasets based on this model, how many Y=0 observations would we observe? And how many zeroes did we actually have? Conclude: Data are not zero-inflated
many Y=0 observations would we observe? Recall: Shark magnets. Binomial GLM. Y: Probability of catching a shark. And how many zeroes did we actually have? Conclude: Data are not zero- inflated