$E(\mathrm{CatchRate}_{ij}) = \mu_{ij}$
$\log(\mu_{ij}) = \mathrm{GearType}_{ij} + \mathrm{Temperature}_{ij} + \mathrm{FleetDeployment}_{i}$
$\mathrm{FleetDeployment}_{i} \sim N(0, \sigma^2)$
Using lme4: m <- glmer(CatchRate ~ GearType + Temperature + (1 | FleetDeployment), family = poisson)
FISH 6003: Statistics and Study Design for Fisheries
Brett Favaro, 2017
This work is licensed under a Creative Commons Attribution 4.0 International License
in which we gather as the ancestral homelands of the Beothuk, and the island of Newfoundland as the ancestral homelands of the Mi’kmaq and Beothuk. We would also like to recognize the Inuit of Nunatsiavut and NunatuKavut and the Innu of Nitassinan, and their ancestors, as the original people of Labrador. We strive for respectful partnerships with all the peoples of this province as we search for collective healing and true reconciliation and honour this beautiful land together. http://www.mun.ca/aboriginal_affairs/
but we aren’t sure if we should believe it’s real, because our data are noisy.
The two steps:
• Step 1. Measure the strength of the pattern in our data.
• Step 2. Ask ourselves: is this pattern strong enough to be believed?
But before we can do this, we must understand our data
we are reading e.g.
FleetID  PotID  Area   PotType  Catch_kg
1        1      North  A        25
1        1      North  A        24
1        2      North  B        10
2        1      South  A        5
2        2      South  B        5
2        2      South  B        6
Before we analyze, we must ask: How was this study designed?
Want to know: How does pot type affect catch rate?
experimental design:
• For an experiment to work, you must hold as many variables as possible constant, and change only the variables of interest
  • e.g. don’t change pot type, location, and bait haphazardly, because you can’t tell WHICH of those caused the change in catch rate
• If the design was bad enough, you may stop right here
  • It could be that no inference is possible
• Statistically:
  • All statistical models operate on ASSUMPTIONS
  • Violate the assumption → the model gives you the wrong answer
  • False positive or false negative possible
treatment of data
2. Do the pilot study
   • Analyze pilot data. Revise plans
3. Conduct power analysis
4. Do the full study
5. Explore data
6. Analyze data, report results
Today
collected the data?
• How was it grouped?
• How many samples were taken/will be taken per group?
• Any spatial or temporal issues?
  • i.e. one study, or several? One site, or several?
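As a minimal sketch of how these design questions can be asked of a dataset, the block below rebuilds the small pot example table shown earlier and cross-tabulates it. The object names (`pots`, `design`, `fleet_by_area`) are illustrative assumptions, not part of the course material.

```r
# Rows copied from the example table (FleetID, PotID, Area, PotType, Catch_kg).
pots <- data.frame(
  FleetID  = c(1, 1, 1, 2, 2, 2),
  PotID    = c(1, 1, 2, 1, 2, 2),
  Area     = c("North", "North", "North", "South", "South", "South"),
  PotType  = c("A", "A", "B", "A", "B", "B"),
  Catch_kg = c(25, 24, 10, 5, 5, 6)
)

# How many samples were taken per group (pot type within each area)?
design <- table(pots$Area, pots$PotType)
print(design)

# Any spatial issue? Here each fleet was fished in only one area,
# so fleet and area cannot be separated.
fleet_by_area <- table(pots$FleetID, pots$Area)
print(fleet_by_area)
```

Empty cells in a cross-tabulation like `fleet_by_area` are a quick signal that two grouping variables vary together in the design.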
BRDs (i.e. two entrance-ring and three bent-tunnel variants) as well as unmodified traps (control) to identify the BRD design that offers the best trade-off between minimizing bycatch while maintaining prawn catch. From a 9.8 m-long research vessel, we deployed gears in “strings” which contained 10 traps connected to a single line weighted with one cinder block at each end. We deployed a total of 154 strings (i.e. 1540 traps). The most common configuration of traps in each string was: two control traps (7.6 cm entrances), one trap with 7.0 cm entrances, one trap with 6.4 cm entrances, and two of each BRD variant (4-ring, 5-ring, and 7-ring), with the order of traps being randomized within each string.
Figure labels: Tunnel, Size, Ctrl
One boat
×10 / string
Not all were identical?
String layout: Ctrl Ctrl 7cm 6.4cm 4R 4R 5R 5R 7R 7R (order randomized)
BRDs (so that each string had one steel and one PVC variant of each BRD type) but all PVC variants were eventually discarded because they were not durable (total of 155 PVC-BRD traps excluded). In addition, we included the 6.4 cm variant one week into the study, when we became curious about a more extreme reduction in trap opening size. One string of gear was lost during the study, while another was carried several kilometres from its original deployment site, and so its data were discarded. Three traps also became detached from one string line and were lost. Data from 1362 traps were therefore included in the present analysis (322 control traps (i.e. 7.6 cm entrances), 256 traps with 7.0 cm entrances, 145 traps with 6.4 cm entrances, 214 traps with 4-ring tunnels, 214 traps with 5-ring tunnels, and 211 traps with 7-ring tunnels).
Design was unbalanced
Replicates
We deployed gear in two regions of southern British Columbia (Figure S1): Howe Sound, near Vancouver (49°25′30″N 123°20′00″W), and the southern Gulf Islands, near Sidney (48°39′00″N 123°23′00″W).
Two sites, far apart
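The imbalance can be checked directly from the trap counts reported in the quoted methods. This is only an arithmetic sketch; the vector names are illustrative.

```r
# Trap counts per treatment, as reported in the quoted methods text.
traps <- c(control = 322, cm7.0 = 256, cm6.4 = 145,
           ring4 = 214, ring5 = 214, ring7 = 211)

# The counts should add up to the 1362 traps included in the analysis.
total <- sum(traps)
print(total)

# Share of the data contributed by each treatment.
share <- round(traps / total, 2)
print(share)

# Ratio of the largest to the smallest group: 1 would be perfectly balanced.
imbalance <- max(traps) / min(traps)
print(imbalance)
```

The control group is a little over twice the size of the 6.4 cm group, which was added one week into the study.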
5R 7R 7R (order randomized)
“Family” of modifications (2)
Specific modifications (6)
Traps are nested within strings
Individual animals (if relevant) are nested within traps
The design is not perfectly balanced
Sites (2)
Watch for:
• Is there uncertainty around X values? If yes… may be trouble!
• Did you measure across the range of X values, or are there large gaps? If there are gaps… may be trouble
• In the above study:
  • Trap type is fixed. We know exactly what type of trap we’re fishing. Zero uncertainty.
  • What about environmental variables, e.g. depth?
    • Is there uncertainty in X?
• Common problems in fisheries:
  • If age is on X: have you measured age correctly? How certain are you?
  • Putting too many decimals on X. Can you really measure to 0.01 cm? Or should you bin to the nearest integer?
observations independent?
“Pseudoreplication is defined as the use of inferential statistics to test for treatment effects with data from experiments where either treatments are not replicated (though samples may be) or replicates are not statistically independent.”
Independence = Y value at Xi is not influenced by other Xi values
within strings. They are likely not independent, i.e. traps within a string are likely to be more similar to each other than to traps in other strings
String 1 is in a good site. It also has Trap A. Are catch rates better because of the better trap, or the better site?
Y: Catch per trap
Traps are independent. (Right?)
All trap types are in both good and bad sites
(Figure label: Bad site)
catch within trap within string)
• Geographical distribution not considered
• Some other grouping factor not considered (phylogeny? genetics, e.g. fish reared from specific hatchery populations?)
Does it matter?
- Depends!
What do I do?
- Ignore it (risky!)
- Remove data
- Take an average (sort of like removing data)
- Account for it statistically, if possible (in a few weeks)
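The "take an average" option can be sketched as follows. All values are invented for illustration, and the column names (`string`, `trap_type`, `prawns`) are assumptions rather than names from a real dataset.

```r
# Hypothetical data: 3 strings with 4 traps each (values invented).
catch <- data.frame(
  string    = rep(c("S1", "S2", "S3"), each = 4),
  trap_type = rep(c("A", "B"), times = 6),
  prawns    = c(12, 9, 14, 8, 3, 2, 4, 1, 7, 6, 8, 5)
)

# Collapse to one mean per string and trap type, so traps within a string
# are no longer treated as independent replicates.
string_means <- aggregate(prawns ~ string + trap_type, data = catch, FUN = mean)
print(string_means)

# The cost: 12 observations become 6.
print(c(before = nrow(catch), after = nrow(string_means)))
```

Averaging removes the within-string dependence but halves the sample size here, which is why it is "sort of like removing data".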
of samples. Allows you to identify risk areas, especially non-independence
Exercise (if we have time): Sketch the study design of: https://peerj.com/articles/3818/
10 minutes to read. Then we’ll do it as a class
other values in a dataset (in both X and Y directions)
• Can be quantified (e.g. via boxplots), or qualitatively identified
• Outliers may be:
  • Due to entry error → must remove or fix
  • Due to an actual biological process → gotta decide what to do!
on your model? If yes, is that okay? Are they reflecting “genuine variation”?
• Or do they represent a different biological process beyond the scope of your model?
• Bigger problem with smaller datasets
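One way to quantify outliers with the boxplot rule is base R's `boxplot.stats()`, which flags points more than 1.5 times the hinge spread beyond the hinges. The data below are hypothetical: the value 250 is invented to mimic a data-entry error for 25.

```r
# Hypothetical catch weights in kg; 250 imitates a typo for 25.
catch_kg <- c(25, 24, 10, 5, 5, 6, 22, 18, 250)

# Points lying beyond 1.5 x the hinge spread from the hinges.
flagged <- boxplot.stats(catch_kg)$out
print(flagged)

# Locate the flagged rows so they can be checked against the field sheets.
flagged_rows <- which(catch_kg %in% flagged)
print(flagged_rows)
```

Flagging is only the first step. Whether a flagged point is fixed, removed, or kept still depends on whether it is an entry error or a real biological observation.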
between largest and smallest variance is 4 or more, you’re in trouble
Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, 2nd edn. Sage Publications, CA.
This becomes really important when assessing model fit. Stay tuned.
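A rough sketch of this rule of thumb, assuming it refers to the ratio of the largest to the smallest group variance. The catches are invented and the column names are illustrative.

```r
# Hypothetical catches for three trap types (values invented).
catch <- data.frame(
  trap_type = rep(c("A", "B", "C"), each = 5),
  prawns    = c(10, 12, 11, 13, 9,
                4, 5, 6, 5, 5,
                20, 2, 15, 30, 8)
)

# Variance of the response within each group.
group_var <- tapply(catch$prawns, catch$trap_type, var)
print(group_var)

# Ratio of largest to smallest variance; the rule of thumb flags 4 or more.
ratio <- max(group_var) / min(group_var)
print(ratio)
print(ratio >= 4)
```

A side-by-side `boxplot(prawns ~ trap_type, data = catch)` gives the same information visually.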
violation of normality
• Some tests don’t require it at all
• Some tests (e.g. t-test) require it within each group!
• Many tests actually assume normality at each covariate level
However…
Zuur et al (2010) http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2009.00001.x/full
have a lot of zeroes
• E.g. bycatch rate of a rare species. Most traps do not catch it
• E.g. behavioural data. Most time periods do not show the behaviour
• This can badly disrupt model fit. Must use special modelling techniques to account for this.
Zuur et al (2010) http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2009.00001.x/full
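A quick way to see how many zeroes a count variable contains is to compute the proportion directly and compare it with what a Poisson distribution with the same mean would predict. The counts below are invented, and the Poisson comparison is one possible benchmark, not a formal test.

```r
# Hypothetical bycatch counts per trap for a rare species (values invented).
bycatch <- c(0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0)

# Observed proportion of traps with zero bycatch.
prop_zero <- mean(bycatch == 0)
print(prop_zero)

# Proportion of zeroes expected under a Poisson with the same mean.
expected_zero <- dpois(0, lambda = mean(bycatch))
print(round(expected_zero, 3))

# More zeroes than expected hints that a plain Poisson model may fit poorly.
print(prop_zero > expected_zero)
```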
vary together
• Biological e.g.: Length and weight of fish (if both on X)
• Experimental e.g.: Trap type and site
Zuur et al (2010) http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2009.00001.x/full
(Figure labels: Good fishing site / Bad site)
Here, site quality is collinear with trap type. They vary together.
Another way to say this: As site changes, traps are also changing
can’t be sure X1 affected Y, because X1 was correlated with X2. We don’t know which was affecting Y”
(Figure labels: Good fishing site / Bad site)
Is the blue site better? Or does it APPEAR better because we have better traps there?
a disaster. Can only keep one in the model
If X-Y relationship is very strong:
- Correlation of 0.5-0.6 may be tolerable
If X-Y relationship is weak:
- Correlation of even 0.3 to 0.4 can be a problem
Careful! You can also have non-linear relationships between X values
Check Variance Inflation Factor (VIF) score after modelling (later)
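Before modelling, pairwise correlations among the candidate X variables can be screened with `cor()`. The sketch below uses simulated fish measurements; the variable names and the length-weight relationship are assumptions made up for illustration.

```r
# Simulated covariates (all values invented for illustration).
set.seed(1)
length_cm <- runif(50, 20, 60)
# Weight is generated from length, so the two are collinear by construction.
weight_g  <- 0.01 * length_cm^3 + rnorm(50, sd = 50)
# Depth is generated independently of the other two.
depth_m   <- runif(50, 10, 100)

covariates <- data.frame(length_cm, weight_g, depth_m)

# Pairwise Pearson correlations among the X variables.
cor_matrix <- round(cor(covariates), 2)
print(cor_matrix)

# Pearson correlation only captures linear association, so also look at
# pairs(covariates) to spot non-linear relationships between X values.
```

Here length and weight would not both be kept as predictors, while depth shows no built-in relationship with either.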
Zuur et al (2010) http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2009.00001.x/full
• Before you even run a model… do these relationships make sense?
• One final check for entry errors
(Figure axis label: Number of banded sparrows)
against X (one or more)
• Y: Bird weight
• X: Wing length, sex, month
Zuur et al (2010) http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2009.00001.x/full
• But is the relationship between wing length and bird weight the same across sex and month?
• Use a “Coplot” to check for interactions:
  • Parallel(ish) lines probably mean no interaction
  • Different slopes → probably an interaction
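The slopes that a coplot lets you compare by eye can also be computed per group. The sketch below simulates bird data in which the wing-weight slope differs by sex; the data, the slopes (0.2 and 0.5), and all names are invented, and this is a numeric companion to the coplot rather than a replacement for it.

```r
# Simulated birds: the wing-weight slope is set to differ between sexes.
set.seed(42)
n <- 40
birds <- data.frame(
  wing = runif(2 * n, 50, 65),
  sex  = rep(c("F", "M"), each = n)
)
true_slope   <- ifelse(birds$sex == "F", 0.2, 0.5)
birds$weight <- 5 + true_slope * birds$wing + rnorm(2 * n, sd = 0.5)

# Fit weight ~ wing separately for each sex and pull out the slope.
slopes <- sapply(split(birds, birds$sex), function(d) {
  coef(lm(weight ~ wing, data = d))[["wing"]]
})
print(round(slopes, 2))

# Visual version of the same check, one panel per sex:
# coplot(weight ~ wing | factor(sex), data = birds)
```

Clearly different slopes between panels, as simulated here, suggest including a wing-by-sex interaction in the model.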
Are values truly independent?
• Spatially?
• Temporally?
• Nested experimental design? (Plants within garden plots? Pots within strings/fleets? Replications within site?)
• The answer is not always easy to determine. Think ecologically: how could my X values be influencing each other?
Zuur et al (2010) http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2009.00001.x/full
Sea, sustainable fisheries of the sea urchin Paracentrotus lividus have become a necessity. At harvesting sites, the systematic removal of large individuals (diameter >= 50 mm) seriously compromises the biological and ecological functions of sea urchin populations. Specifically, in this study, we compared the reproductive potential of the populations from Mediterranean coastal areas which have different levels of sea urchin fishing pressure. The areas were located at Su Pallosu Bay, where pressure is high and Tavolara-Punta Coda Cavallo, a marine protected area where sea urchin harvesting is low.
Methods. Reproductive potential was estimated by calculating the gonadosomatic index (GSI) from June 2013 to May 2014 both for individuals of commercial size (diameter without spines, TD >= 50 mm) and the undersized ones with gonads (30 <= TD < 40 mm and 40 <= TD < 50 mm). Gamete output was calculated for the commercial-size class and the undersized individuals with fertile gonads (40 <= TD < 50 mm) in relation to their natural density (gamete output per m²).
Y = GSI, a proxy for reproductive potential
X = A whole bunch of things
need to understand the data you’re working with
• Often it requires major data cleanup, even before you start a formal data exploration. Never skip this, even with your own data!
• You will make many decisions during a statistical analysis. Know WHY you made them. Write them down. You can always change your mind later
• PeerJ is a good resource for experimental data in fisheries
have created a Markdown template to get you started:
Complete the template, render an HTML file, and submit
Describe, in words, your findings from each section (brief)
exploration (8 steps total)
/1 All relevant variables are assessed at that step
/1 Appropriate plots are used
/1 Description of findings is defensible
Total: /24, scaled to 10% of course grade
Submit your HTML file to your OneDrive by Jan 30