Instructor:
Michael C. Dietze
STO 457A
Office hours by appointment
Goals:
The primary focus of this course is on probability-based statistical methods employed in the environmental, earth, and ecological sciences. Students in this class will explore a variety of statistical modeling topics from both a likelihood and a Bayesian perspective, building progressively from simple models to sophisticated analyses. Students will be exposed to the concepts behind these approaches, the computational techniques to implement them, and their application to common problems in environmental science. Throughout, the focus will be on how to construct statistical models that allow us to confront theory with data.

The first third of the course will cover foundational concepts. The middle third will work from simple linear regression up to general linear mixed models and hierarchical models, with particular emphasis on the complexities common to environmental data: heteroskedasticity, missing data, latent variables, errors in variables, and multiple sources of variability at different spatial, temporal, and taxonomic scales. The last third will cover time-series and spatial data, both of which are ubiquitous in the environmental and earth sciences. Attention throughout the course will be given to environmental applications, and in particular to data and models unique (e.g. mark-recapture, matrix population models) or particularly important (e.g. kriging, CAR) to earth and environmental science.
Contact hours/week: Three 50-min lectures and one 2-hr computer lab
Prerequisites:
Introductory statistics (CAS MA115/116 or MA213/124 or equivalent) and
Calculus I (CAS MA121 or CAS MA123 or equivalent) and
Probability (CAS MA581) or consent of the instructor
Course Materials:
Required Text: Models for Ecological Data: An Introduction. 2007. James S. Clark ISBN: 9780691121789
The book is available at the university bookstore or can be purchased online.
The primary text will be supplemented with PDFs of select readings from additional textbooks and the primary literature. Literature readings focus on examples of the application of statistical models in the environmental literature rather than methods papers. These “case studies” will also serve as the focus for the analysis problems in the lab component.
Students will also make extensive use of statistical software that is freely available on the internet in order to complete assignments; the software used in each lab is listed in the Software column of the Lab Syllabus.
Grading:
Grading will be based on lab reports/problem sets, a semester-long project, and four exams.
Lab reports/problem sets (10 points each) | = 150 |
Semester project (Grad only) | = 95 |
project proposal (10) | |
model description (15) | |
preliminary analysis (20) | |
final report (50) | |
Exams (30, 25, 30, 30 points) | = 115 |
Total, Grad | = 360 |
Total, Undergrad | = 265 |
Lectures/Labs
Please refer to the course website for the schedule of lecture/lab topics and the assigned readings that go with these. Students are expected to complete readings before class.
Lab attendance is mandatory. Lab reports will not be accepted for labs missed due to unexcused absences. Lab reports are due by the start of lab the following week and will be penalized 10%/day if turned in late. Lab materials will be made available in the GitHub repository. Details on what needs to be turned in will be provided with each lab.
You may discuss lab assignments with other students, but you each must turn in your own written report and code.
Semester Project (Graduate Students only)
The core component that separates the undergraduate from the graduate version of this course is a semester-long independent analysis and write-up. There are a number of benchmarks over the course of the semester to ensure adequate progress is being made and to provide you with feedback. A more detailed description will be provided before each task is due.
Project Proposal: 1-2 pages double-spaced. Students are expected to describe the data set they intend to analyze and present the scientific question that motivates their analysis. Students are encouraged to make use of their own data sets for the semester project.
Model Description: 1-2 pages double-spaced. A brief description of how the data will be analyzed. It should include a mathematical specification of the process model(s), the data model, and the parameter model, along with a figure showing how these relate to one another.
Preliminary Analysis: 1-3 pages of double-spaced text, plus R/BUGS code, plus a minimum of 5 results figures with legends. At this point the analysis should be mostly complete. The text should briefly describe the computational methods of the analysis and any modifications of the model description (i.e., what you actually ended up doing).
Final Report: The final report should be written in the style and tone of a scholarly publication, though with greater emphasis on the results and statistical methods employed and less on introduction and discussion. Specifically, we will be using the Ecology Letters format: no more than 5000 words in length and no more than 6 figures or tables. For more detailed guidelines see http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291461-0248/homepage/ForAuthors.html
Project Due Dates:
Project Proposal: 2/9
Model Description: 3/4
Preliminary Analysis: 4/9
Final Report: Before Exam 4 (Final)
Exams
Exams will be a combination of short answer and multiple choice. The final exam will be non-cumulative.
Midterm I: 2/14
Midterm II: 3/16
Midterm III: 4/11
Final: 5/7
Lecture Schedule
Date | Topics | Reading | Project |
1/19 | Introduction to model-based inference | Clark: Chapter 1 | |
1/22 | Probability theory: joint, conditional, and marginal distributions | Hilborn and Mangel Ch 3 p39-62 | |
1/24 | Probability theory: discrete and continuous distributions | Hilborn and Mangel Ch 3 p62-93 | |
1/26 | Maximum Likelihood | Chapter 3.1-3.2 | |
1/29 | Point estimation by MLE | Chapter 3.3-3.5 | |
1/31 | Analytically tractable MLEs | | |
2/2 | Intractable MLEs and basic numerical optimization | Chapter 3.10-3.13 | |
2/5 | Bayes Theorem | Chapter 4.1 | |
2/7 | Point estimation using Bayes | Chapter 4.2 | |
2/9 | Analytically-tractable Bayes: conjugacy and priors | Chapter 4.3, Appendix G | |
2/12 | Review | | |
2/14 | EXAM 1: Probability Theory, Maximum Likelihood, Bayes Theorem | | |
2/16 | Numerical methods for Bayes: MCMC | Chapter 7.1-7.2, 7.3 intro | |
2/19 | MCMC: Metropolis-Hastings | Chapter 7.3.1, 7.3.2, 7.5 | |
2/21 | MCMC: Gibbs sampler | Chapter 7.3.3, 7.3.4 | |
2/23 | Interval Estimation: Bayesian credible intervals | Chapter 5 | |
2/26 | Frequentist confidence intervals I: Likelihood profile, Fisher information | Chapter 5 | |
2/28 | Frequentist confidence intervals II: Bootstrapping | Chapter 5 | |
3/2 | Model Selection: Likelihood ratio test, AIC | | |
3/12 | Model Selection: DIC, predictive loss, model averaging | Chapter 6 | |
3/14 | TBD | | |
3/16 | EXAM 2: MCMC, Intervals, Selection | | |
3/19 | Errors in variables, heteroskedasticity | Chapter 5.4 & 7.4 | |
3/21 | Latent variables, Missing data models | Chapter 7.6, 7.7, 8.1 | |
3/23 | Logistic regression | Chapter 8.2-8.2.3 | |
3/26 | GLMs | Chapter 8.2-8.2.3 Slides | |
3/28 | Mixed Models | Chapter 8.2.4 | |
3/30 | Multivariate Regression | | |
4/2 | Hierarchical Bayes | Chapter 8.2.5 - 8.3 | |
4/4 | Nonlinear models | Chapter 8.4 | |
4/6 | Applications of random effects models | Chapter 8.5-8.7 | |
4/9 | TBD | | |
4/11 | EXAM 3: GLMM, HB | | |
4/13 | Time series: Basics and State-Space | Chapters 9.1, 9.2, 9.6 | |
4/18 | Time series: Mark-Recapture | Chapter 9.7, 9.8, 9.16 | |
4/20 | Time series: ARMA | Chapter 9.3, 9.5 | |
4/23 | Time series: Repeated Measures | Chapter 9.10, 9.14, 9.15 | |
4/25 | Spatial: point-referenced (geostatistical) data & Kriging | Chapter 10.7 | |
4/27 | Spatial: Markov Random Field | Chapter 10.8 | |
4/30 | Spatial: block-referenced data and misalignment | Chapter 10.9 | |
5/2 | Spatial: conditional autoregressive models (CAR) | Chapter 10.10 | |
5/7 | EXAM 4, 9-11AM | | |
Lab Syllabus
Lab | Week | Topics | Software |
1 | 1/22 | Introduction to R | R |
2 | 1/29 | Probability distributions and sampling | R |
3 | 2/5 | Fire return intervals: Maximum likelihood basics | R |
4 | 2/12 | Ecosystem responses to CO2: ML numerical optimization | R |
5 | 2/19 | Forest stand characteristics: Intro to BUGS | JAGS |
6 | 2/26 | Regression: Gibbs sampler | R |
7 | 3/12 | Nonlinear plant growth: Metropolis Algorithm | R |
8 | 3/19 | CO2 revisited: Interval estimation and model selection | R |
9 | 3/26 | Understory Regeneration: Random effects | Both |
10 | 4/2 | Mosquito abundance: Hierarchical modeling | JAGS |
11 | 4/9 | Moose population fluctuations: State-space time series | JAGS |
12 | 4/16 | Peer Assessment of projects | |
13 | 4/23 | Ozone: Space/time exploratory data analysis | R |
14 | 4/30 | South African biodiversity: Spatial CAR and Kriging | WinBUGS |
Academic Code
It is your responsibility to know and understand the provisions of the CAS Academic Conduct Code. Copies are available in CAS 105. Suspected cases of academic misconduct will be referred to the Dean’s Office. See http://www.bu.edu/academics/resources/academic-conduct-code for conduct information for undergraduates and http://www.bu.edu/cas/students/graduate/forms-policies-procedures/academic-discipline-procedures/ for graduate student conduct requirements.