EE 509: Applied Environmental Statistics

Course syllabus

Instructor:

Michael C. Dietze

dietze@bu.edu

STO 457A

Office hours by appointment

Goals:

The primary focus of this course is on probability-based statistical methods employed in the environmental, earth, and ecological sciences. Students in this class will explore a variety of statistical modeling topics from both likelihood and Bayesian perspectives, building progressively from simple models to sophisticated analyses. Students will be exposed to the concepts behind these approaches, the computational techniques to implement them, and their application to common problems in environmental science. Throughout, the focus will be on how to construct statistical models that allow us to confront theory with data. The first third of the course will cover foundational concepts. The middle third will work from simple linear regression up to general linear mixed models and hierarchical models, with particular emphasis on the complexities common to environmental data: heteroskedasticity, missing data, latent variables, errors in variables, and multiple sources of variability at different spatial, temporal, and taxonomic scales. The last third will cover time-series and spatial data, both of which are ubiquitous in the environmental and earth sciences. Attention throughout the course will be given to environmental applications, and in particular to data and models unique (e.g. mark-recapture, matrix population models) or particularly important (e.g. kriging, CAR) to earth and environmental science.

Contact hours/week: Three 50-min lectures and one 2-hr computer lab

Prerequisites:

Introductory statistics (CAS MA115/116 or MA213/124 or equivalent) and

Calculus I (CAS MA121 or CAS MA123 or equivalent) and

Probability (CAS MA581) or consent of the instructor

BU HUB:

This course meets the following HUB learning outcomes:

Philosophical Inquiry and Life’s Meanings

1. Students will demonstrate knowledge of notable works in philosophical thought, make meaningful connections among them, and be able to relate those works to their own lives and those of others.

This course makes important connections between the philosophy of science and the practical applications of statistical methods. We explicitly discuss multiple alternative schools of thought in the philosophy of science (Popper, Kuhn, Polanyi, Lakatos) and how they relate to hypothesis testing and model selection. A core, recurring component of the course involves an ongoing discussion about Bayesian vs. frequentist philosophies: What do probability and uncertainty mean (e.g. is probability subjective or objective)? How does the fact that we can never observe the world perfectly affect our ability to make inferences? Is there chance/stochasticity in the world around us, or is the universe fundamentally deterministic, and how does that belief affect our ability to make inferences about the world around us? How do these philosophies impact the types of questions we can ask about the natural world? Students will demonstrate this knowledge through a combination of exam questions and lab reports.

2. Students will demonstrate the reasoning skills and possess the vocabulary to reflect upon significant philosophical questions and topics such as what constitutes a good life, right action, meaningful activity, knowledge, truth, or a just society.

As noted in Outcome 1, this course will ask students to reflect on significant questions about knowledge and truth. Students will demonstrate vocabulary, reasoning skills, and knowledge of notable works through exam questions and lab report questions.

Writing Intensive

1. Students will be able to craft responsible, considered, and well-structured written arguments, using media and modes of expression appropriate to the situation.

The semester project is a core component of this course, which culminates in a 5000-word paper (15-20 pages double-spaced, plus abstract, figures, tables, and citations) written according to the guidelines and style of a scientific journal. The development of the paper is scaffolded through a number of project milestones where students get feedback from the instructor and have the opportunity to revise the different sections of the paper (project prospectus = Introduction, model description = Methods, preliminary results = Results). There is also one lab (#12) specifically set aside for paired reviews, where students receive both oral and written feedback from a peer. In addition, students will submit thirteen other lab reports, with a typical length of 10-25 pages each (including text, code, figures, and tables). Because the course is aimed at graduate students and upper-level undergraduates in our major, well-structured scientific writing is the key mode of expression appropriate for this discipline.

2. Students will be able to read with understanding, engagement, appreciation, and critical judgment.

In addition to technical writing, this course will help students develop their skills at technical reading. Indeed, a core aim of this course is to enable students to read and critically evaluate the modern quantitative methods used in the primary scientific literature. Specifically, through the use of case-study-based labs, students will learn how to evaluate the hypotheses laid out in each problem, the statistical models used to test these hypotheses, and the results and discussion of such models.

3. Students will be able to write clearly and coherently in a range of genres and styles, integrating graphic and multimedia elements as appropriate.

Students will integrate graphic elements (figures and graphs) throughout their lab reports and final project.

Course Materials:

Required Text: Models for Ecological Data: An Introduction. 2007. James S. Clark. ISBN: 9780691121789
The book is available at the university bookstore or can be purchased online.

The primary text will be supplemented with PDFs of select readings from additional textbooks and the primary literature. Literature readings focus on examples of the application of statistical models in the environmental literature rather than methods papers. These “case studies” will also serve as the focus for the analysis problems in the lab component.

Students will also make extensive use of the following statistical software (which is freely available on the internet) in order to complete assignments:

R
JAGS

If you want to avoid running computationally intensive analyses on your personal computer or laptop, you may want to try running RStudio Server through the SCC OnDemand web interface, which will allow you to run jobs on the BU SCC cluster (BU only).

Grading:

Grading will be based on lab reports/problem sets, a semester-long project, and four exams.

Lab reports/problem sets (10 points each) = 150
Semester project = 95
      project proposal (10)
      model description (15)
      preliminary analysis (20)
      final report (50)
Exams (30, 25, 30, 30 points) = 115
Total = 360

Lectures/Labs

Please refer to the course website for the schedule of lecture/lab topics and the assigned readings that go with these. Students are expected to complete readings before class.

Lab attendance is mandatory. Lab reports will not be accepted for labs missed due to unexcused absences. Lab reports are due by the start of lab the following week and will be penalized 10%/day if turned in late. Lab materials will be made available in the GitHub repository. Details on what needs to be turned in will be provided with each lab.

You may discuss lab assignments with other students, but you each must turn in your own written report and code.

Semester Project

A core component of this course is a semester-long independent analysis and write-up. There are a number of benchmarks over the course of the semester to ensure adequate progress is being made and to provide you with feedback. A more detailed description will be provided before each task is due.

Project Proposal: 1-2 pages double-spaced. Students are expected to describe the data set they intend to analyze and present the scientific question that motivates their analysis. Students are encouraged to make use of their own data sets for the semester project.

Model description: 1-2 pages double-spaced. A brief description of how the data will be analyzed. It should include a mathematical specification of the process model(s), the data model, and the parameter model, as well as a figure showing how these relate to one another.

Preliminary Analysis: 1-3 pages of double-spaced text, plus R/BUGS code and a minimum of 5 results figures with legends. At this point the analysis should be mostly complete. The text should briefly describe the computational methods of the analysis and any modifications of the model description (i.e. what you actually ended up doing).

Final Report: The final report should be written in the style and tone of a scholarly publication, though with greater emphasis on the results and statistical methods employed and less on introduction and discussion. Specifically, we will be using the Ecology Letters format: no more than 5000 words in length and no more than 6 figures or tables. For more detailed guidelines see http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291461-0248/homepage/ForAuthors.html

Project Due Dates:

Project Proposal: 2/14

Model Description: 3/6

Preliminary Analysis: 4/10

Final Report: Before Exam 4 (Final)

Exams

Exams will be a combination of short answer and multiple choice. The final exam will be non-cumulative.

Midterm I: 2/10

Midterm II: 2/26

Midterm III: 3/30

Final: 5/5

Lecture Schedule

Date

Topics

Reading

Project

1/22

Introduction to model-based inference
Case: synthesis of field data, remote sensing, and mechanistic models

Clark: Chapter 1
Optional: Otto and Day: Math Review
Slides

1/24

Probability theory: joint, conditional, and marginal distributions
Case: Island Biogeography

Hilborn and Mangel Ch 3 p39-62
Optional: Clark Appendix D
Slides

1/27

Probability theory: discrete and continuous distributions
Case: Zero-inflated census data

Hilborn and Mangel Ch 3 p62-93
Optional: Clark Appendix F
Slides

1/29

Maximum Likelihood
Case: Censored mortality data

Chapter 3.1-3.2
Optional: Chapter 2
Slides

1/31

Point estimation by MLE
Case: Survival analysis, population growth rate

Chapter 3.3-3.5
Slides

2/3

Analytically tractable MLEs
Case: Bestiary of response functions

Chapter 3.6-3.9
Optional: Bolker Ch 3
Slides

2/5

Intractable MLEs and basic numerical optimization
Case: Michaelis-Menten kinetics

Chapter 3.10-3.13
Slides

2/7

Exam Review

2/10

EXAM 1: Probability Theory, Maximum Likelihood

2/12

Bayes' Theorem
Case: Detecting climate warming

Chapter 4.1
Ellison 2004
Slides

2/14

Point estimation using Bayes
Case: Normal mean and variance

Chapter 4.2
Slides

Project Proposals

2/17

Analytically-tractable Bayes: conjugacy and priors
Case: Plant trait databases

Chapter 4.3, Appendix G
Slides

2/19

Numerical methods for Bayes: MCMC

Chapter 7.1-7.2, 7.3 intro
NY Times article
Slides

2/21

MCMC: Metropolis-Hastings

Chapter 7.3.1, 7.3.2, 7.5
Slides

2/24

MCMC: Gibbs sampler

Chapter 7.3.3, 7.3.4
Slides

2/26

EXAM 2: Bayes, MCMC

2/28

Interval Estimation: Bayesian credible intervals

Chapter 5
Slides

3/2

Frequentist confidence intervals I: Likelihood profile, Fisher information

Chapter 5
Slides

3/4

Frequentist confidence intervals II: Bootstrapping

Chapter 5

3/6

Model Selection: Likelihood ratio test, AIC
Case: Southern Brown Frog

Hilborn and Mangel Chapter 2
Slides

Model Description

SPRING BREAK

3/16

Model Selection: DIC, predictive loss, model averaging
Case: Multi-model weather forecasting

Chapter 6
Slides
Video

3/18

Errors in variables, heteroskedasticity
Case: Time-domain reflectometry

Chapter 5.4 & 7.4
Slides
Video 1 - Heteroskedasticity
Video 2 - Errors in Variables

3/20

Latent variables, Missing data models
Case: Carbon flux towers

Chapter 7.6, 7.7, 8.1
Slides
Video 1 - Missing Data Models
Video 2 - Generalized Linear Models (GLM)

3/23

Logistic regression
Case: Pollution and mortality risk

Chapter 8.2-8.2.3
Slides
Video 1 - Poisson Regression
Video 2 - Logistic, Logit-Exponential, & Logit-Normal Models
Video 3 - Multinomial Regression

3/25

GLMs
Case: Plot count data (Poisson regression)
Case: Canopy position data (Multinomial)

Chapter 8.2-8.2.3

3/27

Hierarchical Bayes

Slides
Video 1 - Hierarchical Bayes: Concepts
Video 2 - Hierarchical Bayes: Random Effects
Video 3 - Hierarchical Bayes: JAGS Examples
Video 4 - Hierarchical Bayes: Mixed Models

3/30

EXAM 3: GLMM, HB

4/1

Hierarchical Bayes 2
Case: Canopy and biomass allometries

Chapter 8.2.5 - 8.3
Slides
Video - Case study: Hierarchical allometries

4/3

Nonlinear models
Case: Coho salmon
Case: Photosynthetic responses to light, CO2

Chapter 8.4
Slides
Video 1 - Nonlinear Models
Video 2 - Nonlinear Hierarchical Models
Video 3 - Nonlinear Hierarchical Models with covariates
Video 4 - Hierarchical Data Fusion

4/6

Applications of random effects models
Case: Remote sensing

Chapter 8.5-8.7

4/8

Time series: Basics and State-Space
Case: Moose population fluctuations

Chapters 9.1, 9.2, 9.6
Slides
Video 1 - Time Series Intro
Video 2 - State Space Intro
Video 3 - State Space: Exponential Growth

4/10

TBD

Preliminary Analysis

4/13

Time series: Mark-Recapture
Case: Black Noddy

Chapter 9.7, 9.8, 9.16
Slides
Video 1 - Unequal intervals and nonlinear dynamic models
Video 2 - Mark Recapture

4/15

Time series: ARMA
Case: Fire in the Everglades

Chapter 9.3, 9.5
Slides
Video 1 - Concepts & Definitions
Video 2 - Smoothing
Video 3 - Detrending
Video 4 - Autocorrelation
Video 5 - Autoregressive Models
Video 6 - ARIMA

4/17

Time Series: Repeated Measures
Case: Soil Respiration

Chapter 9.10, 9.14, 9.15
Slides
Video 1 - Concepts
Video 2 - Implementation
Video 3 - Generalization
Video 4 - Interventions & Change Points

4/22

Spatial: point-referenced (geostatistical) data & Kriging
Case: Mapping soil moisture

Chapter 10.7
Slides
Video 1 - Spatial Point Pattern
Video 2 - Spatial Point Referenced Data
Video 3 - Spatial Smoothing
Video 4 - Spatial covariance & Variograms
Video 5 - Spatial Interpolation
Video 6 - Kriging

4/24

Spatial: Markov Random Field
Case: Superfund monitoring

Chapter 10.8
Slides
Video 1 - Spatial Modeling Concepts
Video 2 - Spatial Model Coding
Video 3 - Spatial Models: Prediction
Video 4 - Markov Random Fields

4/27

Spatial: block-referenced data and misalignment
Case: relating ozone & census data

Chapter 10.9
Slides
Video 1 - Spatial Block Data: Concepts
Video 2 - Spatial Block Exploratory Stats
Video 3 - Conditional Autoregressive Models
Video 4 - Spatial Misalignment

4/29

Spatial: conditional autoregressive models (CAR)
Case: South African biodiversity

Chapter 10.10

5/5

EXAM 4, 9-11AM

FINAL PROJECT

 

Lab Syllabus

Lab   Week   Topics                                                    Software
1     1/22   Introduction to R                                         R
2     1/29   Probability distributions and sampling                    R
3     2/5    Fire return intervals: Maximum likelihood basics          R
4     2/12   Ecosystem responses to CO2: ML numerical optimization     R
5     2/19   Forest stand characteristics: Intro to BUGS               JAGS
6     2/26   Regression: Gibbs sampler                                 R
7     3/4    Nonlinear plant growth: Metropolis Algorithm              R
8     3/18   CO2 revisited: Interval estimation and model selection    R
9     3/25   Understory Regeneration: Random effects                   Both
10    4/1    Mosquito abundance: Hierarchical modeling                 JAGS
11    4/8    Moose population fluctuations: State-space time series    JAGS
12    4/15   Peer Assessment of projects
      4/22   Wed = Mon; no lab
13    4/29   Ozone: Space/time exploratory data analysis               R

 

Academic Code

It is your responsibility to know and understand the provisions of the CAS Academic Conduct Code. Copies are available in CAS 105. Suspected cases of academic misconduct will be referred to the Dean’s Office. See http://www.bu.edu/academics/resources/academic-conduct-code for conduct information for undergraduates and http://www.bu.edu/cas/students/graduate/forms-policies-procedures/academic-discipline-procedures/ for graduate student conduct requirements.