Statistics 560: Introduction to Mathematical Statistics
Fall 2013
Course Syllabus
I make these assumptions about my
students:
·
They have had at least two semesters
of calculus and appreciate math
·
They value the use of statistics to
solve problems
·
They have the intellectual curiosity
to read books that give
a higher-level understanding of what is covered in the course
·
They value learning technology as a
way to improve their job
prospects. I will use a variety of technologies to help students in this goal.
·
Catalog Description: STAT 560 –
Introduction to Mathematical Statistics (3)
Probability, probability distributions, simulation of random variables,
sampling distributions, central limit theorem, testing of hypotheses, confidence
intervals, maximum likelihood methods, Bayesian methods. Credit Restriction:
Not for credit for MS with a major in statistics or management science.
Recommended Background: Mathematics 241. Comment(s): A course equivalent to
Mathematics 241 also is acceptable.
·
Instructor: Ramón V. León
o
Email: rleon@utk.edu
o
Cell phone: 865 773 2245.
You can text me at just about any
time. Normally, I will respond right away or as soon as I can. You can also
call me, but I prefer that you text me first to set up a convenient time for us
to talk or Skype. (See next.) If I don’t respond within a reasonable time,
call me on the phone directly.
o
Skype:
ramonvleon.
Having a Skype account is required for this course. Once you have one, send me a request
to be my contact. Make sure to state in your request that you are my student.
Otherwise I will not accept your request, as I am very popular with the women of
Ghana. Think of my Skype address as
my virtual office. With Skype I can share my computer screen with you to show
you how to work a problem or demo software, and we can see each other to increase
rapport. Whenever my computer is on, I am accessible on Skype. You
can also chat with me using Skype’s chat feature.
o
Office Hours: 3:30 – 4:30 MTWR using Skype (ramonvleon).
I will frequently be in my office (SMC 249) at these times. Please text me to
check if I will be there on a given day or to request that I meet you in my
office on a given day. You can also meet me by appointment.
·
Teaching Assistant: Thomas
Tilson
o
Email: ttilson1@utk.edu
o
Skype:
tltilson
o
Office Hours: TR 5:00 – 6:00 via Skype (tltilson) or
in person in the common area in front of my office, that is, in front of SMC 249
·
Class Meeting Time and Place: MW 5:05 – 6:20 p.m. via Blackboard
Collaborate. This program can be accessed from Blackboard by going to the
“Tools” tab and then selecting “Blackboard
Collaborate.”
·
OneNote Website: This course is
supported by a OneNote website that you can access from Blackboard. Just click
on the OneNote tab in the upper-left corner of Blackboard. All the class notes
and other course material are available there. If you are signed in to the
site, as explained below, you will be able to download files by right-clicking
on them. (This may not work in Chrome; if so, just use Firefox or
Internet Explorer.)
·
Windows Live Account: To be able to
download files by right-clicking on them from the course’s OneNote
website, you need to have a Windows Live account and be signed in. You
automatically have a Windows Live account if you have an email address with one
of these suffixes: @hotmail.com, @outlook.com, or @live.com. If you sign in as
you normally would to check your email, you will also be signing in to your
Windows Live account and thus be able to download files by right-clicking on
them. (Again, this may not work in Chrome; if so, just use Firefox or
Internet Explorer.) If you do not have one of these email addresses (or, what is
the same thing, a Windows Live account), you can register for one here. Warning: At
the OneNote website you can sign in using your UT ID and password, but
for some reason signing in that way will not give you the ability to download
files by right-clicking on them. Bottom line: You need to open a Windows Live
account directly from Microsoft.
·
Textbook: None, since I supply very complete notes of my lectures. I will
also refer you to Wikipedia and other web resources.
·
JMP Pro: JMP Pro (version 10.0) statistical software will be used throughout the
course. JMP is very easy to learn, and I will be demoing it throughout the course.
Both PC and Mac versions can be downloaded for free at the following web
address: https://web.dii.utk.edu/softwaredistribution/. After logging into this site, click on
“SAS”, then select JMP Pro 10.0 for Windows or JMP Pro 10.0 for
Mac. Scroll up or down to see the “Download selected item” button.
Detailed instructions for downloading and installing this software can be found
at http://web.utk.edu/~cwiek/JMPinstall/.
We strongly encourage you to obtain this software for your own computer. However, JMP software can also be accessed
at many of the computing labs on campus, and through the “APPS
Server” at http://apps.utk.edu/
·
Exams:
There will be a take-home midterm and a take-home final exam. (The course
schedule will be provided later.)
·
Optional Project: The project is optional and not really
necessary for this mathematical course. However, if you have some data
pertaining to your job that you would like my help in analyzing, I am game. If
you decide to do a project, you need to send me a written proposal as an MS Word
document describing the data set and the questions that you plan to answer with
it. You need to submit with your proposal a JMP file containing the data. For
each column of your JMP file you need to use the Notes feature of JMP found in
the Column Properties drop-down menu to fully explain what the variable in the
column is. It is not enough to simply provide the necessarily cryptic column
name. Please use this form to write the proposal.
Upon my approval of this proposal you
are to analyze the data and write a report with your analysis and conclusions.
The report should be submitted as an MS Word file. The JMP file with the project
data should be attached. This file should have the column notes required in
the proposal and have your most important JMP analyses saved as scripts to the
data table. (I will show you how to do this.) The report should be at most
eight pages long and should contain a summary of the steps you used in your
analysis. You should include only a few selected graphs and tables. The final
model that you used in the analysis should be clearly stated at the end of the
report with a summary of the reasons why you selected it. Use
this form to write the project report.
· Assignments: All assignments will be made available in the Submission tab of Blackboard, where you will also submit them. Assignments should be submitted as MS Word files, not as files from any other word processor or as PDFs. This will make it easier for us to give you detailed feedback. For several assignments you will have the opportunity to submit twice. The first submission will be graded on effort; the second submission will be graded on correctness, after we go over what students had difficulty with in the first submission in an online help session held outside class time. (There will also be comments particular to you when we return your first submission.)
·
Book and other Reports: You
must use this form
to do your report. You need to write reports on:
o
Calculated Risks: How To Know When Numbers
Deceive You by Gerd Gigerenzer
o
Those who have done reports for one of
these books in one of my earlier classes should instead do a report on one of
these books:
§ Thinking, Fast and Slow by Daniel Kahneman
§ The Black Swan: Second Edition: The Impact of the Highly Improbable by Nassim Nicholas Taleb
§ Against the Gods: The Remarkable Story of Risk by Peter L. Bernstein
o
Other short reports, counted as homework, may be assigned throughout
the semester. For these you don't need to
use the book report form.
·
Grade: Your course grade will be computed as follows:
Percent of Grade | Activity
25% | Midterm exam
30% | Final exam
10% | Book report on Calculated Risks
10% | Book report on Uncontrolled
20% | Homework
5% | Surveys (These will be conducted after every class, primarily to find out what needs more explanation.)
100% | Course Score
·
Letter grades will be assigned using the
usual scale: 90 and above is an A, and so on.
·
Course schedule: To be provided.
·
Attendance: Preferably, you should listen to the lectures in real time so that
you can interact with me and other students. In cases where this is impossible
you must listen to the lecture recordings available on Blackboard. Blackboard
Collaborate will tell me who watched the lectures, either in real time or via
their recordings.
·
Disability: If you need course adaptations or accommodations because of a
documented disability or if you have emergency information to share, please
contact the Office of Disability Services at 191 Hoskins Library at 974-6087. This will
ensure that you are properly registered for services.
1.
Bayes Analysis of Mammograms
·
Should women in their forties get
mammograms?
·
Reasons for the recent recommendation by doctors
that they should not.
o
Breast cancer rate among these women
o
Sensitivity and specificity of
mammograms.
·
P(Cancer | Positive mammogram) as a
function of the base
rate of cancer in the population of interest
o
Plot of this probability versus the
base rate
·
Connection to Bayesian inference in
general
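A minimal sketch of the P(Cancer | Positive mammogram) calculation listed above, using Bayes' rule with the base rate, sensitivity, and specificity as inputs. The sensitivity (0.90) and specificity (0.93) are illustrative assumptions, not clinical figures or values from the course notes, and the function name is hypothetical.

```python
def posterior_given_positive(base_rate, sensitivity, specificity):
    """Return P(Cancer | Positive) computed with Bayes' rule."""
    # P(Positive and Cancer)
    true_positive = sensitivity * base_rate
    # P(Positive and No cancer); the false positive rate is 1 - specificity
    false_positive = (1.0 - specificity) * (1.0 - base_rate)
    return true_positive / (true_positive + false_positive)


# Illustrative values only (assumptions, not clinical data).
SENSITIVITY = 0.90
SPECIFICITY = 0.93

if __name__ == "__main__":
    # Tabulate the posterior probability against the base rate.
    for base_rate in (0.001, 0.005, 0.01, 0.05, 0.10):
        p = posterior_given_positive(base_rate, SENSITIVITY, SPECIFICITY)
        print(f"base rate {base_rate:.3f} -> P(Cancer | Positive) = {p:.3f}")
```

With a fixed test, the posterior probability rises with the base rate, which is why the population of interest matters when interpreting a positive result.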
2.
What Is Statistics?
·
Drawing conclusions from data
·
Reasoning under uncertainty
·
Variation
3.
Describing Data
·
Bar charts
·
Histograms
·
Mean, variance and standard deviation
·
Median and IQR
·
Five-number summary
·
Box plots
o
Calculation and construction
o
Comparing distributions using them
4.
Reliability Data
·
Characteristics of reliability data
o
Right censoring
o
Left censoring
o
Interval censoring
o
Truncation
·
Entering reliability data in JMP
·
Life tests versus inspection data
·
Reliability data examples
o
Ball bearings
o
Integrated circuits
o
Shock absorbers: multiple failure
modes
o
Heat exchangers
o
Turbine wheel inspection data
o
Circuit pack truncated data resulting
from factory burn-in
·
Estimates of the cumulative
distribution function
o
Empirical distribution function
o
Kaplan-Meier estimate of the
distribution function when one has right censoring
5.
Opinion Polls and Confidence Intervals
for Proportions
·
Margin of error
·
Populations versus samples
·
Random samples
·
Definition of confidence based on
repeated random samples
o
Simulation
·
Confidence intervals for proportions
o
Formula
o
Heuristic derivation of this formula
·
How JMP calculates confidence
intervals
·
Conservative calculation of ME used in
polls, where p is assumed to be equal
to 0.5 regardless of the sample proportion
·
ME as a function of sample size
o
Calculation of sample size for a
desired ME
·
Non-effect of population size on the
ME if the random sample is less than 10% of the population.
o
Justification using the finite population
correction
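The margin-of-error items in this topic can be sketched as follows. This assumes the usual normal-theory formula ME = z * sqrt(p(1 - p)/n) with z = 1.96 for 95% confidence; the function names are hypothetical, and JMP's own interval calculation may differ.

```python
import math

Z_95 = 1.96  # approximate standard normal quantile for 95% confidence


def margin_of_error(p_hat, n, z=Z_95):
    """Normal-theory margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def conservative_margin_of_error(n, z=Z_95):
    """Margin of error with p set to 0.5, where p(1 - p) is largest."""
    return margin_of_error(0.5, n, z)


def sample_size_for_me(me, z=Z_95):
    """Smallest n whose conservative margin of error is at most `me`."""
    return math.ceil((z / (2.0 * me)) ** 2)


def finite_population_correction(n, population_size):
    """Factor that multiplies the ME when sampling without replacement."""
    return math.sqrt((population_size - n) / (population_size - 1))


if __name__ == "__main__":
    print(conservative_margin_of_error(1000))
    print(sample_size_for_me(0.03))
    # A sample that is a small fraction of the population gives a factor near 1.
    print(finite_population_correction(1000, 1_000_000))
```

The correction factor stays close to 1 whenever the sample is a small fraction of the population, which is the sense in which population size has little effect on the ME.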
6.
Bootstrap Confidence Intervals for
Proportions
·
Concepts
·
Calculating them using JMP
·
Contrast between normal theory
confidence intervals and the bootstrap ones
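As a rough illustration of the contrast in this topic, the sketch below computes a normal-theory interval and a percentile bootstrap interval for the same sample proportion. The percentile method is one of several bootstrap variants and is an assumption here, as are the function names and the example counts; JMP's bootstrap feature may work differently.

```python
import math
import random


def normal_theory_ci(successes, n, z=1.96):
    """Normal-theory (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    me = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - me, p_hat + me


def bootstrap_percentile_ci(successes, n, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a proportion."""
    rng = random.Random(seed)
    data = [1] * successes + [0] * (n - successes)
    # Resample the data with replacement and record each resampled proportion.
    props = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_resamples))
    alpha = 1.0 - level
    lower = props[int((alpha / 2.0) * n_resamples)]
    upper = props[int((1.0 - alpha / 2.0) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical sample: 40 successes out of 100 trials.
    print("normal theory:", normal_theory_ci(40, 100))
    print("bootstrap:    ", bootstrap_percentile_ci(40, 100))
```

For moderate sample sizes and proportions away from 0 and 1, the two intervals are usually close; they tend to differ more for small samples or extreme proportions.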
7.
Normal Distribution
·
Density function
·
Cumulative distribution function
·
Population mean and variance
·
Empirical (68-95-99.7) rule
·
Standardization
·
Calculation of probabilities using an
online applet
·
Calculation of percentiles from probabilities and vice
versa using the applet
·
Normal probability plots
·
Central Limit Theorem
·
Simulation of normal random variables
8.
Distributions in General
·
Random variables
·
Describing distributions
o
Cumulative distribution function
o
Density function and probability mass
function
o
Survival function
o
Quantiles and percentiles
o
Hazard rate and its interpretation
9.
Concepts of Interest for Reliability
and Maintainability Engineers
·
Bathtub curve of hazard rate: infant
mortality, useful life and wear-out phases
·
Equivalence of mortality rates and
(hazard) failure rates.
·
Social security life tables and
mortality rates
·
B10
·
Problems with using the mean when
there is right censoring.
10.
Distributions of Particular Interest
for Reliability and Maintainability Engineers
·
Exponential
o
Memoryless property and its
interpretation in terms of wear
o
Calculation of its mean and variance using
symbolic mathematics via Wolfram Alpha.
·
Weibull
o
Interpretation of alpha and beta
parameters
·
Lognormal
·
How the Weibull and the lognormal
compare
·
Gamma
·
Simulation
11.
Analyzing Reliability Data
·
JMP’s Life Distribution and
Reliability platforms
·
Heuristic interpretation of JMP output
·
Probability plotting to identify best
distributions
·
Individual versus simultaneous
confidence bands
·
Multiple modes of failure: competing
risks
12.
Maximum Likelihood Estimation
·
Binomial case with heuristic
interpretation
·
Review of MLE asymptotic theory and
associated formulas
·
MLE for the exponential distribution
with right censoring using asymptotic theory
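A minimal sketch of the last item, assuming independent exponential lifetimes with rate lambda and non-informative right censoring. With r observed failures and total time on test T (the sum of all failure and censoring times), the log-likelihood is r log(lambda) - lambda T, so the MLE is r / T, and the observed information r / lambda^2 gives the asymptotic standard error lambda_hat / sqrt(r). The function name and the data are hypothetical, and the Wald interval shown is only one of several asymptotic intervals.

```python
import math


def exponential_mle_right_censored(times, failed, z=1.96):
    """MLE of the exponential rate with right censoring.

    times  -- failure or censoring time for each unit
    failed -- 1 if the unit failed at its time, 0 if it was right censored
    Returns (rate estimate, asymptotic standard error, Wald interval).
    """
    r = sum(failed)  # number of observed failures
    if r == 0:
        raise ValueError("at least one observed failure is required")
    total_time = sum(times)  # total time on test, censored units included
    rate = r / total_time
    std_err = rate / math.sqrt(r)  # from observed information r / rate**2
    return rate, std_err, (rate - z * std_err, rate + z * std_err)


if __name__ == "__main__":
    # Hypothetical data: two failures and two units censored at 30 and 40.
    times = [10.0, 20.0, 30.0, 40.0]
    failed = [1, 1, 0, 0]
    print(exponential_mle_right_censored(times, failed))
```

Censored units contribute to the total time on test but not to the failure count, which is how the estimate avoids the bias of simply averaging the observed failure times.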
13.
Basic Probability
·
Terminology of randomness
·
Law of large numbers
·
Nonexistence of “Law of
Averages”
·
Types of probability
o
Classical probability based on
symmetry
o
Frequentist
o
Personal
o
Axiomatic
·
Probability theorems
o
Venn diagrams
o
Conditional probability
o
Independence
o
Law of Total Probability and its
derivation
o
Bayes rule and its derivation
14.
Bayesian Inference
·
Concepts
o
Priors
o
Using gambling odds for the
elicitation of the prior
o
Likelihood
o
Posteriors
o
Credibility intervals
o
Prediction
·
Bayesian updating: first principle
calculations
o
Mammogram data revisited
o
Coin tossing with a three point prior
(mass at 0, 0.5 and 1)
o
Oranges and apples
·
Bayesian updating: Conjugate priors
o
Family of life distributions having
conjugate priors
o
Bayesian inference for proportion
using the Beta conjugate prior
·
Bayesian updating: Markov chain Monte Carlo (MCMC)
o
Rejection-acceptance algorithm:
mechanics, software, heuristic interpretations, and derivation
o
Application in reliability engineering
when the engineer has information about the hazard rates
·
Informative versus non-informative
priors
·
Bayesian network diagrams
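For the item above on Bayesian inference for a proportion with a Beta conjugate prior, the update can be sketched as follows: a Beta(a, b) prior combined with s successes and f failures gives a Beta(a + s, b + f) posterior. The function names, the uniform Beta(1, 1) prior, and the counts are illustrative assumptions, and the credibility interval is approximated by simulation rather than by exact Beta quantiles.

```python
import random


def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def credibility_interval(a, b, level=0.95, n_draws=20000, seed=0):
    """Equal-tailed credibility interval approximated by Monte Carlo draws."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    tail = (1.0 - level) / 2.0
    return draws[int(tail * n_draws)], draws[int((1.0 - tail) * n_draws) - 1]


if __name__ == "__main__":
    # Hypothetical example: uniform prior, then 7 successes and 3 failures.
    a, b = beta_posterior(1, 1, 7, 3)
    print("posterior:", (a, b), "mean:", beta_mean(a, b))
    print("95% credibility interval:", credibility_interval(a, b))
```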
15.
Comparison of the Normal Theory and Bootstrap
Confidence Intervals with Credibility Intervals
·
Review of how confidence intervals are
defined
·
Review of how credibility intervals are
defined
·
Population proportion case
·
Numerical examples
16.
Concepts in the Testing of Hypotheses
·
Null vs. alternative hypotheses
·
Reasoning used in testing
·
P-values and tests of significance
o
Examples
o
The Higgs boson and what physicists
mean when they say that they have 5-sigma evidence for its existence
·
More concepts important in testing
o
Statistical significance
o
Practical significance
o
Type I and II errors
o
Alpha
o
Beta
o
Tension between alpha and beta.
o
Power and OC curves
o
What affects power: effect size, alpha
level and sample size
o
Determining the sample size for a
given alpha, important effect size and corresponding beta
·
Relationship between two-sided tests
and confidence intervals
·
Chi-square test
17.
Standard Deviation as a Ruler
·
Changes in location and scale and
standardization
o
Celsius versus Fahrenheit scales
·
z-scores
·
When is a z-score big?
·
Empirical rule revisited
18.
Classical Normal Theory Confidence
Intervals
·
Joint derivation based on parameter
estimates and standard errors
·
Standard deviation as a ruler revisited
o
Standard errors
·
List of parameters involved
o
Proportions
o
Difference of proportions
o
Means
o
Difference of means: independent
samples
o
Difference of means: paired samples
·
Examples of the problems that these
confidence intervals address
·
Assumptions
o
Randomization
o
Independence
o
10% condition
·
Calculation of them using JMP
o
Data entry
o
Interpretation of JMP output in context
19.
Simple Regression
·
Logistic regression
·
Simple linear regression
o
Assumptions and how to check them
o
Outliers and high leverage points
·
Interpretation of JMP output
·
Regression wisdom
20.
Regression with Two Independent Variables
·
Continuous regressors
·
Categorical regressors and dummy
variables
·
Effect of a variable after adjusting
for the effect of another variable.
·
Interactions
o
Interaction in the context of one
categorical and one continuous regressor
o
Analysis of covariance
o
Interactions when one has two
continuous regressors
·
Multicollinearity
·
Example of general multivariate
regression
o
Brief discussion of JMP output
21.
Further Topics in Regressions:
Highlights
·
Generalized Linear Models
o
Framework
o
Link function
o
JMP Generalized linear model platform
·
Weibull regression with right-censored
data
o
Model
o
Accelerated life testing with
temperature as a regressor
o
JMP output interpretation on the basis
of an example
·
Poisson regression for counts and
rates
o
Model
o
Over-dispersion
o
Excessive numbers of zeros
o
JMP output interpretation on the basis
of an example
·
Cox proportional hazards model
o
Model
o
JMP output interpretation on the basis
of an example
In addition to the lectures on the
scheduled material, there will be occasional enrichment lectures motivated by
students’ interests.