Spring, 2003
Prerequisites for this course include a course in applied statistics and a course in statistical inference.
The text for the course is
Introduction to Robust Estimation and Hypothesis Testing,
by Rand R. Wilcox (1997).
The most important text in the area remains
Robust Statistics,
by Peter J. Huber (1981).
We will also use some journal articles, particularly ones selected
by the students for their projects, and an evolving set of notes
by the instructor.
You must have an account on a system that has a web server. The CSI system is scs.gmu.edu. There are several other possibilities, including the university systems mason.gmu.edu and osf1.gmu.edu, and systems in IT&E. If you do not have an account yet, you can get one on scs.gmu.edu by filling out a request form that you can get from the SCS office in 103 Science & Technology I.
The scs.gmu.edu system requires a secure login (ssh) and secure ftp. You can get information about the system and options for accessing it at www.scs.gmu.edu/computing/
Here's a source of utility freeware, including programs for ssh.
Here's info on getting an account on the main GMU computers.
Each student will prepare a Web page for presentation of the project and for some of the smaller assignments.
Here's more info on making a webpage, especially on GMU computers.
There are several programs that help you write html. I do not use any of these, but you may find them useful. You can also produce html output directly from Microsoft Word. I do not use that for html either. (In fact, I use Word as infrequently as possible.)
You are strongly encouraged to prepare your written reports relating to your project using TeX or LaTeX. TeX is a typesetting program that is available on a number of GMU computing systems and is widely available on PCs. Information about TeX can be obtained at the TeX Users Group. It is best to put material on the web in PDF format, which can be generated easily from TeX.
The main software used in the course will be S-Plus or R.
A student version of S-Plus can be obtained at
http://elms03.e-academy.com/splus/
Information about R, including links for downloading, can be obtained at
http://www.r-project.org/
Wilcox has developed a number of S-Plus or R functions to implement various methods he discusses in the text. He has revised these functions, and plain text versions of them can be downloaded from his site.
Also, Dallas had downloaded the original set (called "allfun"), and, with Rand Wilcox's permission, I have put them here.
Role of models in statistical inference; basic definitions of robustness.
Failures of model assumptions.
Effects: bias, variance, loss of power.
Principles of estimation.
Functionals / plug-in estimators;
Robust parameters / robust procedures.
Types of robustness: qualitative (continuity); quantitative (breakdown);
infinitesimal (influence function).
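To make the influence-function idea concrete, here is a small sketch in Python (used only for illustration; the course software is S-Plus or R). It computes Tukey's sensitivity curve, a finite-sample analogue of the influence function: append one point x to a fixed sample and multiply the change in the estimate by n. The sample values are made up.

```python
import statistics

def sensitivity_curve(estimator, base, x):
    """Tukey's sensitivity curve: n * (T(base + [x]) - T(base)), n = len(base) + 1.
    A finite-sample stand-in for the influence function of the functional T."""
    n = len(base) + 1
    return n * (estimator(base + [x]) - estimator(base))

# Illustrative sample (made up for this sketch).
base = [-1.2, -0.5, 0.0, 0.3, 0.9, 1.4]
for x in [0.0, 10.0, 1000.0]:
    sc_mean = sensitivity_curve(statistics.mean, base, x)
    sc_median = sensitivity_curve(statistics.median, base, x)
    print(f"x = {x:7.1f}   mean: {sc_mean:9.2f}   median: {sc_median:6.2f}")
```

The mean's curve grows linearly in x, so a single point can move the mean arbitrarily far; the median's curve stays bounded no matter how large x is.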
Measures of location (of a distribution in Wilcox, or a sample)
must be equivariant for linear transformations.
Types:
means
quantiles
Winsorized means
trimmed means
M-measures (rho, psi)
R-measures (rank-based; functionals involving q and P(f(x_q)))
L-measures (linear combinations of order statistics)
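A minimal Python sketch of two of the location measures listed above, the γ-trimmed mean and the γ-Winsorized mean, with g = floor(γn) observations trimmed or Winsorized in each tail. The default γ = 0.2 and the data are illustrative assumptions, and γ must be below 0.5.

```python
import math

def trimmed_mean(xs, gamma=0.2):
    """Mean after removing the g = floor(gamma * n) smallest and largest values."""
    x = sorted(xs)
    n = len(x)
    g = math.floor(gamma * n)
    kept = x[g:n - g]
    return sum(kept) / len(kept)

def winsorize(xs, gamma=0.2):
    """Pull the g smallest values up to x_(g+1) and the g largest down to x_(n-g)."""
    x = sorted(xs)
    n = len(x)
    g = math.floor(gamma * n)
    lo, hi = x[g], x[n - g - 1]
    return [min(max(v, lo), hi) for v in xs]

def winsorized_mean(xs, gamma=0.2):
    w = winsorize(xs, gamma)
    return sum(w) / len(w)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # illustrative sample with one outlier
print(sum(data) / len(data), trimmed_mean(data), winsorized_mean(data))
```

Both robust measures ignore how extreme the outlier is, and both transform along with the data under x -> a*x + b, as a location measure must.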
Measures of scale (of a distribution in Wilcox, or a sample)
must be equivariant for scale transformations and invariant for
sign transformations and additive transformations.
Types:
mean squared deviation from the mean
mean absolute deviation from the mean
mean absolute deviation from the median
median absolute deviation from the median
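The four scale measures above, sketched in Python for a sample. These are the raw versions: the median absolute deviation is not multiplied by 1.4826 (the constant R's mad() applies by default so that it estimates the normal standard deviation). The data are illustrative.

```python
import statistics

def scale_measures(xs):
    """Return the four sample scale measures listed in the outline."""
    n = len(xs)
    m = statistics.mean(xs)
    med = statistics.median(xs)
    return {
        "msd_mean": sum((v - m) ** 2 for v in xs) / n,          # mean squared dev. from mean
        "mean_ad_mean": sum(abs(v - m) for v in xs) / n,        # mean abs. dev. from mean
        "mean_ad_median": sum(abs(v - med) for v in xs) / n,    # mean abs. dev. from median
        "mad_median": statistics.median([abs(v - med) for v in xs]),  # raw MAD
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(scale_measures(data))
# Under x -> -3x + 10 the absolute-deviation measures scale by |-3| = 3
# (the squared measure by 9); the shift and the sign change do not matter.
print(scale_measures([-3 * v + 10 for v in data]))
```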
Scale equivariant M-measures of location.
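One way to make an M-measure of location scale equivariant is to standardize residuals by a robust scale estimate that is held fixed while the estimating equation is solved. The Python sketch below assumes Huber's ψ with bending constant k = 1.28 (an assumed choice; 1.345 is also common) and uses the normalized MAD (MAD/0.6745) as the scale. It solves the equation by iteratively reweighted averaging and is an illustration, not a transcription of Wilcox's S-Plus code.

```python
import statistics

def huber_m(xs, k=1.28, tol=1e-10, max_iter=100):
    """Huber M-estimate of location; scale fixed at MADN = MAD / 0.6745."""
    med = statistics.median(xs)
    madn = statistics.median([abs(v - med) for v in xs]) / 0.6745
    if madn == 0:
        return med                      # degenerate scale: fall back to the median
    mu = med                            # start the iteration at the median
    for _ in range(max_iter):
        # Weights psi(u)/u for Huber's psi: 1 inside [-k, k], k/|u| outside.
        w = []
        for v in xs:
            u = (v - mu) / madn
            w.append(1.0 if abs(u) <= k else k / abs(u))
        new_mu = sum(wi * v for wi, v in zip(w, xs)) / sum(w)
        if abs(new_mu - mu) < tol * madn:
            return new_mu
        mu = new_mu
    return mu

xs = [2.1, 2.5, 2.8, 3.0, 3.2, 3.3, 3.9, 4.4, 15.0]   # illustrative data
print(huber_m(xs), huber_m([2 * v + 5 for v in xs]))
```

Because residuals are divided by a scale estimate that itself rescales with the data, huber_m(a*x + b) equals a*huber_m(x) + b; with a fixed, unstandardized ψ this would fail.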
Methods for estimating the standard deviations (standard "errors")
of the estimators. Four ways:
1) represent variance as a function of the influence function;
then estimate it;
2) use the relationship of the CDF of order statistics to the beta distribution;
3) use the actual variances of normal order statistics;
4) use resampling methods.
Pay particular attention to the approximation for the variance of the trimmed
mean, equation (3.4), and an estimator of it, equation (3.8).
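Two of the four approaches, sketched in Python: an estimate of the standard error of the γ-trimmed mean of the form s_w / ((1 - 2γ) sqrt(n)), where s_w² is the sample variance of the γ-Winsorized data (our reading of the estimator referred to above; check it against the text), and a plain bootstrap standard error (approach 4). The data, seed, and number of resamples are illustrative.

```python
import math
import random
import statistics

def trimmed_mean(xs, gamma=0.2):
    s = sorted(xs)
    n = len(s)
    g = math.floor(gamma * n)
    return sum(s[g:n - g]) / (n - 2 * g)

def winsorized_variance(xs, gamma=0.2):
    """Sample variance (divisor n - 1) of the gamma-Winsorized data."""
    s = sorted(xs)
    n = len(s)
    g = math.floor(gamma * n)
    lo, hi = s[g], s[n - g - 1]
    return statistics.variance([min(max(v, lo), hi) for v in s])

def trimmed_mean_se(xs, gamma=0.2):
    """Estimated standard error s_w / ((1 - 2*gamma) * sqrt(n))."""
    return math.sqrt(winsorized_variance(xs, gamma)) / ((1 - 2 * gamma) * math.sqrt(len(xs)))

def bootstrap_se(xs, estimator, n_boot=2000, seed=1):
    """Approach 4: standard deviation of the estimator over bootstrap resamples."""
    rng = random.Random(seed)
    reps = [estimator(rng.choices(xs, k=len(xs))) for _ in range(n_boot)]
    return statistics.stdev(reps)

data = [2.1, 2.5, 2.8, 3.0, 3.2, 3.3, 3.9, 4.4, 5.0, 5.6, 6.1, 15.0]
print(trimmed_mean_se(data), bootstrap_se(data, trimmed_mean))
```

With γ = 0 nothing is Winsorized and the formula reduces to the usual s / sqrt(n).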
Two robust scale estimators: the biweight midvariance and the percentage bend midvariance.
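A Python sketch of the biweight midvariance, written from the usual definition as we understand it: with M the sample median and MAD the unscaled median absolute deviation, let Y_i = (X_i - M) / (9 MAD) and a_i = 1 if |Y_i| < 1, else 0; then the estimate is n Σ a_i (X_i - M)² (1 - Y_i²)⁴ / [Σ a_i (1 - Y_i²)(1 - 5 Y_i²)]². The percentage bend midvariance is not sketched here, and the data are illustrative.

```python
import statistics

def biweight_midvariance(xs):
    """Biweight midvariance; points more than 9 MADs from the median get zero weight."""
    m = statistics.median(xs)
    mad = statistics.median([abs(v - m) for v in xs])
    if mad == 0:
        raise ValueError("MAD is zero; the biweight midvariance is undefined")
    n = len(xs)
    num = 0.0
    den = 0.0
    for v in xs:
        y = (v - m) / (9 * mad)
        if abs(y) < 1:                          # a_i = 1
            num += (v - m) ** 2 * (1 - y * y) ** 4
            den += (1 - y * y) * (1 - 5 * y * y)
    return n * num / den ** 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(biweight_midvariance(data))
# The outlier is already beyond 9 MADs, so moving it further changes nothing.
print(biweight_midvariance(data[:-1] + [1000]))
```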
Assignment: Get Wilcox's S-Plus functions from
his website.
Work problems 2, 3, 4, 6, 7 in Chapter 4.
(For many of the problems you should just write the S-Plus code yourself,
but you can use his functions when it's convenient.)
These problems, as well as those from Chapter 3, will be due
March 19.
Review linear classification models.
Use of trimmed means in a one-way layout.
Robust methods in classification models of additive linear effects.
Heteroscedasticity problems.
The Behrens-Fisher type of problem gets even worse as the number
of groups increases.
The problem is that the actual Type I error rate of the F test can exceed the nominal level,
and the problem is worse for unequal sample sizes.
Also, the test is not unbiased for the hypothesis of equal means.
What about using the F test just as a test of equality of distributions?
This is changing the problem --
also, in any event, the F test has low power when the distributions have
heavy tails.
Characteristics of what Wilcox calls a "heteroscedastic method".
* variances are not pooled
(Welch, 1933, 1951; Satterthwaite, 1946)
* degrees of freedom are adjusted
(Welch, 1933, 1951; Satterthwaite, 1946; Yuen, 1974)
* scale or variance measures do not include extreme order statistics
(Yuen, 1974)
this probably also makes the procedure more robust to other departures from the model
"effective sample size"
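Yuen's (1974) two-sample test combines the three ingredients above. A Python sketch, assuming the usual formulation: for each group, h = n - 2g is the effective sample size and d = (n - 1) s_w² / (h(h - 1)) with s_w² the Winsorized variance; then T is the difference of trimmed means divided by sqrt(d1 + d2), with Welch-style degrees of freedom. It returns only T and the df; a p-value needs the t distribution (for example scipy.stats.t.sf, if SciPy is available). The data are made up.

```python
import math
import statistics

def _trimmed_parts(xs, gamma):
    """Trimmed mean, Yuen's d = (n-1) s_w^2 / (h (h-1)), and effective size h."""
    s = sorted(xs)
    n = len(s)
    g = math.floor(gamma * n)
    h = n - 2 * g                                    # effective sample size
    tmean = sum(s[g:n - g]) / h
    lo, hi = s[g], s[n - g - 1]
    s2w = statistics.variance([min(max(v, lo), hi) for v in s])  # Winsorized variance
    return tmean, (n - 1) * s2w / (h * (h - 1)), h

def yuen(x, y, gamma=0.2):
    """Yuen's heteroscedastic test for equal trimmed means: returns (T, df)."""
    t1, d1, h1 = _trimmed_parts(x, gamma)
    t2, d2, h2 = _trimmed_parts(y, gamma)
    T = (t1 - t2) / math.sqrt(d1 + d2)               # variances are not pooled
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return T, df

x = [10, 12, 11, 14, 13, 15, 11, 40]                 # made-up data
y = [8, 9, 7, 10, 9, 30, 8, 9, 10, 6]
print(yuen(x, y))
```

With γ = 0 there is no trimming and the statistic and degrees of freedom reduce to Welch's.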
Use of trimmed means in two-sample, one-way, multi-way layouts.
-- "effective sample size"
* the Yuen-Welch ideas are similar throughout.
* for one-way, the generalized "Box" method
(Lix, Keselman, and Carriere, 1996)
very similar to the Yuen-Welch method
The basic idea is to use trimmed means in the "between sums of squares"
and to use Winsorized data in the "within sums of squares".
What about skewness?
percentile t bootstrap method
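A Python sketch of the percentile t (bootstrap-t) interval for a trimmed mean: resample, studentize each bootstrap trimmed mean by its own estimated standard error, and use quantiles of those t values in place of Student's t, so that skewness in the sampling distribution is reflected in the interval. The standard error formula, B = 599, the seed, and the simulated lognormal sample are illustrative assumptions.

```python
import math
import random

def trimmed_mean_and_se(xs, gamma=0.2):
    """Trimmed mean and its estimated SE, s_w / ((1 - 2*gamma) * sqrt(n))."""
    s = sorted(xs)
    n = len(s)
    g = math.floor(gamma * n)
    tmean = sum(s[g:n - g]) / (n - 2 * g)
    lo, hi = s[g], s[n - g - 1]
    w = [min(max(v, lo), hi) for v in s]             # Winsorized sample
    wbar = sum(w) / n
    s_w = math.sqrt(sum((v - wbar) ** 2 for v in w) / (n - 1))
    return tmean, s_w / ((1 - 2 * gamma) * math.sqrt(n))

def bootstrap_t_ci(xs, gamma=0.2, alpha=0.05, n_boot=599, seed=2):
    """Equal-tailed percentile t interval for the trimmed mean."""
    rng = random.Random(seed)
    est, se = trimmed_mean_and_se(xs, gamma)
    tstats = []
    for _ in range(n_boot):
        b_est, b_se = trimmed_mean_and_se(rng.choices(xs, k=len(xs)), gamma)
        if b_se > 0:                                 # skip degenerate resamples
            tstats.append((b_est - est) / b_se)
    tstats.sort()
    B = len(tstats)
    t_lo = tstats[math.floor(alpha / 2 * B)]
    t_hi = tstats[math.ceil((1 - alpha / 2) * B) - 1]
    # The quantiles swap ends: (est - t_hi * se, est - t_lo * se).
    return est - t_hi * se, est - t_lo * se

rng = random.Random(0)
skewed = [rng.lognormvariate(0, 1) for _ in range(30)]   # simulated skewed sample
print(trimmed_mean_and_se(skewed)[0], bootstrap_t_ci(skewed))
```

Unlike a t interval, the result is generally not symmetric about the estimate when the data are skewed.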
Contrasts.
Definitions; why important.
Form with Kronecker products.
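Contrasts for a J × K design can be built with Kronecker products, as sketched below in Python. The sketch assumes cell means ordered with the second factor varying fastest (μ11, μ12, ..., μJK) and uses successive-difference contrasts; any other contrast basis can be substituted in the same way. Plain lists stand in for matrices to keep it dependency-free.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def successive_differences(k):
    """(k-1) x k contrast matrix with rows (1, -1, 0, ...), (0, 1, -1, ...), ..."""
    return [[1 if c == r else -1 if c == r + 1 else 0 for c in range(k)]
            for r in range(k - 1)]

def ones_row(k):
    """1 x k row of ones: sums over the levels of a factor."""
    return [[1] * k]

J, K = 2, 3                                  # a hypothetical 2 x 3 design
C_J, C_K = successive_differences(J), successive_differences(K)
main_A = kron(C_J, ones_row(K))              # main effect of factor A
main_B = kron(ones_row(J), C_K)              # main effect of factor B
inter_AB = kron(C_J, C_K)                    # A x B interaction
for name, M in [("A", main_A), ("B", main_B), ("AB", inter_AB)]:
    print(name, M)
```

Every row sums to zero, so each is a contrast; the interaction rows are elementwise products of an A contrast and a B contrast.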
M-estimators -- use percentile bootstrap
Tests of medians based on Harrell-Davis estimator
-- use percentile bootstrap
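A Python sketch of the Harrell-Davis median estimator combined with a percentile bootstrap interval for a difference between two groups. Here the beta-weight integrals are approximated with Simpson's rule; if SciPy is available, scipy.special.betainc computes the regularized incomplete beta function directly. Any other estimator, such as an M-estimator, can be passed to the bootstrap function instead. The data, seed, and B = 2000 are illustrative.

```python
import math
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def hd_weights(n, q=0.5, m=200):
    """Harrell-Davis weights W_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b), i = 1..n,
    a = q(n+1), b = (1-q)(n+1), I = regularized incomplete beta function.
    Each W_i integrates the Beta(a, b) density by Simpson's rule (m even panels).
    Assumes a > 1 and b > 1, so the density is finite and vanishes at 0 and 1."""
    a, b = q * (n + 1), (1 - q) * (n + 1)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)

    weights = []
    for i in range(1, n + 1):
        lo, hi = (i - 1) / n, i / n
        step = (hi - lo) / m
        total = density(lo) + density(hi)
        for j in range(1, m):
            total += (4 if j % 2 else 2) * density(lo + j * step)
        weights.append(total * step / 3)
    return tuple(weights)

def harrell_davis(xs, q=0.5):
    """Harrell-Davis estimate of the q-th quantile: sum of W_i * x_(i)."""
    s = sorted(xs)
    return sum(w * v for w, v in zip(hd_weights(len(s), q), s))

def percentile_boot_diff(x, y, estimator, n_boot=2000, alpha=0.05, seed=3):
    """Percentile bootstrap interval for estimator(x) - estimator(y)."""
    rng = random.Random(seed)
    diffs = sorted(estimator(rng.choices(x, k=len(x))) - estimator(rng.choices(y, k=len(y)))
                   for _ in range(n_boot))
    return diffs[math.floor(alpha / 2 * n_boot)], diffs[math.ceil((1 - alpha / 2) * n_boot) - 1]

g1 = [10, 12, 11, 14, 13, 15, 11, 40]        # made-up data
g2 = [8, 9, 7, 10, 9, 30, 8, 9, 10, 6]
print(harrell_davis(g1) - harrell_davis(g2), percentile_boot_diff(g1, g2, harrell_davis))
```

The weights depend only on n and q, so caching them keeps the bootstrap loop cheap.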
Rank-based procedures -- extension of Kruskal-Wallis test.
(Rust and Fligner, 1984)
Random effects model.
Deal with heteroscedasticity problems same as before --
first, do not pool variances;
second, adjust degrees of freedom
(Jeyaratnam and Othman, 1985)
third, use trimmed means (Wilcox, 1994)
Other variance structures, such as those arising from
repeated measures, split plot designs, other dependencies,
require similar kinds of approaches.