CSI 991
Seminar in Computational Statistics:
Data Mining
Fall, 2003
Fridays 3:00pm -- 5:00pm, Innovation Hall, Room 209
The course is CSI 991, Section 004.
Contacts:
csutton@gmu.edu
jgentle@gmu.edu
We will work through some of the material in Principles of Data Mining by David J. Hand, Heikki Mannila, and Padhraic Smyth.
Schedule
- September 12
Introductory discussions.
- September 19
Seminar canceled because of storm.
- September 26
- Material from Chapters 1 and 2; presentation by Yasmin Said.
- Binning and other topics by Dan Carr.
- October 3
Questions and discussions on Chapters 2 through 4.
- October 10
- October 17
- October 24
- October 31
- Hidden Markov models by David DeBarr.
- Cross validation and more on CART by Cliff Sutton (see material on October 10)
- November 7
Chapter 8; general discussion.
- November 14
First, we consider some introductory material on the EM algorithm (Gentle).
Then we experiment with classification methods on the county election results data, a modified (and tab-delimited!!!) version of a dataset from Harrell's book.
The following changes were made to Harrell's original dataset:
- Deleted the 27 counties that were missing the dependent variable
- Derived pdensity (log of population density)
- Derived senior (sum of age6574 and age75)
- Derived clinton (binary indicator: 1 if democrat > republican and democrat > perot, else 0)
- Renamed pop.density to popdens
- Renamed pop.change to popchan
- Renamed republican to republic
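The changes listed above could be reproduced roughly as in the following minimal Python sketch. It is not the script actually used to build the seminar file. It assumes that the original data arrive as rows keyed by the column names mentioned above, that democrat is the dependent variable (the page does not say which column that is), that missing values are coded as an empty string or NA, and that "log" means the natural log.

```python
import csv
import io
import math

# Assumption: the page does not name the dependent variable; democrat is a guess.
DEPENDENT = "democrat"
# Assumption: how missing values are coded in the original file.
MISSING = {"", "NA"}
# Renames listed on this page: old column name -> new column name.
RENAMES = {"pop.density": "popdens", "pop.change": "popchan", "republican": "republic"}


def transform(rows):
    """Apply the listed changes to an iterable of dict rows (one per county)."""
    out = []
    for row in rows:
        # Drop counties that are missing the dependent variable.
        if row.get(DEPENDENT, "").strip() in MISSING:
            continue
        dem = float(row["democrat"])
        rep = float(row["republican"])
        per = float(row["perot"])
        new = dict(row)
        # pdensity: log of population density (natural log assumed).
        new["pdensity"] = math.log(float(row["pop.density"]))
        # senior: sum of the two oldest age-group columns.
        new["senior"] = float(row["age6574"]) + float(row["age75"])
        # clinton: 1 if the Democratic share beats both other candidates, else 0.
        new["clinton"] = 1 if dem > rep and dem > per else 0
        # Rename columns; renamed ones move to the end of the column order.
        for old, name in RENAMES.items():
            new[name] = new.pop(old)
        out.append(new)
    return out


def write_tab_delimited(rows, stream):
    """Write the transformed rows as a tab-delimited file with a header line."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up rows for illustration only; not values from the real dataset.
    sample = [
        {"democrat": "45", "republican": "40", "perot": "15", "age6574": "8",
         "age75": "6", "pop.density": "100", "pop.change": "2"},
        {"democrat": "", "republican": "50", "perot": "10", "age6574": "9",
         "age75": "7", "pop.density": "50", "pop.change": "1"},
    ]
    buf = io.StringIO()
    write_tab_delimited(transform(sample), buf)
    print(buf.getvalue())
```

Using the standard csv module keeps the sketch free of third-party dependencies; in practice the original file could be read with csv.DictReader and passed straight to transform.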
Jill showed how SPSS Clementine works on this dataset.
- November 21
Presentation on cluster analysis by Pragyansmita Nayak.
Continuation of the study of the "counties" data, using CART and S-Plus.
- November 28
No seminar; Happy Thanksgiving!
- December 5
Chapter 10.
- December 12
Chapter 11.