STAT753 Intrusion Detection Course Project Page
Weijie Cai    Li Li

We made two proposals. The first one was about simulating worm propagation in a LAN. We were afraid that our project would be so excellent that hackers or crackers might make use of our work, so we gave it up. It's not because Dr. Wegman turned down our proposal, of course. We then got ideas from a series of papers by C. Taylor and formed our second project proposal: cluster analysis for anomaly detection. One additional lesson we learned from writing the proposal: NEVER write your proposal in too much detail, or sooner or later you will find that many of the things you promised cannot be solved.

See our project log (updated every day, written in a free style and in TERRIBLE English).
The bash shell script to extract TCP header information.
The extracted data:
    Wednesday (1st and 3rd week combined)
    Thursday (1st and 3rd week combined)
A little program to get the total bytes and total length from a tcpdump file. (tcpdump itself does not provide this function. It can report the data length from the IP headers, but that is only a rough estimate of the total bytes in the tcpdump file.)
A SAS program to process the raw TCP connection data, with around 1,450,000 observations. We have to admit that SAS really beat the other statistical software we tried on data this large: R and S-Plus could not even read it into memory, while SAS read it in within several seconds! Check the SAS log about this.
We then found that the TCP connections from Wednesday of the first week contain around 2000 independent sessions, and we took 500 of them by simple random sampling. Back on Linux again, we dumped outside.tcpdump and got the counts of flags and packets.
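
The sampling and counting step can be sketched roughly as follows. This is only an illustration in Python, not the script we actually ran: the function names, the (session_id, flags) record layout, and the one-letter flag strings such as "S" or "FP" are all assumptions made for the example.

```python
import random
from collections import Counter


def sample_sessions(session_ids, k, seed=None):
    """Draw a simple random sample of k distinct sessions, without replacement.

    session_ids may contain repeats (one entry per packet); they are
    de-duplicated first so every session has the same chance of selection.
    """
    unique_ids = sorted(set(session_ids))
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return set(rng.sample(unique_ids, k))


def count_flags_and_packets(packets, sampled):
    """Count packets and TCP flags for each sampled session.

    packets: iterable of (session_id, flags) pairs, where flags is a
    hypothetical string of one-letter flag codes, e.g. "S" or "FP".
    """
    packet_counts = Counter()
    flag_counts = {}
    for session_id, flags in packets:
        if session_id not in sampled:
            continue  # skip sessions outside the sample
        packet_counts[session_id] += 1
        flag_counts.setdefault(session_id, Counter()).update(flags)
    return packet_counts, flag_counts


if __name__ == "__main__":
    all_sessions = list(range(2000))  # stand-in for ~2000 session IDs
    chosen = sample_sessions(all_sessions, 500, seed=753)
    demo_packets = [(sid, "S") for sid in all_sessions]  # made-up packets
    packets_per_session, flags_per_session = count_flags_and_packets(demo_packets, chosen)
    print(len(chosen), len(packets_per_session))
```

Sampling whole sessions instead of individual packets keeps every packet of a chosen session together, which is what the per-session flag and packet counts need.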


And finally, our project paper goes here... and our project slides go here...

seminar version.