Weijie Cai, Li Li
We made two proposals. The first one was about simulating
worm propagation in a LAN. We were afraid our project would be
so excellent that hackers or crackers might exploit our work, so we gave
up on it. Of course, that had nothing to do with Dr. Wegman turning down
our proposal. Then, drawing on ideas from a series of papers by C. Taylor,
we formed our second project proposal: cluster analysis for anomaly
detection. One additional lesson we learned from writing the proposal:
NEVER write your proposal in too much detail, because sooner or later
you will find that many of those details are unsolvable.
See our project log (updated every day, and written in a free style
and TERRIBLE English).
The bash shell script to extract TCP header
information.
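The bash script itself is not reproduced here. As a rough illustration of what "extracting TCP header information" involves, here is a minimal Python sketch that decodes the fixed 20-byte TCP header. It assumes the raw TCP segment bytes have already been isolated (link-layer and IP headers stripped); the names parse_tcp_header and TcpHeader are hypothetical, and the one-letter flag labels are our own convention.

```python
import struct
from typing import NamedTuple


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int  # in bytes: data offset field * 4
    flags: str       # one letter per set flag, e.g. "SA" for SYN+ACK
    window: int


# The six classic TCP flags, lowest bit first (ECE/CWR/NS are ignored here).
_FLAG_BITS = [(0x01, "F"), (0x02, "S"), (0x04, "R"),
              (0x08, "P"), (0x10, "A"), (0x20, "U")]


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Decode the fixed part of a TCP header (options, if any, are skipped)."""
    if len(segment) < 20:
        raise ValueError("TCP header needs at least 20 bytes")
    # Network byte order: ports (2x16), seq and ack (2x32), offset+flags (16),
    # window (16), checksum (16), urgent pointer (16) -> 20 bytes total.
    src, dst, seq, ack, off_flags, window, _cksum, _urg = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4
    flag_bits = off_flags & 0x3F
    flags = "".join(letter for bit, letter in _FLAG_BITS if flag_bits & bit)
    return TcpHeader(src, dst, seq, ack, header_len, flags, window)
```

Decoding by fixed offsets keeps the sketch independent of tcpdump's text output, whose exact format differs between versions.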
The extracted data:
Wednesday (1st and 3rd weeks combined)
Thursday (1st and 3rd weeks combined)
A small program to get the total bytes and total length from a tcpdump
file. tcpdump does not provide this function itself: it can report the
data length from the IP headers, but that can only serve as a rough
estimate of the total bytes in the dump.
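Our original program is not shown here. A minimal Python sketch of the same idea follows, assuming the file is a classic libpcap capture (what tcpdump -w writes, with microsecond timestamps) and reading "total bytes" as the sum of captured lengths and "total length" as the sum of original on-the-wire lengths, both taken from each packet's record header. The function name pcap_totals is hypothetical.

```python
import struct
from typing import BinaryIO, Tuple


def pcap_totals(f: BinaryIO) -> Tuple[int, int, int]:
    """Return (packet_count, total_captured_bytes, total_original_length)."""
    # 24-byte global header; its magic number tells us the byte order.
    global_header = f.read(24)
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")
    magic = global_header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written by a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written by a big-endian machine
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    count = captured = original = 0
    while True:
        # 16-byte record header: ts_sec, ts_usec, incl_len, orig_len.
        rec = f.read(16)
        if len(rec) < 16:
            break  # end of file (or a truncated record header)
        _ts_sec, _ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        f.seek(incl_len, 1)  # skip over the captured packet bytes
        count += 1
        captured += incl_len
        original += orig_len
    return count, captured, original
```

The two totals differ whenever tcpdump truncated packets to its snapshot length. Calling pcap_totals on outside.tcpdump opened in binary mode would return the packet count and both totals, provided that file is in the classic pcap format.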
A SAS program to process the raw TCP connection data, with around
1,450,000 observations. We have to admit that SAS really beats other
statistical software on data this large: R and S-Plus could not even
read it into memory, while SAS read it in within several seconds! See
the SAS log for details.
We then found that the TCP connections from Wednesday of the first week
contain around 2000 independent sessions, and we drew 500 of them using
simple random sampling. Back on Linux, we dumped outside.tcpdump and got
the counts of flags and packets.
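The sampling and counting step can be sketched in Python as below. This is not the code we used. It assumes each session is identified by a hypothetical (source IP, source port, destination IP, destination port) tuple, and that packets have already been reduced to (session, flags) pairs with flags written as letters such as "SA". The function names and the seed parameter are illustrative only.

```python
import random
from collections import Counter


def sample_sessions(session_ids, k=500, seed=None):
    """Simple random sample, without replacement, of k distinct sessions."""
    # Deduplicate, then sort so that a fixed seed reproduces the same draw.
    ids = sorted(set(session_ids))
    rng = random.Random(seed)
    return rng.sample(ids, k)  # raises ValueError if k > number of sessions


def count_flags_and_packets(packets, sampled):
    """Count packets per sampled session and total occurrences of each flag.

    packets: iterable of (session_id, flags) pairs, e.g. (sid, "SA").
    Packets from sessions outside the sample are ignored.
    """
    wanted = set(sampled)
    packet_counts = Counter()
    flag_counts = Counter()
    for session_id, flags in packets:
        if session_id in wanted:
            packet_counts[session_id] += 1
            flag_counts.update(flags)  # adds 1 for each flag letter
    return packet_counts, flag_counts
```

random.Random.sample draws without replacement, so every subset of 500 sessions is equally likely, which is what simple random sampling requires.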
And finally, our project paper goes here... and our
project slides go here (seminar version).