ClarkNet-HTTP

Description
These two traces contain two weeks' worth of all HTTP requests to the ClarkNet WWW server. ClarkNet is a full Internet access provider for the Metro Baltimore-Washington DC area.
Format
Each log is an ASCII file with one line per request, with the following columns:
  1. host making the request. A hostname when possible, otherwise the Internet address if the name could not be looked up.
  2. timestamp in the format "DAY MON DD HH:MM:SS YYYY", where DAY is the day of the week, MON is the name of the month, DD is the day of the month, HH:MM:SS is the time of day using a 24-hour clock, and YYYY is the year. The timezone is -0400.
  3. request given in quotes.
  4. HTTP reply code.
  5. bytes in the reply.
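The five columns above can be read with a short parser. The following is a minimal sketch, not the Archive's own tooling. It assumes the columns are separated by whitespace, that the timestamp appears exactly as described, and that a non-numeric byte count (such as "-") means no size was logged; none of these details is stated above. The names `parse_line`, `LINE_RE`, and `TRACE_TZ` and the sample line are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed offset given in the format description (-0400).
TRACE_TZ = timezone(timedelta(hours=-4))

# Assumed layout: host, timestamp, "request", reply code, bytes,
# separated by whitespace. The delimiter is an assumption.
LINE_RE = re.compile(r'^(\S+)\s+(.+?)\s+"(.*)"\s+(\d{3})\s+(\S+)\s*$')


def parse_line(line):
    """Return a dict of the five columns, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    host, stamp, request, status, size = m.groups()
    try:
        # "DAY MON DD HH:MM:SS YYYY"; %a and %b expect English names (C locale).
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return {
        "host": host,
        "time": when.replace(tzinfo=TRACE_TZ),
        "request": request,
        "status": int(status),
        # Assumption: a non-numeric size means no byte count was recorded.
        "bytes": int(size) if size.isdigit() else None,
    }


# Hypothetical sample line, constructed for illustration only.
sample = 'example.host.net Mon Aug 28 00:00:34 1995 "GET /index.html HTTP/1.0" 200 3542'
record = parse_line(sample)
```

Lines that do not fit the assumed layout return None rather than raising, so a caller can count and inspect them separately.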
Measurement
The first log was collected from 00:00:00 August 28, 1995 through 23:59:59 September 3, 1995, a total of 7 days. The second log was collected from 00:00:00 September 4, 1995 through 23:59:59 September 10, 1995, a total of 7 days. In this two-week period there were 3,328,587 requests. Timestamps have 1-second resolution.
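As a rough sanity check on these figures, the average request rate follows directly from the stated total and collection window. The sketch below uses only the numbers given above; the variable names are illustrative.

```python
from datetime import datetime

TOTAL_REQUESTS = 3_328_587  # total stated for the two-week period

start = datetime(1995, 8, 28, 0, 0, 0)
end = datetime(1995, 9, 10, 23, 59, 59)

# Both endpoints are included and resolution is 1 second,
# so the window covers one more second than the difference.
span_seconds = int((end - start).total_seconds()) + 1

per_second = TOTAL_REQUESTS / span_seconds
per_day = TOTAL_REQUESTS / (span_seconds / 86_400)
```

This works out to roughly 2.75 requests per second on average, or about 237,756 requests per day. It is a mean over the whole window and says nothing about peak load.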
Privacy
The logs fully preserve the originating host and HTTP request. Please do not, however, attempt any analysis beyond general traffic patterns.
Acknowledgements
The logs were collected by Stephen Balbach of ClarkNet, and contributed by Martin Arlitt (mfa126@cs.usask.ca) and Carey Williamson (carey@cs.usask.ca) of the University of Saskatchewan.
Publications
This is one of six data sets analyzed in an upcoming paper by M. Arlitt and C. Williamson, entitled ``Web Server Workload Characterization: The Search for Invariants'', to appear in the proceedings of the 1996 ACM SIGMETRICS Conference on the Measurement and Modeling of Computer Systems, Philadelphia, PA, May 23-26, 1996. An extended version of this paper is available on-line; see also the DISCUS home page and the group's publications.
Related
Permission has been granted to make four of the six data sets discussed in ``Web Server Workload Characterization: The Search for Invariants'' available. The four data sets are: Calgary-HTTP, ClarkNet-HTTP, NASA-HTTP, and Saskatchewan-HTTP.
Restrictions
The traces may be freely redistributed.
Distribution
Available from the Archive as two files: Aug 28 to Sep 03, ASCII format, 21.8 MB gzip compressed, 171.0 MB uncompressed; and Sep 04 to Sep 10, ASCII format, 20.8 MB gzip compressed, 172.5 MB uncompressed.

