Website Traffic Tracking

Do you have a website? If so, go to wherever you store the access logs and check how much disk space they use. If the site is a few years old, you’re probably looking at gigabytes, and what exactly is the value of that?

Sure, keeping track of traffic levels is sort of interesting, but you need to balance the value it provides against the space and resources it requires, and I’ve been slowly changing the way I use the access logs on this site.

Step 1: Don’t track the images

Do you really need to track which images are downloaded from the site, or would it be enough to know which pages are loaded? For my part, page impressions are enough intelligence on the site traffic, and with Apache it’s easy to disable image tracking. The easy way to do it is by adding a condition to the CustomLog line in your configuration saying:

env=!object_is_image

Restart the web server, and the access log should grow noticeably slower from now on.
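
For context, here’s a minimal sketch of how that condition fits into an Apache configuration. The object_is_image variable has to be defined first with SetEnvIf; the list of image extensions and the log path below are just examples, so adjust them to your own setup:

# mark requests for image files, then skip them in the access log
SetEnvIf Request_URI "\.(gif|jpe?g|png|ico)$" object_is_image
CustomLog /var/log/apache2/access.log combined env=!object_is_image

The same SetEnvIf trick works for stylesheets, scripts or anything else you don’t care to have in the log.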

Step 2: Use Awstats

My next step was to use Awstats. It parses the raw access log data into a summary database file, which is significantly smaller than the raw logs themselves. Awstats is a lot like other access-log analyzers, but to me it seemed just a notch above the rest.
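
To give an idea of what running it looks like, here’s a sketch of a cron entry that updates the Awstats database every hour. The script path and the config name (example.com) are assumptions based on a typical Debian-style install, so they’ll need to match your own installation:

# update the Awstats database from new access log entries once an hour
0 * * * * /usr/lib/cgi-bin/awstats.pl -config=example.com -update >/dev/null

The reports are then generated from that database rather than from the raw log files.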

Step 3: Drop the access logs for long-term intelligence

While the access logs on your own web server may be the obvious source of traffic intelligence, there are several remote services that will track the traffic for you.

Most of them are pretty good, and if you’re after generic analytics, one of them will probably cover your needs.

Some of the options available include Google Analytics (which I use), StatCounter and several others. Isn’t it nice that someone else offers to keep all that historical data online, and in many cases absolutely free?

I still have access logs, but they’re used to (1) validate the data from Google Analytics and (2) keep an eye on what’s happening on the site “now”. Any data more than a week (or so) old only exists at Google Analytics…
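
Since the local logs only need to cover about a week, a logrotate policy along these lines would match that. It’s a sketch assuming a Debian-style Apache install; the paths and the reload command are assumptions, so adapt them to your own system:

# /etc/logrotate.d/apache2 (sketch): keep roughly one week of access logs locally
/var/log/apache2/*.log {
        daily
        rotate 7
        compress
        delaycompress
        missingok
        notifempty
        postrotate
                /etc/init.d/apache2 reload > /dev/null
        endscript
}

Anything older than those seven rotations disappears locally, and the long-term numbers live at Google Analytics instead.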