Wednesday, 17 May 2006
Web 2.0 Traffic Breakdown: Machines vs. Humans 
While improving Simpy's performance over the last few weeks (see: How I Learned to Love Cache and How I Learned to Love Tuning), I had to dig into Simpy's log files, go through the various metrics that I collect, and analyze a number of performance and system statistics charts. I always knew there were lots of web robots (aka crawlers or spiders) out there (after all, Simpy has its own Argus robot), but I didn't realize what portion of web requests they generate. Thus, I was quite surprised to find that over 70% of web requests are generated by bots! 70%! That is a lot more than the nearly 15% generated by various feed aggregators. This leaves only 11-12% of web page requests that are generated by real people! Of course, these are Simpy's numbers. Other sites will have different numbers.
If you are interested in how web server traffic can be broken down like this, see my earlier post titled Crawlers and Aggregators and Apache Logging Tricks. I will expand the list of bots and aggregators listed there in the near future, if readers ask for it.
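As a rough illustration of this kind of breakdown, here is a minimal Python sketch that classifies requests by their User-Agent string. It assumes the access log is in Apache's "combined" format, where the User-Agent is the last quoted field on each line. The marker lists and the names classify and breakdown are made up for this example and are not the lists from the earlier post. Anything that matches neither list is counted as human, which is a simplification.

```python
import re
from collections import Counter

# Hypothetical substring lists for illustration only; a real setup
# would need much longer lists than these.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")
AGGREGATOR_MARKERS = ("bloglines", "feedfetcher", "newsgator", "netnewswire")

# In Apache's "combined" log format the User-Agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def classify(log_line):
    """Return 'aggregator', 'bot', or 'human' for one access log line."""
    match = USER_AGENT_RE.search(log_line)
    agent = match.group(1).lower() if match else ""
    if any(marker in agent for marker in AGGREGATOR_MARKERS):
        return "aggregator"
    if any(marker in agent for marker in BOT_MARKERS):
        return "bot"
    # Simplification: everything unrecognized is treated as a person.
    return "human"


def breakdown(log_lines):
    """Return the percentage of requests per category."""
    counts = Counter(classify(line) for line in log_lines)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in counts.items()}


if __name__ == "__main__":
    prefix = '1.2.3.4 - - [17/May/2006:13:46:00 -0400] "GET / HTTP/1.1" 200 512 "-" '
    sample = [
        prefix + '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
        prefix + '"Yahoo! Slurp"',
        prefix + '"Bloglines/3.1 (http://www.bloglines.com)"',
        prefix + '"Mozilla/5.0 (Windows; U) Firefox/1.5"',
    ]
    print(breakdown(sample))
```

Aggregators are checked before bots so that a feed reader whose name happens to contain "bot" is not miscounted as a crawler.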
Conclusions:
- Breaking traffic down into bot, aggregator, and human-generated requests is useful.
- Breaking requests down into dynamic vs. static pages is useful (but I did not show that here).
- When tasked with figuring out why your site is slow, suspect machines first, humans second.
Notes:
- Web requests analyzed here should not be confused with the notion of "Page Views". When I refer to "Page Views", I refer only to requests made by humans, and this excludes requests for "resource files", such as JavaScript, image, CSS, and other static content. My guess is that not everyone is this strict and clear, and that machine-made requests often get counted towards "Page Views".
- Hm, I thought I had another note. Perhaps not.
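To make the "Page Views" definition from the first note concrete, here is a small Python sketch. It assumes each request has already been labeled as "human", "bot", or "aggregator" by some earlier step. The extension list and the names is_page_view and count_page_views are assumptions for this example, not something Simpy actually uses.

```python
import os

# Assumed set of "resource file" extensions; adjust to match your site.
STATIC_EXTENSIONS = {".js", ".css", ".png", ".gif", ".jpg", ".jpeg", ".ico"}


def is_page_view(path, requester_kind):
    """True only for human requests to non-static content."""
    if requester_kind != "human":
        return False
    # Drop any query string before looking at the file extension.
    extension = os.path.splitext(path.split("?", 1)[0])[1].lower()
    return extension not in STATIC_EXTENSIONS


def count_page_views(requests):
    """Count page views in an iterable of (path, requester_kind) pairs."""
    return sum(1 for path, kind in requests if is_page_view(path, kind))


if __name__ == "__main__":
    sample = [
        ("/", "human"),
        ("/css/site.css", "human"),
        ("/rss", "aggregator"),
        ("/user/links?page=2", "human"),
        ("/", "bot"),
    ]
    print(count_page_views(sample))
```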
Posted by at 1:46 PM in /
Technorati Tags: simpy performance scaling traffic web2.0 bots robots crawlers spiders aggregators
