Wednesday, 19 October 2005
Web 2.0 Tagging Study, Part 4: Nocturnal Activity Pattern 
In the Timing of Read vs. Write study we saw that there is a slight difference in when Simpy users Write (save and tag new links) and when they Read (search for previously saved and tagged links). In addition to this difference in timing, I've been monitoring Simpy's web access logs and have always found it interesting how its access pattern differs so much from web access patterns I was used to seeing on other sites. This particular study is not about tagging per se, but I kept "Tagging" in the title for the sake of consistency.
First, let's look at the access pattern of a simple, static web site. In this case, the data belongs to the Lucene Consulting site. The following image shows the self-explanatory Weekday-Hour map, where a lighter colour indicates a higher number of web page views.
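For anyone who wants to build a similar Weekday-Hour map from their own logs, here is a minimal sketch. It assumes the access log is in Apache Common or Combined Log Format with timestamps like [19/Oct/2005:00:15:42 -0400]; the function name weekday_hour_map is made up for illustration and is not how the charts in this post were produced.

```python
import re
from datetime import datetime

# Timestamp as it appears in Common/Combined Log Format (assumed log format),
# e.g. [19/Oct/2005:00:15:42 -0400]
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def weekday_hour_map(lines):
    """Count hits per (weekday, hour).

    Returns a 7x24 grid: rows are Monday (0) to Sunday (6),
    columns are hours 0 to 23 in the log's own time zone.
    """
    grid = [[0] * 24 for _ in range(7)]
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        grid[stamp.weekday()][stamp.hour] += 1
    return grid


# Hypothetical sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [19/Oct/2005:00:15:42 -0400] "GET /feed HTTP/1.1" 200 512',
    '1.2.3.4 - - [19/Oct/2005:00:45:10 -0400] "GET / HTTP/1.1" 200 1024',
    '1.2.3.4 - - [22/Oct/2005:14:05:00 -0400] "GET / HTTP/1.1" 200 1024',
    "not a log line",
]
grid = weekday_hour_map(sample)
```

Each cell of the grid can then be mapped to a colour, with lighter shades for higher counts, to get the kind of map described above.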
Clearly, the number of web accesses is pretty evenly distributed throughout the day and throughout the week, with Saturday and Sunday bringing in fewer visitors. I think this is a pattern you will see on most web sites. On sites targeted at the business crowd, or any audience that uses those sites for work, the curve between 09:00 and 17:00 will be much more pronounced, and will look a bit like a bell curve. Also notice a slightly higher number of visits around 22:00-23:00 (the time is in EDT, +0600). Here is another chart showing the same data in a more conventional format:
How about a blog site? What usage patterns can be seen there? For that, let's look at data for this very blog:
Same data in an alternative chart:
What do we see? We see a very different usage pattern. There is no 09:00-17:00 EDT curve whatsoever, and there is no 22:00-23:00 EDT bump like the one we saw in the previous set of charts, but there is a major spike between 00:00 and 01:00 EDT, and there is significantly increased nocturnal activity after 00:00 EDT, lasting until around 04:00-05:00 EDT. What could be the cause of this? From what I have observed in the logs, this activity comes from FeedFetcher-Google, Onfolio, Bloglines, NewsGator, Rojo, kinjabot, and other blog aggregators that, like bats, seem to come alive and start feeding at night. In my experience, this pattern is not limited to blog sites. I see the exact same pattern with hyperactive nocturnal bots in Simpy itself. It's no wonder: there are lots of pages to be fetched from Simpy, and there are thousands of RSS, RDF, and Atom feeds to be fetched. I wish major blog aggregators published this type of data, because what I would really like to see is the human usage pattern, not that of bots. Anyone?
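In the meantime, a rough approximation of the human pattern can be had by setting aside requests from the aggregators named above. The following is only a sketch under a few assumptions: the log is in Combined Log Format (User-Agent as the last quoted field), the aggregators identify themselves honestly, and a case-insensitive substring match on their names is good enough. The helper names is_aggregator and split_traffic are hypothetical.

```python
import re

# Aggregator names taken from the post; matching them as lowercase
# substrings of the User-Agent is an assumption, not a verified rule.
AGGREGATORS = (
    "feedfetcher-google",
    "onfolio",
    "bloglines",
    "newsgator",
    "rojo",
    "kinjabot",
)

# In Combined Log Format the User-Agent is the last double-quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def is_aggregator(line):
    """Return True if the line's User-Agent mentions a known aggregator."""
    match = USER_AGENT.search(line)
    if not match:
        return False
    agent = match.group(1).lower()
    return any(name in agent for name in AGGREGATORS)


def split_traffic(lines):
    """Split log lines into (aggregator_lines, other_lines)."""
    bots, others = [], []
    for line in lines:
        (bots if is_aggregator(line) else others).append(line)
    return bots, others


# Hypothetical sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [19/Oct/2005:00:15:42 -0400] "GET /feed HTTP/1.1" 200 512 "-" "Bloglines/3.0"',
    '5.6.7.8 - - [19/Oct/2005:10:02:11 -0400] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows; U) Firefox/1.0.7"',
    '9.9.9.9 - - [19/Oct/2005:00:30:00 -0400] "GET /feed HTTP/1.1" 200 512 "-" "FeedFetcher-Google"',
]
bots, others = split_traffic(sample)
```

Charting only the second list gives a first approximation of human traffic, though any bot that hides behind a browser-like User-Agent will still slip through.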
