Tuesday, 23 May 2006

Identifying And Counting Feed Agents

A reader made a good suggestion in the comments of the Discovering Feed Agents post. I took that a bit further:
grep 'FOO' /log/apache/aggregator_log | cut -d' ' -f12 | sed 's/"//g' | sort | uniq -c | sort -nr | head -20
This yields a list of the feed agents that most often hit your blog. Here are the top 20 for this blog:
   2685 Mozilla/5.0
   2115 Bloglines/3.0-rho
   1163 Sphere
   1094 Feedfetcher-Google;
    812 Mozilla/4.0
    288 Java/1.5.0_06
    286 WinkTagBot/1.0
    280 YahooFeedSeeker/2.0
    266 NewsGatorOnline/2.0
    168 MagpieRSS/0.79a
    110 PluckFeedCrawler/2.0
    102 MagpieRSS/0.72
     96 Tagyu/1.1
     96 Fotki
     88 Pageflakes/1.0
     80 FeedBurner/1.0
     77 AttensaOnline/1.0
     57 FeedBlendr.com
     56 ping.blo.gs/2.0
     53 edgeio-retriever
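The pipeline above keeps only the first space-separated token of the User-Agent, which is why several different browsers collapse into "Mozilla/5.0". A minimal sketch of a variant that counts full User-Agent strings follows. It assumes the Apache "combined" log format, where the User-Agent is the sixth field once a line is split on double quotes; the function name `count_full_agents` is made up for illustration.

```shell
# Count requests per full User-Agent string rather than its first token.
# Assumption: Apache "combined" log format, so splitting a line on double
# quotes puts the User-Agent in field 6. A request or referer containing a
# literal double quote would shift the fields and be miscounted.
count_full_agents() {
    # $1 = URL fragment that marks feed requests (the 'FOO' placeholder)
    # $2 = path to the log file
    grep -- "$1" "$2" \
        | awk -F'"' '{ print $6 }' \
        | sort | uniq -c | sort -nr | head -20
}

# Example call, reusing the log path from the post:
#   count_full_agents '/rss' /log/apache/aggregator_log
```

The trade-off is a longer, noisier list: agents that embed subscriber counts or version details in their User-Agent will show up as several distinct entries.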
Posted by otis at 11:56 PM in Tips & Tricks

Announcement: Simpy Firefox Extension

Alexandru has been keeping himself busy lately. He recently developed the very first Simpy Firefox Extension! I use it myself now (on Firefox 1.5.0.3)! Please download it and send Alexandru your feedback and suggestions. The extension currently requires that you add its buttons to the Firefox toolbar manually, via the View > Toolbars > Customize menu. To help with that, Alexandru created two short videos that show how to configure and how to use the extension.
The above screenshot shows the extension buttons in the Firefox toolbar, but there is more to this extension. Right click on any web page, and you will get a context menu with options to add the current page to your Simpy account, an option to check Trend/History for the current page, etc. Here are more screenshots. This is excellent work, thank you Alexandru!
Posted by otis at 1:59 AM in News & Announcements

Friday, 19 May 2006

Discovering Feed Agents

If you read my previous post about Men vs. Machines, and are interested in that kind of stuff, try this with your web server logs:
grep 'FOO' /log/apache/access_log | cut -d' ' -f12 | sort | less
Of course, "FOO" is just a placeholder. Use the portion of the URL that marks your feed (e.g. on Simpy it would be "/rss", because all feed URLs start with "/rss").
The above will give you a full list of feed agents. If you want to see a list of unique agent names, use this:
grep 'FOO' /log/apache/access_log | cut -d' ' -f12 | sort | uniq | less
What is this list good for? It's good for figuring out feed agent names, so you can redirect their requests to a different log file, for instance. It's also good for discovering new feed-eating services and software out there. Simpy currently distinguishes 30+ different feed readers.
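Once you have a list of agent names, one way to separate their requests is to split an existing log after the fact. The sketch below is only an illustration, not how Simpy does it: it assumes the Apache "combined" log format (so field 12, split on spaces, is the first token of the User-Agent), and the function name `split_by_agent` and its arguments are hypothetical.

```shell
# Split a log into requests from known feed agents and everything else.
# Assumption: Apache "combined" log format; field 12 (space-separated) is
# the first token of the User-Agent, still carrying its opening quote.
split_by_agent() {
    # $1 = file with one agent name prefix per line (e.g. Bloglines)
    # $2 = input log
    # $3 = output file for feed-agent requests
    # $4 = output file for all other requests
    awk -v list="$1" -v feed="$3" -v rest="$4" '
        BEGIN {
            # Load the agent name prefixes, skipping blank lines.
            while ((getline line < list) > 0)
                if (line != "") agents[line] = 1
        }
        {
            ua = $12
            gsub(/"/, "", ua)          # drop the quote around the agent
            hit = 0
            for (a in agents)
                if (index(ua, a) == 1) { hit = 1; break }
            if (hit) { print > feed } else { print > rest }
        }
    ' "$2"
}

# Example call (all file names here are placeholders):
#   split_by_agent agents.txt /log/apache/access_log feed_log other_log
```

Note that an output file is only created if at least one line is written to it.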
Posted by otis at 12:37 PM in Tips & Tricks

Wednesday, 17 May 2006

Web 2.0 Traffic Breakdown: Machines vs. Humans

While improving Simpy's performance over the last few weeks (see: How I Learned to Love Cache and How I Learned to Love Tuning), I had to dig into Simpy's log files and the various metrics that I collect, and analyze various performance and system statistics charts. While I always knew there are lots of web robots (aka crawlers or spiders) out there (after all, Simpy has its own Argus robot), I didn't realize how large a portion of web requests they generate. Thus, I was quite surprised to find that over 70% of web requests are generated by bots! 70%! That is a lot more than the nearly 15% that is generated by various feed aggregators. This leaves only 11-12% of web page requests that are generated by real people! Of course, these are Simpy's numbers. Other sites will have different numbers.
If you are interested in how web server traffic can be broken down like this, see my earlier post titled Crawlers and Aggregators and Apache Logging Tricks. I will expand the list of bots and aggregators listed there in the near future, if readers ask for it.
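For a rough idea of how such a breakdown could be computed from a single log file, here is a minimal sketch. It is not the method from the earlier post: it assumes the Apache "combined" log format, the function name `traffic_breakdown` is made up, and the agent patterns in the example call are illustrative only and would need to be replaced with the bots and aggregators seen in your own logs. It also counts every request, including static resources, so the "humans" share is not the same thing as page views.

```shell
# Print the share of requests made by bots, feed aggregators, and the rest.
# Assumption: Apache "combined" log format, so the User-Agent is field 6
# when a line is split on double quotes.
traffic_breakdown() {
    # $1 = log file
    # $2 = regex matching bot User-Agents
    # $3 = regex matching feed aggregator User-Agents
    awk -F'"' -v bots="$2" -v feeds="$3" '
        {
            total++
            ua = $6
            if (ua ~ feeds)      agg++
            else if (ua ~ bots)  bot++
            else                 human++   # whatever matched neither list
        }
        END {
            if (total == 0) { print "no requests"; exit }
            printf "bots %.1f%%\n",        100 * bot / total
            printf "aggregators %.1f%%\n", 100 * agg / total
            printf "humans %.1f%%\n",      100 * human / total
        }
    ' "$1"
}

# Example call with illustrative patterns:
#   traffic_breakdown /log/apache/access_log \
#       'Googlebot|Slurp|msnbot' 'Bloglines|Feedfetcher-Google|NewsGatorOnline'
```

Anything that matches neither pattern is counted as human, so the result is only as good as the two lists.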
Conclusions:
  • Breaking traffic down into bot vs. aggregator vs. human-generated requests is useful.
  • Breaking requests to dynamic vs. static pages is useful (but I did not show that here).
  • When tasked with figuring out why your site is slow, suspect machines first, humans second.
Notes:
  • Web requests analyzed here should not be mixed with the notion of "Page Views". When I refer to "Page Views", I refer only to requests made by humans, and this excludes requests to "resource files", such as JavaScript, image, CSS, and other static content. My guess is that not all people are this strict and clear, and often count machine-made requests towards "Page Views".
  • Hm, I thought I had another note. Perhaps not.
Posted by otis at 1:46 PM in /

Thursday, 11 May 2006

How I Learned to Love Tuning

Exactly a week ago I posted about the recent performance improvements I made with Simpy. That's the green ellipse in the picture on the left. People noticed the improved page load times and sent nice email. Needless to say, I was pretty happy. Then, all of a sudden, things again went downhill. That's the red circle in the picture. I stared at the database server for a while, and finally found a veerrry sloooowwwww querrrry. It turned out the query was not using the index it should have been making use of. Why not? Thanks to the fast and superb support found in the open-source community, I got help from database experts who quickly pointed me in the right direction. It turned out some of the statistics that the database server normally gathers were off by a large number. After I made the appropriate change, the database server load decreased by several orders of magnitude, as you can see in the picture below. The sudden drop in the chart represents the sudden drop of the load on the server. Of course, this nirvana won't last forever, and there will be new hurdles to jump over. That's just the nature of the beast. Until then, I'll be enjoying the speedier Simpy. As always, if you notice consistent slowness on any portion of the site, please let me know.
Posted by otis at 5:06 PM in /

Monday, 8 May 2006

iPod Ecosystem: iBuzz - musical orgasm machine

While some are studying Google and the economic ecosystem created around it, others are enjoying the ecosystem created by Apple's iPod. I don't own an iPod, but I've seen numerous iPod add-ons - iPod base, wireless speakers, an FM radio attachment, etc. But there is an add-on I have not seen before today - iBuzz! Its manufacturer describes it as: "iBuzz is the musical orgasm machine! The music-activated vibrating bullet stimulates you in time with your favourite music. Which song pushes your butttons?"
He he he, is that crazy or what! Quickly, where is my credit card!
Posted by otis at 5:24 PM in /

Thursday, 4 May 2006

How I Learned to Love Cache

Simpy just got 27 new servers! Heh, no, not really. I wish! Instead of new hardware, the existing one was tuned a little to improve performance. Has anyone other than my mom noticed it??? To the faithful Simpy users - thank you for dealing with the recent sluggish performance and sticking around! While May 1st is a Labour Day (typically converted into a full week) in some countries, I couldn't rest until I took care of the sudden unbearable slowness of .... serving pages, bookmarks, tags, and other goods. The following chart illustrates what I accomplished that day (or was it night?).
What the chart depicts is the sudden drop in the server load on one of Simpy's servers, as the result of my fixes. The X axis represents time (more recent to the right), and the Y axis represents server load.
Finally, two serious questions:
  1. Has anyone noticed the improved performance? (I only get "love letters" when things are slow)
  2. Are any other pages still loading slowly for you? Let me know which ones!
Posted by otis at 2:20 AM in News & Announcements

God Does Simpy

You won't believe it, but god does use Simpy. Sceptical? Don't believe me? Have a look for yourself:
And god is not alone here! Jesus is here, too, but he doesn't seem to want to share his links, although he's a pretty active user!
Posted by otis at 1:45 AM in /

Saving Google Search Results

One of the handy Firefox extensions I've had for years is Customize Google. The extension has numerous options, from hiding AdWords on Google's search results, adding links to alternative search engines on Google's SERPs, to allowing you to save individual links from Google's SERP directly into your Simpy account, like this:
How do you make that happen? Here is a screenshot that shows how to configure Customize Google to add that "Save" link for bookmarking directly from Google's SERPs:
Give Customize Google a try (it uninstalls just as easily as any other Firefox extension). It's quite versatile and powerful.
Posted by otis at 1:22 AM in News & Announcements