Andrew Sullivan commented Friday on a blog ranking scheme which shows his site to be the second "most influential" among political blogs. The scheme is interesting, especially as it has applications across a variety of domains.
As described by its author, the technique works as follows:
I went to Technorati, Daypop, Blogstreet, and the Truth Laid Bear Ecosystem on Tuesday and counted how many links went to the top 100 POLITICAL blogs listed. Then I went through and weeded out any blog that didn't make the top 100 on at least 3 of the 4 measuring tools. At that point, there were only 29 blogs left and I took their best 3 scores (or there only 3 scores if that was the case) and added them up. For example, if "Blog X" was the 3rd, 12th, 19th, & 26th, most linked to blog on the 4 top 100 pages I used, the 26th place finish would be dropped and "Blog X" would get a score of 34. Blogs with a score <34 would be ahead of "Blog X" and blogs with a score >34 would be ranked behind it.
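The procedure quoted above is simple enough to sketch in a few lines of Python. This is a minimal illustration, not the author's actual code: the function name `rank_blogs`, the input format (each blog's position on the four lists, with `None` where it missed that top 100), and every blog other than the quoted "Blog X" are hypothetical.

```python
def rank_blogs(positions, min_lists=3, keep_best=3):
    """Rank blogs by the sum of their best `keep_best` positions (lower is better).

    `positions` maps a blog name to its place on each of the four top-100 lists,
    with None where the blog did not make that list.
    """
    scores = {}
    for blog, places in positions.items():
        listed = sorted(p for p in places if p is not None)
        if len(listed) < min_lists:
            continue  # weeded out: not in the top 100 on at least 3 of 4 tools
        scores[blog] = sum(listed[:keep_best])  # drop the worst finish, if any
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical positions on Technorati, Daypop, Blogstreet, and the Ecosystem.
positions = {
    "Blog X": [3, 12, 19, 26],     # the quoted example: 3 + 12 + 19 = 34
    "Blog Y": [1, 2, None, 5],     # only three listings, so all three count
    "Blog Z": [4, None, None, 7],  # on only two lists, so excluded
    "Blog W": [40, 30, 50, 60],
}
print(rank_blogs(positions))
# [('Blog Y', 8), ('Blog X', 34), ('Blog W', 120)]
```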
This reminded SC of a couple of similar evaluation techniques. One favorite of his is Rob Neyer's "Beane Count", named for Billy Beane, the general manager of the Oakland A's. The Beane Count is simply the sum of a team's league rankings in four categories: home runs hit, home runs allowed, walks drawn, and walks allowed, so a lower total is better. A good team will hit a lot of homers and draw a lot of walks without giving up too many of either; thus, the Beane Count is a proxy for a team's all-around ability to do the things that produce (and prevent) the marginal runs that can push it over the top. It's not a perfect indicator of a team's ability to win any one game -- last year's World Series champions, the Florida Marlins, were only 8th in the National League according to this system -- but it correlates well with who is in fact atop the standings (here's last year's Beane Count, along with the actual standings; it's too early to be useful for 2004).
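Computing a Beane Count is similarly mechanical. The sketch below uses made-up season totals for three hypothetical teams, and its field names are illustrative only; it breaks ties arbitrarily, whereas real league rankings would need an explicit tie rule.

```python
# Hypothetical season totals; real use would cover every team in the league.
STATS = {
    "Team A": {"hr": 200, "bb": 600, "hr_allowed": 150, "bb_allowed": 450},
    "Team B": {"hr": 180, "bb": 650, "hr_allowed": 170, "bb_allowed": 500},
    "Team C": {"hr": 150, "bb": 500, "hr_allowed": 140, "bb_allowed": 550},
}

# (category, True if a bigger number is better)
CATEGORIES = [("hr", True), ("bb", True), ("hr_allowed", False), ("bb_allowed", False)]


def beane_count(stats):
    """Sum each team's rank (1 = best) across the four categories; lower totals are better."""
    totals = {team: 0 for team in stats}
    for field, bigger_is_better in CATEGORIES:
        # Ties keep dictionary order here (an assumption, not Neyer's rule).
        ordered = sorted(stats, key=lambda t: stats[t][field], reverse=bigger_is_better)
        for rank, team in enumerate(ordered, start=1):
            totals[team] += rank
    return sorted(totals.items(), key=lambda item: item[1])


print(beane_count(STATS))
# [('Team A', 6), ('Team B', 8), ('Team C', 10)]
```

Team B leads two categories yet finishes behind Team A, because the sum rewards being good at everything rather than great at one thing.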
A nifty application of this methodology to natural language processing is Manning and Klein's lexicalized, factored parser. It gets better results than a conventional statistically trained parser by combining a probabilistic context-free grammar with a lexical dependency model and preferring the parse that the two models, taken together, score most highly.
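To see how combining two models can beat either one alone, here is a toy sketch of the factored idea as we understand it from Klein and Manning's description: score each candidate by the product of the two models' probabilities (summed in log space) and keep the best. The candidate parses and probabilities are invented, and the real parser searches the space of trees efficiently (with A* search) rather than scoring a short hand-made list.

```python
import math

# Hypothetical candidate parses for one sentence, each with the probability
# assigned by a PCFG and by a lexical dependency model.
CANDIDATES = [
    ("parse-1", 0.020, 0.001),  # the PCFG's favorite
    ("parse-2", 0.010, 0.004),
    ("parse-3", 0.005, 0.006),  # the dependency model's favorite
]


def best_factored_parse(candidates):
    """Pick the parse with the highest combined score, log P_pcfg + log P_dep."""
    def combined(candidate):
        _, p_pcfg, p_dep = candidate
        return math.log(p_pcfg) + math.log(p_dep)  # log of the product
    return max(candidates, key=combined)[0]


print(best_factored_parse(CANDIDATES))  # parse-2
```

Neither model's top choice wins here: parse-2 is only second-best under each model individually, but it has the highest product (4 × 10⁻⁵, against 2 × 10⁻⁵ for parse-1 and 3 × 10⁻⁵ for parse-3).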
Beane Counting can even be done in hardware. One technique popular in high-end digital audio electronics is to run several digital-to-analog converters in parallel and average their outputs to form the analog signal that ultimately goes out the back panel. The signal is common to every converter and adds up coherently, while each converter's random noise is independent of the others' and partially cancels. Done correctly, each doubling of the number of D/A converters can improve the system's signal-to-noise ratio by about 3 dB: a meaningful improvement, though one worth paying for only when cost is low on your list of priorities.
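The roughly 3 dB figure follows from the converters' noise being independent: averaging N outputs leaves the signal unchanged but divides the noise power by N, a gain of 10·log10(N) dB, or about 3.01 dB per doubling. The simulation below checks this under idealized, assumed conditions (a sine test tone and independent Gaussian noise on each hypothetical converter); to the extent real converter errors are correlated, the gain would be smaller.

```python
import math
import random


def snr_db(num_dacs, samples=20000, noise_sigma=0.01, seed=0):
    """Estimate the SNR (in dB) of averaging `num_dacs` noisy converters.

    Each hypothetical converter reproduces the same sine wave plus its own
    independent Gaussian noise with standard deviation `noise_sigma`.
    """
    rng = random.Random(seed)
    signal_power = 0.0
    noise_power = 0.0
    for n in range(samples):
        s = math.sin(2 * math.pi * n / 100)  # ideal output sample
        outputs = [s + rng.gauss(0.0, noise_sigma) for _ in range(num_dacs)]
        averaged = sum(outputs) / num_dacs
        signal_power += s * s
        noise_power += (averaged - s) ** 2  # noise left after averaging
    return 10 * math.log10(signal_power / noise_power)


for n in (1, 2, 4, 8):
    print(n, round(snr_db(n) - snr_db(1), 2))  # gain over a single converter
```

The printed gains should land close to 0, 3, 6, and 9 dB, with small deviations from the finite number of random samples.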
Beane Counting techniques only really work if you've got a few individual models that are each pretty good to start with. Differences definitely exist among the blog ranking algorithms, and Beane Counting can smooth out the noise specific to any one of them, but if one ranking algorithm erroneously put Andrew Sullivan all the way down at the bottom, averaging it with two others would still leave Sullivan badly misranked. If Manning and Klein's parser were deciding between two completely erroneous parses, its performance wouldn't really be any better for having a slick inference engine at the end. And no electrical engineer would design a system with 16 8-bit D/A converters if he had access to one decent 16-bit converter instead. "Garbage in, garbage out" holds for every algorithm ever designed.
As a side note, it's important to separate evaluations of the Beane Count's utility from evaluations of Billy Beane's. While the Oakland A's have done some truly amazing things over the last 5 years, the man also said:
"I wasn't looking to trade Ramon. I've just loved Mark Kotsay for a long time, and (Padres GM) Kevin (Towers) knows I've loved Kotsay since he was at Cal State Fullerton."
SC is a Padres fan (which meant that he was obliged to cheer for Mr. Kotsay for several years), but for the general manager of a winning baseball team to say "I love Mark Kotsay" and offer that as justification for trading a quality catcher ought to be grounds for firing.