Mary Hodder just wrote a cool post on using blog link indexing and other metrics to help discover quality news.
This is a topic dear to my heart as I’ve been working on this problem in one form or another (Reptile, NewsMonster, Rojo) for about five years now.
A discussion about creating a new metric for understanding blogs is something I think the community should have the chance to participate in to find a different way of perceiving a blog, or the ripples a blog makes. Partly I believe this because of the frustration people express about Google’s secret algorithm for pagerank, where they feel something this powerful should not be secret.
One problem Google has (and I’ve experienced as well) is that even if you do come up with a pristine algorithm there are lots of real world ways to violate your assumptions. There are four main reasons that I can see which prevent open discussion of the details of complex ranking/reputation systems.
- The technology is often complicated and not many people are going to be able to give you meaningful feedback.
- Some of the technical problems are amazingly difficult and lead to changes to the algorithm which are themselves patentable (and, per the first point, hard for most people to evaluate). If you want to follow this more just review Google’s patent portfolio.
- Some of the changes won’t work if published. I hate to argue for security through obscurity here but it’s a technique that often works.
- The algorithm is often the easy part. Scaling the algorithm across a cluster of machines that can handle severe load is often the hardest part.
I think this helps describe Google’s current state of affairs. They’ve done a good job at solving all of these problems to date and it’s why they’re a leader in search. That’s not to underestimate Yahoo, Overture or Microsoft.
Currently, blogs are measured in systems like Technorati or ranked in PubSub by links or by number of subscribers to a feed in Feedster. In particular, these are the not very interesting, subtle or telling measures used to make indexes like the Technorati Top 100 or the PubSub 100 or the Feedster 100.
Simple metrics like these often don’t really work in practice. The Blog 500 $50k prize is evidence enough that we need change here.
One problem with the Technorati Top 100 is that it doesn’t show the current rate of change. There are some other systems which provide this data of course (daypop is an example). Rate of change is highly important because while BlogA could have a high position in the ranking, BlogB (which is a competitor) could be catching up rather quickly.
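To make the rate of change idea concrete, here is a minimal sketch of how it could be computed from periodic snapshots of inbound link counts. Everything in it is an assumption for illustration: the `link_velocity` name, the snapshot format, and the numbers are made up, and this is not how Technorati or daypop actually work.

```python
from datetime import date

def link_velocity(snapshots):
    """Average inbound links gained per day between the first and last snapshot.

    snapshots: list of (date, inbound_link_count) tuples, oldest first.
    (Hypothetical format, chosen only for this example.)
    """
    (start_day, start_count) = snapshots[0]
    (end_day, end_count) = snapshots[-1]
    days = (end_day - start_day).days
    if days <= 0:
        raise ValueError("need snapshots taken on at least two different days")
    return (end_count - start_count) / days

# Made-up numbers: BlogA is far ahead in total links, BlogB is growing faster.
blog_a = [(date(2005, 8, 1), 10000), (date(2005, 8, 31), 10150)]
blog_b = [(date(2005, 8, 1), 4000), (date(2005, 8, 31), 5500)]

velocity_a = link_velocity(blog_a)  # links gained per day by BlogA
velocity_b = link_velocity(blog_b)  # links gained per day by BlogB
```

A flat ranking by total links would put BlogA first in this example, while the velocity numbers show BlogB gaining links ten times as fast.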
Links alone are not a good metric for authority. There are several reasons for this. But the most important, I think is that as consequence for the blogosphere, it harms the way people see blogging. People know some bloggers want influence; many bloggers know they want it too, though many others don’t want it at all.
Flat link indexing is pretty easy for an engineer to implement. My hunch is that Technorati literally has a data structure which only stores the number of inbound links. This is easy to implement but falls down after a while. For example Boing Boing is at the top of the Technorati Top 100. Do you think they’re ever going to change? Not likely. They have so many links behind them that at this point we’re stuck.
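Just to show how little there is to it, here is a rough sketch of flat link indexing. This is my guess at the general shape, not Technorati’s actual implementation; the `flat_link_rank` function and the blog names are hypothetical.

```python
from collections import Counter

def flat_link_rank(links):
    """Rank blogs by the number of distinct blogs linking to them.

    links: iterable of (source_blog, target_blog) pairs.
    Repeated links from the same source and self-links are ignored.
    """
    inbound = Counter(
        target for source, target in set(links) if source != target
    )
    # Sort by count descending, then by name so ties are deterministic.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))

# Made-up link data for illustration.
links = [
    ("blog-b", "blog-a"),
    ("blog-c", "blog-a"),
    ("blog-d", "blog-a"),
    ("blog-a", "blog-b"),
    ("blog-c", "blog-b"),
    ("blog-c", "blog-b"),  # duplicate, counted once
    ("blog-a", "blog-a"),  # self-link, ignored
]

ranking = flat_link_rank(links)
```

Note that the only thing stored per blog is a single number. There is no notion of when a link was made or what it was about, which is why a blog with a huge backlog of old links is so hard to displace.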
Part of what we want is a rich user generated ontology resulting in topic groups that is constantly adjusting to find what’s delightful, useful, interesting across blogs. And a more complex metric for understanding those topic groups and individual users as they blog memes and interact with each other, with some context around those bloggers, would help quite a bit.
I like the term topic groups because it doesn’t imply a specific technology. I think tags work great for group isolation but these haven’t really taken off too much for post popularity indexing.
… And right now, the Technorati Top 100 list is obtuse enough that we can all agree that it’s not useful for judging 14 million blogs, because blogs are as different as their authors …
Exactly! The problem is that this suffers from the Britney Spears problem, where blogs only rise to the top if they’re popular with everyone equally. This just ends up becoming a high school popularity contest.
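One way around the popularity contest would be to rank within topic groups instead of across everything. Below is a minimal sketch assuming each link carries a topic label (a tag, for example). The `rank_by_topic` function, the data format, and the blog names are all hypothetical, and how the topic labels would be obtained is left open.

```python
from collections import Counter, defaultdict

def rank_by_topic(links):
    """Rank blogs separately inside each topic group.

    links: iterable of (source_blog, target_blog, topic) triples.
    Returns {topic: [(blog, inbound_count), ...]} with the most linked first.
    """
    per_topic = defaultdict(Counter)
    for source, target, topic in set(links):  # set() drops exact duplicates
        if source != target:                  # ignore self-links
            per_topic[topic][target] += 1
    return {
        topic: sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        for topic, counts in per_topic.items()
    }

# Made-up data: one blog is popular overall, another only within its niche.
links = [
    ("fan-1", "popstar-blog", "celebrity"),
    ("fan-2", "popstar-blog", "celebrity"),
    ("fan-3", "popstar-blog", "celebrity"),
    ("fan-1", "yarn-blog", "knitting"),
    ("fan-4", "yarn-blog", "knitting"),
    ("fan-4", "popstar-blog", "knitting"),
]

by_topic = rank_by_topic(links)
```

In this example the niche blog has fewer links overall but still comes out on top of its own topic group, which a single global list would never show.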
So the tension is, do we in the blogosphere figure out a more sophisticated, open standard based metric that reflects the way we see blogs, within and across communities, in order to score blogs? And do we do this within topic areas? Or does using a more sophisticated algorithm across all blogs make more sense? Or do we allow this all to be done for us, possibly in an opaque way by some of the blog search engines or by people who are trying to figure out blogger influence and communities for their clients, or do we write off those efforts because we know they cannot possibly understand us anyway?
My thinking is that we should write up how an ideal metric would recommend posts. The algorithm implementation problem is another discussion, as this can be implemented in many different ways. The devil is in the details, I’m afraid.
Your suggestions are great. It might be a good idea to start a wiki page and discuss this at length. The one area that makes me nervous though is actually implementing the algorithm. There are a lot of problems here including scalability, spam prevention, runtime speed, etc. These alone deserve more research.
Another point I want to make is that this is somewhat related to the attention issue. Some of your data isn’t actually available to the public (subscription information is a good example).
5 Replies
“The Blog 500” Challenge: Some thoughts on Jason’s metrics
Jason Calacanis recently wrote I’m sick of the Technorati 100 and has offered a prize of $50K advertising credit or $10K cash for the first person who can offer a better ranking list that addresses his many concerns with the
More comments on…
a community based algorithm and the attendant issues… Michael Frasse on Information authority and ranking: Hodder says, rightly, that the metric for assessing weight in the blogosphere should be open, not closed. “Bloggers should have input about the…
Cloudmakers R Us
In response to Mary’s excellent series of postings that wrap up a lot of fine thinking about illustrating blog relationships, Adina writes at BookBlog: The cloud would be a picture of a conversation surrounding a person or a topic….