January 20, 2004

Categorizing Blogs?

by skrenta at 8:17 AM

Topix.net currently includes a handful of blogs that we've editorially selected as being newsy enough to fit with the other material we have. This includes stuff like Dan Gillmor's eJournal, Lawrence Lessig, TechDirt, Executive Summary, and so on.

Originally we thought we'd have the full blogiverse in the categorization engine. We actually had it running internally for a while. But not much was coming out of the blogs, and what did come out didn't look very good next to the other stories.

Our categorizer likes references to very specific named entities -- at the local level, streets, jails, hospitals, parks, rivers, and so on. For national news we're scanning for politician names; for world news, references to political leaders and geographical features. But the AI wasn't finding much to grab onto in the blogs we crawled.
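To make the idea concrete, here is a toy sketch of entity-driven categorization. It is not our actual engine: the `GAZETTEERS` table, the category names, the `categorize` function, and the `min_hits` threshold are all made up for illustration. It just shows why a self-contained news story trips the matcher and a chatty follow-up post doesn't.

```python
import re

# Hypothetical gazetteers: category -> entity names we know belong to it.
# Invented for illustration; a real system would have far larger lists.
GAZETTEERS = {
    "local/springfield": ["Elm Street", "County Jail", "Mercy Hospital", "Mill River"],
    "us/politics": ["Senator Example", "Governor Sample"],
}


def categorize(text, min_hits=2):
    """Return {category: matched entity names} for every category
    with at least `min_hits` distinct entity names found in `text`."""
    results = {}
    for category, names in GAZETTEERS.items():
        # Whole-phrase, case-insensitive match for each known entity name.
        hits = [
            name
            for name in names
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        ]
        if len(hits) >= min_hits:
            results[category] = hits
    return results


news = ("A water main broke on Elm Street on Monday; "
        "two people were taken to Mercy Hospital.")
blog = "Wow, did you see that story about the water main? What a mess. More later."

print(categorize(news))  # the news story names two known local entities
print(categorize(blog))  # the blog post names none, so nothing is returned
```

The blog post is about the same event, but it never names the street or the hospital, so there is nothing for the matcher to grab onto.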

I guess this is because a lot of blog material is informal discussion, more often follow-up to news posted elsewhere than direct reporting. Blog posts are the feedback to the story, not the story itself. And the chatty tone often omits the who-what-where-when-why that makes a news story or press release a self-contained entity.

Perhaps blogs need a system like Dave Winer's categorization scheme, although, despite my ODP background, I'm skeptical of ad hoc, user-generated taxonomies. Or maybe the rumors will pan out and a new system like Kinja will make sense of it all.