September 9, 2005

Tagging and Unstructured Data

by tolles at 3:54 PM

There's been quite a bit of hype around tagging over the past year, particularly around putting structure around user-created data (most visibly at Flickr and Technorati).

At the SES show in New York, I ranted about tagging and the fact that little has been done to deal proactively with the obvious and inevitable problem of spam. Back in the mid nineties, Web pages all had facilities to be tagged with metadata, the first search engines tried to use that metadata, and that was the beginning of search engine spam. If I had a nickel for every starry-eyed idealist pointing to tagging as the thing that will save the world, I'd be able to fund my own blog search engine...

In fact, the founders here were four-fifths of the core team behind the Open Directory Project, which, at the end of the day, was an attempt to build a system for categorizing web pages in a scalable way. The political system behind the editors at the Open Directory was a big part of whatever success it has had, and the lack of a moderated system is the reason many similar efforts have not gained any major traction.

After talking to Ofer Ben Shachar on my web radio show about his company, Raw Sugar, I had some other thoughts about tagging. The big takeaway from talking to Ofer was that he sees a huge opportunity in providing value-added search on top of the tagging individuals do on their own data: load in your bookmarks, your Flickr tags and whatever else, and get better search results. And if you can then layer some ordering on top of that, based on how people have explicitly organized things within their own tags, you'll have built something of value.

Now, there's a lot those guys are going to add to their site at Raw Sugar (hopefully including, at the very least, the ability to explicitly tag who your friends are within the service), and I'm not sure they've cracked the code here. But it reminds me of the power of gathering unstructured data that I saw when I first started using Ryze (one of the original social software services). You could put in anything, and it would "group" you with other people who had entered the same thing. Ryze also put a lightweight directory around these entries (education, place of work, etc.), and this worked rather well for creating ad hoc communities.

So, on one side, you're looking at some pretty powerful mojo: enabling people to categorize at least their own data, leveraging that effort, and putting some structure around it. On the other side, you are going to have some major problems unless you moderate or put a reputation system in place (which Ofer also mentions in passing).

I'm usually skeptical about leveraging communities (I ran the ODP community for a couple of years, and it's a lot harder than you might think), but at least people are beginning to think about some of the problems. Ofer's a fun guy to talk to as well. The interview is on the site.