February 28, 2006

Topix.net @ SES NY, Year 3

by skrenta at 11:59 PM

My first SES show was for dmoz, back in 2000, before it got huge, and I love that show. The seedy mix of top-tier search engines, portals, evil SEO sharks, searchy schwag, and red-shirt throwaways from the DMA creates a great vibe.

Mike and Chris took our new orange booth to the show [pic pending] and got to meet with some folks. I wish I could have been there.

SES NY: The Topix.net Approach To Web News
Mike McDonald cracked me up with the intro to his great write-up of our feature launches since December:

The first and most obvious change at Topix was a substantial redesign. I hesitate to say that their original design was ugly, but... now having hesitated appropriately, I'll say it; their old site was ugly. The new design is a marked improvement to put it mildly. As Chris put it, "we felt like we had a top quality news site, we just needed to look like one". Beauty is in the eye of the beholder and all, but I doubt he's getting too many emails from folks complaining that they miss the old design.
The last major Topix development since our last meeting was easily the coolest as far as I'm concerned. They launched a really slick forum system. It's not a forum as you may be used to per se insofar as it covers a huge number of subjects...
(Mike's right, our old design was ugly.)

Newspapers Should Move Faster Part II
Greg Sterling has been giving us insightful comments about the news industry since he tolerated our analyst launch PR briefing in 2004.

The community content (comments) that Topix has is gold -- wouldn't the newspapers love to have that kind of participation on their sites?

More SES NY coverage: Search Engine Watch, seoroundtable, Technorati.

February 27, 2006

Every word in every document is already a tag

by skrenta at 8:20 AM

Back when web directories were still cool, AOL had an effort to build their own based on the Dewey Decimal System. They had 60 contractors in Arizona typing in web urls and assigning DDC numbers to them.

This didn't work. But why?

Because two thoughtful, non-malicious humans sitting next to each other will tag the same URL differently. (And, in this particular case, the most obscure URLs would default to more prominent positions in the DDC hierarchy, because they couldn't be classified.)

When you pull up the results of this exercise for a particular DDC number to get that category page, it's junk. It's missing a lot of stuff it should have, and it has stuff it shouldn't.

Before we had full-text search of the world's knowledge at our fingertips, search systems would let you retrieve documents by keywords. If the item you were looking for hadn't been given the right keywords, it was undiscoverable. "Internet Law?" "Software Patents?" "IP Theft?" Modern search systems consider every word or phrase in the document a tag.
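To make the "every word is a tag" point concrete, here is a minimal Python sketch of an inverted index, the basic structure behind full-text search, set next to a hand-assigned keyword list. The sample documents, the `hand_tags` dictionary, and the function names are hypothetical illustrations; real engines add phrase matching, ranking, and stemming, all of which are omitted here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: every word in every document becomes a 'tag' for it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def full_text_search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents and the keywords a cataloger happened to assign.
docs = {
    "a": "Appeals court weighs software patents in internet law case",
    "b": "Two men charged with IP theft from a chip maker",
}
hand_tags = {"a": {"courts"}, "b": {"crime"}}


def tag_search(tag: str) -> Set[str]:
    """Keyword-only retrieval: a document is found only if someone assigned the tag."""
    return {doc_id for doc_id, tags in hand_tags.items() if tag in tags}


index = build_index(docs)
```

Searching the hand-assigned tags for "software patents" comes back empty because nobody chose that keyword, while `full_text_search(index, "software patents")` finds document `a` from its own words.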

Chris posted a rant about tagging here previously. I go back and forth on them.

On one hand, tags work because they maximize participation with a simple ask of the user, and the social effects of shared use help a rough standardization emerge around them.

But tags aren't a panacea: they're excessively vulnerable to spam, and items that should belong to the same category will get different tags from different users. Which is it, "topixnet" or "topix"?

They're uniquely valuable in a system like Flickr, since photos don't have any text of their own to keyword-search, so getting the user to add any searchable text at all is a big win. You can ask users to caption their photos, but entering just a word or two is often easier, so the participation level is higher.

But if you have the full text of the web, or blogosphere, or whatever, the marginal utility of the "keywords" tag on the document seems to be rather low. To deal with spam and relevance issues, the search interface for a large collection needs to be appropriately skeptical about what documents are claiming to be about.

It's great if you can get the user to enter additional metadata about their posts. But if you aren't already looking at the existing text you're missing a lot of pre-existing "tags".

February 22, 2006

Guardian Copying Topix Forum Geolocation?

by skrenta at 1:10 PM

Assistant editor Neil Macintosh of the Guardian writes:

We are also thinking of revealing on the site every commenter's rough geographical location; information not exposed to the public before. Experiments on other sites suggest debates are more civil when everyone knows where everyone else is.

"Experiments on other sites"... um, well, no. We're flattered that the Guardian is considering copying Topix's user geolocation feature to enhance their forum civility, but Neil should have given Topix proper attribution, instead of suggesting this idea had come to them from multiple examples. It didn't.

Apart from the civility and accountability benefits of having what is essentially a non-forgeable approximate location signature on each post, there's another payoff: being able to see the vast geographic diversity of participation. We're getting heavy use across the US, as well as approximately 30% international posters in our world forums.

Wags have pointed out that the geolocation is not always accurate. That misses the forest for a few trees. The fact is that the location is usually accurate; when it isn't, it's roughly accurate. And even when that's not true, it's generally stable. Yes, you can hop around between ISPs, get in your car and drive to a T-Mobile hotspot...whatever. Dealing with 32,000 posts about the Danish cartoons is not about getting every single post right, it's about good 90% solutions that, on average, significantly uplevel the aggregate experience.

February 16, 2006

7 billion, $108 billion. 45 billion, $46 billion.

by at 6:02 PM

No, these aren't the latest bank statements from early Google investors, rather they are Google’s and Yahoo’s monthly page view and market cap statistics. The 7 billion and the 108 billion belong to the big G, and the 45 billion and 46 billion belong to the big Y.

Why do I point these out? Well, to be honest, as a publisher, they kind of jumped out at me. In the month of December, Google, according to comScore Media Metrix, served up 7 billion page views and, as of the date of this post, had a market value of $108 billion. Yahoo, on the other hand, served up a whopping 45 billion page views for the month and was valued at a mere $46 billion.


So I dug further. Google and Yahoo earned $1.098 billion and $1.007 billion, respectively, in on-site revenue this past fourth quarter. In other words, for all its hard work attracting an audience almost *seven* times the size of its competitor's, Yahoo brings in $91 million *less* per quarter.


A couple more swags:

According to Nielsen NetRatings, Google owns about 46% of the search market and Yahoo 23%. So let's do some guesswork here... since Google is really only a search site (OK, not really, but for these purposes it's a safe assumption), let's assume that the 7 billion monthly page views mentioned above are 100% search page views. So 21 billion search page views for the quarter (7 billion per month for 3 months) earned Google $1.1 billion.

Now, since Google has twice the search market share that Yahoo has, Yahoo would have 10.5 billion on-site search page views per quarter. Presuming that Yahoo and Google monetize searches at the same rate, those 10.5 billion searches would bring in $500 million or so. So far, so good.

What that also means is that for the quarter's non-search page views, all *124 billion* of them (45 billion a month times 3 months, minus the 10.5 billion search page views), Yahoo earned $500 million or so. Put simply, search pages earn about $50 CPM, while Yahoo's non-search pages earn an average of about $4 CPM site-wide.
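The back-of-envelope math above can be rechecked in a few lines. This Python sketch uses only the figures quoted in the post and the post's own assumptions (Google's page views are all search, and both companies earn the same amount per search page view); the variable names are illustrative.

```python
# Back-of-envelope ad-revenue arithmetic, Q4 figures as quoted in the post.
# Assumptions (the post's own): Google's page views are 100% search, and
# Google and Yahoo monetize a search page view at the same rate.

MONTHS_PER_QUARTER = 3

google_revenue = 1.098e9                 # on-site revenue for the quarter, USD
yahoo_revenue = 1.007e9
google_pv_per_month = 7e9                # comScore Media Metrix, December
yahoo_pv_per_month = 45e9
google_share, yahoo_share = 0.46, 0.23   # search market share (Nielsen NetRatings)


def cpm(revenue: float, page_views: float) -> float:
    """Revenue per thousand page views."""
    return revenue / page_views * 1000


# Google: every page view is treated as a search page view.
google_search_pv = google_pv_per_month * MONTHS_PER_QUARTER        # 21 billion
search_cpm = cpm(google_revenue, google_search_pv)

# Yahoo: scale Google's search volume by relative market share.
yahoo_search_pv = google_search_pv * yahoo_share / google_share    # 10.5 billion
yahoo_search_revenue = yahoo_search_pv * search_cpm / 1000

# Whatever revenue is left over is attributed to Yahoo's non-search pages.
yahoo_other_pv = yahoo_pv_per_month * MONTHS_PER_QUARTER - yahoo_search_pv
yahoo_other_revenue = yahoo_revenue - yahoo_search_revenue
other_cpm = cpm(yahoo_other_revenue, yahoo_other_pv)

if __name__ == "__main__":
    print(f"search CPM:     ${search_cpm:.2f}")
    print(f"non-search CPM: ${other_cpm:.2f}")
```

Carrying the unrounded numbers through gives roughly $549 million of Yahoo search revenue, about $52 CPM for search, and about $3.70 CPM for Yahoo's other 124.5 billion page views, in line with the rounded $50 and $4 figures.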


So I guess we’re clear now on why Wall Street values the online search leader at twice that of the online publishing leader.

So where does that leave the rest of us in the publishing world? Is racking up 45 billion page views a month the only way an online publisher can build a big business on the web? What does that say for Web 2.0 and the long tail? Can folks who serve up page views in a de-centralized manner even cover their costs absent a compelling search offering? Will the millions of bloggers who are hoping to strike it rich publishing their daily musings actually earn enough money to pay the electric bills their laptops generate?

Well, maybe. Probably not right now, though. It is going to take continued developments in both technology and business models to successfully morph the advertiser experience delivered by the publishing world (access to eyeballs) into the one delivered by the search world (access to actual leads). Obviously, performance advertising was a great first step in this direction.

Here at Topix.net we think we’ve taken the next step by using our own technology to further contextualize performance ads. To be sure, though, we’re not at search levels of performance. The folks at Google (AdSense) and Yahoo (Yahoo Publisher Network) are likewise making steps in the right direction, as are folks like Kanoodle, Quigo, etc. But as far as I am concerned, the field is wide open. In any event, this type of problem is a true opportunity: whoever cracks this code will be sitting on top of a huge market.

The industry wags like to talk a lot about empowering the people, citizen journalism, new media, blogging, the future of publishing, blah, blah, blah…and that’s all well and good. I guess my point is let's not forget that we still have lots of work to do in figuring out how to pay the bills.

February 15, 2006

Topix.net Conversation Map

by skrenta at 12:25 AM

We're getting thousands of local posts on our new geographical forums on Topix.net. Talk-back-to-the-news and local online community sound wonderful but are hard to visualize. You can punch random ZIP codes into our site and see the message threads in random towns, but it's hard to get a complete picture that way.

So we plotted the local Topix discussions on a map of the US (actually rendered directly from our TIGER/Line article geo-categorization KB) and it looked pretty cool. You can click on a dot and see the message threads within a 60-mile radius of that spot. We added color/size variation to indicate the volume of posts in a locality, and how recent they are.
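One plausible way to answer "which threads are within 60 miles of this dot" is a great-circle (haversine) distance filter over geotagged threads. This Python sketch is an assumption, not a description of how the Topix map is built: the `Thread` record, its coordinates, and sorting by post count are hypothetical, and a production system would use a spatial index rather than scanning every thread.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Thread:
    title: str
    lat: float   # decimal degrees
    lon: float
    posts: int


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def threads_near(threads: List[Thread], lat: float, lon: float,
                 radius_miles: float = 60.0) -> List[Thread]:
    """Threads within radius_miles of the clicked point, busiest first."""
    nearby = [t for t in threads
              if haversine_miles(lat, lon, t.lat, t.lon) <= radius_miles]
    return sorted(nearby, key=lambda t: t.posts, reverse=True)
```

For example, clicking a dot in Sunnyvale would pick up a hypothetical San Francisco thread (about 35 miles away) but not a Sacramento one (about 89 miles away).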

So far this is just for the US; we'll have to do one for the non-US forums on our site next.

February 9, 2006

What do you do with your online community when things get hot?

by skrenta at 7:05 AM

The Washington Post recently closed down a message forum after getting 700 heated posts in response to a story about the Abramoff scandal. Last June, the LA Times' short-lived Wikitorial experiment shut after quickly succumbing to vandalism.

Two months ago we launched a community participation system on Topix. In the past week we've received over 14,000 comments posted to our Denmark forums. There is a lot of heat in these forums. Lots of strong language, and many offensive posts. However there are also many genuine conversations occurring.

When fighting breaks out, should the response be to shut down the media system where it's being discussed?

We don't shut down the newspapers, TV stations and radio every time a public scandal or social unrest breaks out. If mass media is shifting to being in the hands of the masses, should we shut down mass discussion systems when public issues boil over? Isn't that when we need open discussion and media the most?

A few days ago my CFO asked "where do you think all of these people are really posting from?" I said, gosh, I don't know, but I know how to find out. So we started cutting and pasting posting IP addresses into a geoip locator. Holy smokes.

Riyadh, Saudi Arabia.
Hørsholm, Denmark.
San Antonio, Texas.
Québec, Canada.
Dubai, UAE.
Los Angeles, CA.
Brisbane, Australia.
Beirut, Lebanon.
Amman, Jordan.
Kuwait. Cairo. Turkey. The UK.
Posters from Tehran and Riyadh were responding to comments from the UK, Los Angeles, Denmark. We were stunned at the geographical breadth of participation.

We immediately subscribed to a commercial geolocator service and, in a fit of weekend coding, made this information visible to our forum participants. Topix doesn't show poster IP addresses, but now displays our best guess at your city/state/country.
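Commercial geolocation databases are, at heart, tables of IP address ranges mapped to places, so a lookup can be a binary search over sorted range starts. The following Python sketch assumes such a table; the ranges are documentation-only address blocks (RFC 5737) paired with made-up locations, and `locate` and `post_byline` are hypothetical names rather than any vendor's API. It shows the design choice described above: each post gets a city/state/country label while the raw IP address stays hidden.

```python
import bisect
import ipaddress
from typing import NamedTuple, Optional


class Location(NamedTuple):
    city: str
    region: str
    country: str


# Hypothetical range table: (first IP, last IP, location), sorted by first IP.
# A real geolocation database has millions of rows; these use RFC 5737
# documentation blocks and made-up place assignments.
RANGES = [
    (ipaddress.ip_address("192.0.2.0"), ipaddress.ip_address("192.0.2.255"),
     Location("Sunnyvale", "CA", "US")),
    (ipaddress.ip_address("198.51.100.0"), ipaddress.ip_address("198.51.100.255"),
     Location("Hørsholm", "", "Denmark")),
    (ipaddress.ip_address("203.0.113.0"), ipaddress.ip_address("203.0.113.255"),
     Location("Brisbane", "QLD", "Australia")),
]
_STARTS = [start for start, _, _ in RANGES]


def locate(ip: str) -> Optional[Location]:
    """Best-guess location for an IPv4 address, or None if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    # Find the last range starting at or before addr, then check addr is inside it.
    i = bisect.bisect_right(_STARTS, addr) - 1
    if i >= 0 and addr <= RANGES[i][1]:
        return RANGES[i][2]
    return None


def post_byline(handle: str, ip: str) -> str:
    """Label a post with a location instead of exposing the poster's IP address."""
    loc = locate(ip)
    if loc is None:
        return f"{handle} (location unknown)"
    parts = [p for p in (loc.city, loc.region, loc.country) if p]
    return f"{handle} ({', '.join(parts)})"
```

For instance, `post_byline("anon", "198.51.100.7")` yields `anon (Hørsholm, Denmark)`; the IP itself never appears in the output.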

The social architecture of a discussion system can play a huge role in the quality of the discourse. Since adding the user's location to each post, we've noticed a marked lift in the overall tone of the conversations. To be sure, there is still a lot of heat, but it seems like naming the town that someone is posting from has helped humanize some threads. It's not just a flamewar between faceless forum handles; there's a real person on the other end of the keyboard, and they actually live somewhere.

Interestingly, posters on the whole seem to be less sensitive to trolls and other "bad" posts than we at Topix are by temperament. It's natural for us, as moderators of the system, to think oh no, a profane or hate-filled post could be on our site for a while -- maybe even hours -- before we can get to it and moderate it. The alternative, to moderate everything before it goes live, just isn't an option. It introduces too much latency and kills the conversation. The volume of discussion we're hosting is already beyond what could be properly covered 24/7, and is growing.
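The difference between approving posts before they go live and reviewing them afterward is easy to see in code. This is a hypothetical Python sketch of the post-then-review approach described above, not Topix's actual moderation system: submissions become visible immediately and land in a queue that moderators drain when they get to it. The `Forum` class and its method names are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Forum:
    """Post-then-review: posts go live at once; moderation happens afterward."""
    live: Dict[int, str] = field(default_factory=dict)        # post id -> text
    review_queue: Deque[int] = field(default_factory=deque)   # awaiting a moderator
    next_id: int = 0

    def submit(self, text: str) -> int:
        # No approval step, so the conversation never waits on a moderator.
        post_id = self.next_id
        self.next_id += 1
        self.live[post_id] = text
        self.review_queue.append(post_id)
        return post_id

    def moderate_next(self, is_acceptable: Callable[[str], bool]) -> None:
        # Moderators work through the backlog later; rejected posts come down.
        if not self.review_queue:
            return
        post_id = self.review_queue.popleft()
        if post_id in self.live and not is_acceptable(self.live[post_id]):
            del self.live[post_id]

    def visible(self) -> List[str]:
        """What readers see right now, in posting order."""
        return list(self.live.values())
```

A bad post is visible between `submit` and the moderator's call to `moderate_next`; that window is the trade-off the post accepts in exchange for a conversation that doesn't stall.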

I'm also not sure it's healthy or appropriate to have a censor in Sunnyvale approving everything that participants in Tehran and The Hague want to say to each other.


Steve Outing: When Discussions Go Wild (Poynter)

Mark Glaser: Topix.net Forums Give Window on Cartoon Flap (PBS)

February 5, 2006

The Architecture and Ambition of Craigslist

by skrenta at 12:04 PM

Craig has been in the news a lot. The SF Guardian blames Craig for the decline of newspapers. Craig$list.com in the SF Weekly delves more into Craig's day-to-day routine. And New York Metro's piece by Philip Weiss goes out of its way to paint Craig as a nerd.

Pet peeve: journalists that pretend to be your buddy for two weeks, then stab you in the back once the story comes out. Call me naive, but I hate it when deceit and betrayal are routinely used by writers as a professional tool.

Give Craig a break. He's a nice guy, and he's built a big useful thing that everybody likes. So people would rather post free classifieds on the Internet than pay for ones on newsprint. Get over it.

But the real problem with these sour-grapes articles is that they don't shed any real light on why Craigslist has succeeded, where so many other similar efforts have not. Over-analyzing Craig's personal habits makes for catty reading but isn't going to help us understand his takeoff curve in new markets.

To understand how and why something works, study the thing itself, not the maker. Like the Drudge Report, this is a site that looks like it has a simple design, but there's actually a lot going on. The apparently-simplistic layout is like a stealth coating that keeps competitors from paying attention until it's too late. There was another company that used this trick, not so long ago...

I'm not going to dissect every last little piece of this thing; it would be too long and boring, and you can study it yourself if you really need to for your day job. But let me point out a few things.

The hardest part of starting a community or usergen site is booting up the activity. Community is a network effect -- posters only come if there are readers, and readers only come if there are posters. So you have to get the chicken and egg stuff going to start up the motor and grow.

UI Selector of Doom

Craigslist is brilliant because its main activity is something that posters are inherently promiscuous with -- personal spamming. In any other context, the bulk of the material on Craigslist would be considered spam. In my email box, on another message forum, heck, even on one of Google's spam-ridden Blogger sites. The posts are the equivalent of those indiscriminately posted flyers on corkboards at universities.

Buy my mattress... need a ride to Chicago... come see my band. People put these flyers up fully expecting only a handful to see or care about them enough to rip off a tab with the phone number at the bottom. The expectation of response is low, but it's cheap to try.

Now Craig's lead-into-gold trick is that he gets his posters to accurately classify their spam. Into 160 categories. Holy Toledo, Jakob Nielsen. You can't have a pulldown with 160 things in it. Half of your users wouldn't get a pulldown with 3 things in it right. Ah, but it's not a pulldown. Half of the entire homepage is a giant selector devoted to classifying posts.

Booting up in new places

Booting up new cities should be very hard, maybe taking years like the main SF site took. But there's another set of seed material to help new Craigslist cities get going. The discussion forums. These are global across all the Craigslist cities. If you go to perth.craigslist.com and click on 'transit', you're going to read about SF Muni. But fortunately many of the categories, like 'kink', travel well. So there is plenty of discussion on a brand new Craigslist city to look at even when nobody from the new town has contributed anything yet.

Sex in your City

The personals column competes with Match.com, eHarmony, and other dating sites. But it's got something they don't. A riveting editorial column written by the users.

"Rants and Raves" and "Missed Connections" contain wistful love letters and lurid first person accounts of dating horror. It's great reading and the newspaper profiles of Craig always throw a few of these in to juice up their stories. Even if you're not looking for a date that stuff is great to read. It provides a key draw and lends a personal voice to the Craigslist dating product.

Compare Craigslist's personals/dating section to Match.com. Match.com's editorial product looks like some kind of computer matchfinding machine with a scary mass-media emblem, complete with tm on the logo. Super corporate.

Yahoo Personals doesn't have an editorial component at all, just the web database search form to narrow your search for a mate, like the one you use to find a used car at Cars.com (although, to be fair to our friends at Cars.com, they actually have some great editorial tools around their car search form).

Yahoo has a single link to "Personals" on their homepage. Craigslist devotes 9 links in something like a 160x300 rectangle to personals, using their link selector trick to avoid the grim who-are-you-and-what-do-you-want pulldown, and to visually promote what's inside the dating section, and to showcase their two fun-to-read columns. Craigslist has a killer dating product.

Of course they do this in a bunch of verticals...

Craig comes for you...

As I was putting this post together I asked myself -- who else attacks so many different businesses on a single homepage? Online dating, events, real estate, apartments, forums, used cars, community, jobs. OMG... Yahoo.

Google took on Yahoo by radically cleaning up their homepage. Just a single search box. This was innovative and it worked. Nobody took it seriously at the time either. No ads on the homepage? How are they going to make money?

Craigslist is the UI polar opposite of Google. Push the click-to-browse vs. search-box tradeoff to the extreme that is not-Google, and you get Craigslist. Just a homepage stuffed with a sea of flat tag links. Craigslist is more Yahoo than Yahoo. It focuses on just what's important, without distractions and legacy constraints and compromises.

Decimating newspaper classified advertising will just be a footnote along this march. Look at the rest of the verticals that Craigslist so effectively covers.

A cuddly, socially-responsible PR story helped Google's founders avoid suspicion of being merely clever, pragmatic capitalists that wanted to take over a bunch of markets with a monstrously successful business. Craig Newmark is wrapped in the same flag.

Another "I'm just a nice social liberal story" combined with a devastatingly effective UI that nobody gets -- watch out. :-)


Naval: Craigslist is worth more than eBay

Also: Ben Barren   John Battelle   Andrew Goodman