March 31, 2006

Topix.net forums on fire: the Ni-chan paradox

by at 2:28 AM

Back on December 12th, we released a site redesign that included user forums on each of our news pages. We were pretty psyched about this -- a chance to let our users “talk back” to the news seemed like an obvious next step in the evolution of the Topix.net site. And given the amount of general site traffic we receive, we figured commenting on the news had to be a big hit...

Well, not exactly:

One month after launch, we were still under 200 posts a day. Though we sparked some interesting posts, this was certainly not something to write home about. We had designed a system that was usable, but not accessible.

What could we change?

Since this was our first foray into on-line community, we wanted some control. One way we hoped to manage this was with registration. (In retrospect, it’s odd that we would think this way, given that we regularly urge publishers to banish this model in favor of access and usability.)

Could we take the registration down? Of course the volume would go up, but what would happen to the quality of the posts? Our automated moderation queues were already averaging a post-kill rate of 4.5% -- nearly one in 20 posts. Was this going to double or triple the amount of spam and profanity we needed to parse through? Would an army of trolls invade and set up a siege? There was much hand-wringing on the eve of January 11th…

Let’s take it down and see what happens. We can always put it back up, right?

Since removing registration, our volume has exploded, and just this morning we passed a quarter of a million aggregate posts on our system. And the quality of posts? To our surprise, our post-kill rate has actually dropped, hovering below 2%. That is less than half the rate we saw when registration was in place.

What gives? We think it’s the "Ni-chan paradox"…

If you’ve never heard of Ni-chan (or "2ch", pronounced "ni-channeru"), it's a Japanese site with the distinction of being the largest Internet forum in the world. 2ch champions an "anything goes" approach to posting, and while it's a bit more Wild West than Topix aspires to be, we believe they're on to something by eschewing user registration on their boards.

Here’s a great post explaining the 2ch rationale for jettisoning the reg, and a quick summary of the philosophy:

  • Registration keeps out good posters. People with lives will tend to ignore forums with a registration process.
  • Registration lets in bad posters. Children and Internet addicts tend to have free time to go register an account and check their e-mail for the confirmation message. They will generally make your forum a waste of bandwidth.
  • Registration attracts trolls. If someone is interested in destroying a forum, a registration process only adds to the excitement of a challenge. Trolls are not out to protect their own reputation. They seek to destroy other people's "reputation."
  • Anonymity counters vanity. On a forum where registration is required, or even where people give themselves names, a clique is developed of the elite users, and posts deal as much with who you are as what you are posting. On an anonymous forum, if you can't tell who posts what, logic will overrule vanity.

Registration keeps out good posters and attracts trolls? Who'd have thought that? But look at the results from a random sampling of topics:

  • Northwest Airlines
  • NASCAR
  • Willow Springs, MO
  • Yanni
  • Celiac Disease
  • Columbia, KY
  • Saudi Arabia
  • Charlie Sheen

Heck, even Jeeves, the Ask.com butler, has an outpouring from his faithful.

From the data, it's fair to say that none of this dialog would ever have taken place if we hadn't removed the reg.

It's clear that our users are extremely passionate, and have an awful lot to say. And as the steady migration to on-line news continues, we owe it to them to take some risks and rethink some of our preconceptions about how best to serve them.

March 10, 2006

Topix Tags Photos

by skrenta at 3:14 AM

Thanks to a feed from the Knight Ridder/Tribune News Wire, Topix.net is now categorizing news photos, based on their caption text, into our 30,000 geographic and 300,000+ subject feeds. We're hosting the full-size photos, along with zippy thumbnail index pages.

Check them out in sections like US, World, Entertainment, Sci/Tech, and Sports. Photos can be categorized into any Topix category, but the major sections roll up the contents of their sub-sections, so they're good places to browse.

Currently we have about 20,000 photos categorized on our site, but we're pulling the live feed and categorizing new ones as soon as they come in, so more and more will be added to our category channels over time.

March 3, 2006

Work at Topix, Build a Better Informed Society

by skrenta at 4:01 PM

Updated: 02-Aug-06

In 1998, when Bob Truel and I wrote the prototype for dmoz.org, it took us 2 months to finish the first version of the code. We launched, signed up lots of volunteer editors, and 5 months later Netscape acquired our project and helped make the ODP (Open Directory Project) the largest directory on the web. The prototype code we wrote that summer in 1998 is still running at AOL today.

That's not a good thing. What kept dmoz on top, despite many attempts by competing projects to displace it, wasn't the depth or sophistication of its software system. It was basically a set of web forms on top of a simple hierarchical database. What kept dmoz on top was a network effect. It's the same effect that will keep eBay on top of its market for the rest of our lives.

In 2002, when team dmoz left AOL to found Topix, we wanted to base the value of our next company not on a network effect, a fad, or a gimmick, but on the depth of its intellectual property: the strength of the underlying software system. A software system where programmers could come to work every day, add improvements, do that for years and years, and still add value. Over time, the software system itself would provide the competitive differentiation for the business, and the barrier to entry for others.

We found such a problem in crawling, geo-localization and subject categorization of the news. Named entity disambiguation. About-vs-mention discrimination. Heat and tone language detection. Zero-configuration scalable crawling. All-in-memory index serving. On-the-fly category merging, clustering, de-duplication, and geo-spinning. Auto-tuning bias weights in our robo-editor.

Now, in 2006, many other companies have recognized the significance of the audience shift from offline to the net. VCs have funded $30M worth of Web 2.0 news startups (all going for a network effect). We find ourselves with a 2-year technology lead, an audience of 5M unique visitors, and investment from the top three newspaper companies in the US.

We supply local news to Ask Jeeves, AOL, Citysearch and EarthLink. And we count the New York Times, BusinessWeek, the Washington Post, USA Today, and 177 other publications as partners.

But we need one more thing to achieve our goals.

You.

The great challenge, as well as the greatest opportunity, offered by the Internet is to fulfill its promise as the first mass two-way communications medium. The printing press and the antenna tower have given way to the net. Mass media is in the hands of the masses. Our goal is to build the social architecture of discovery and participation.

We're here to build the #1 news site on the web. Seriously.

We're looking for (current as of 02-Aug-06):

  • Software engineers (Perl, C)
  • Network operations (Sr Dir/VP)
  • Online community manager

And we wouldn't like you if you weren't skeptical. Check us out. Read our blog, see what we have to say, and check out what people have to say about us. We've got a great future. Come be part of it.

jobs (at) topix dot net

March 2, 2006

Las Vegas – The New Publishing Capital?

by at 6:29 PM

I wrote a post a couple of weeks back walking through the on-site and off-site earnings of Google and Yahoo. The upshot was that if the Internet’s publishing leader, Yahoo!, with all of its technology, hardware and, most importantly, salespeople, only ekes out a $4 CPM (including its home page) on its non-search revenue, where does that leave the rest of the publishing industry?

I started thinking more about this and asking myself: why do they even bother having the rest of the site? I mean, if Yahoo Sports can’t pay its own bills, why keep it? And, more curiously, why is Google, the Internet’s search leader, averaging a $50 CPM on its site while adding more and more features that will move it from high search CPMs toward much lower publishing CPMs? That’s when I remembered Vegas.

Las Vegas is an interesting place. Billions of dollars are spent every year on the latest and greatest hotels, hotels that contain scaled versions of the Eiffel Tower, the Empire State Building, the canals of Venice, a wave pool that can host surfing contests – not to mention the rooms, the restaurants, the bars, the shopping, golf courses, etc. I am admittedly not a gaming industry expert, but my understanding is that all of these amenities are really viewed by the resorts as loss leaders. Maybe they make money on serving drinks by the pool, maybe they don’t. Maybe the room rates cover hotel operating costs, maybe they don’t. Doesn’t matter. The real money in Vegas is made in the casino. Build a hotel that includes a suite with a bowling alley, a roller coaster for the kids and a buffet for everyone – as long as the guest shows up at the casino with a fistful of cash, the resort is in the black. They intentionally design every square inch of the hotel to ensure that I am more likely than not to drop some coins in the slots, play a hand or roll the dice. The décor, the layout, the location, etc. – all designed to optimize my likelihood of gambling a bit.

So what does that have to do with the publishing industry? Well, when I look at Yahoo, I see Vegas. I sign up for free email, read stock quotes and sports scores there for free, and maybe even play some fantasy baseball for free. They don’t seem to mind. Why? Because as long as I do a search or two while I’m there, all the bills get paid. The layout of these pages may differ, but the one constant is the ever-present search bar at the top.

So is that the future of the publishing industry? Hiring the consultants the gaming industry uses to lay out hotels for gambling, and having them lay out web pages for search? Running a newsroom that creates content every day that is really just a lure to get someone to do a search there?

I sure hope not, and I don’t believe it needs to be that way. But, again, absent a business model that shows advertisers an ROI equivalent to search, that’s where we’re headed.