January 31, 2004

Most brilliant product launch ever...?

by skrenta at 10:37 AM

...or conspiracy theory?

There's a fascinating series on Scott Rafer's weblog about Orkut. The PR story is that a single programmer built this in his free time. Yet:

  • Orkut is the most feature-advanced social software product deployed to date.
  • Orkut has scaled massively in its first few weeks of operation. That requires a scalable server infrastructure and tens or hundreds of machines.
  • The "Under Construction" page was signed "The Orkut team".
  • Orkut's internal code name was reportedly "Project Eden" and "the Friendster killer"...
  • Orkut already has a mail system built in. Google Mail?
  • Jeremy Zawodny thinks that Google needs Orkut to get registered users with tons of personal information about them.
We have no way to know which of these speculations are true, but Orkut does smell like a deliberate organizational effort, with careful PR/messaging, scalable infrastructure, slick design, and a feature set strategically optimized for Google's product roadmap.

Check out this Alexa graph comparing Orkut's traffic growth to Friendster's. Friendster is flat and Orkut has nearly caught up after just 2 weeks.

ObTopix: Track the fun with our news scan pages for Google and social software.

January 28, 2004

Free Headline Syndication

by skrenta at 9:01 PM

We've put up a little widget that anyone can use to add headlines to their website. Just find the topix.net page you want to pull headlines from and look in the lower left of the page. There's a link to an instruction page showing the HTML to paste into your site. (Feel free to hack the URL to change the colors, alter the topix page the feed comes from, etc.)

If you'd rather have more control over the display than our widget allows, it's okay to pull our RSS and surface the headlines on your site, so long as there is an attribution and linkback pointing to the topix.net page that you're pulling from.
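If you go the RSS route, meeting the attribution requirement takes only a few lines. Below is a minimal Python sketch, assuming a plain RSS 2.0 feed with the usual channel/item/title/link elements; `headlines_html` is a hypothetical helper, not something topix.net provides. It parses the feed text with the standard library and renders a headline list that ends with a linkback to the source page.

```python
import html
import xml.etree.ElementTree as ET

def headlines_html(rss_text, source_url, limit=5):
    """Render up to `limit` RSS 2.0 headlines as an HTML list plus an attribution linkback."""
    channel = ET.fromstring(rss_text).find("channel")
    rows = []
    for item in channel.findall("item")[:limit]:
        # ElementTree has already decoded entities, so re-escape for HTML output.
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        rows.append(f'<li><a href="{link}">{title}</a></li>')
    # The attribution line credits and links back to the page the feed came from.
    source_name = html.escape(channel.findtext("title", default=source_url))
    credit = f'<p>Headlines from <a href="{html.escape(source_url, quote=True)}">{source_name}</a></p>'
    return "<ul>\n" + "\n".join(rows) + "\n</ul>\n" + credit
```

Fetching the feed (with urllib or whatever your site already uses) and caching the result are left out; the point is that the linkback is generated alongside the headlines, so it can't be forgotten.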

January 21, 2004

Topix.net adds RSS support

by skrenta at 11:12 PM

We added RSS support to Topix this morning. There was a bit of internal debate before turning it on -- some of the folks in our office didn't exactly see why we should put out feeds for a platform that only a handful of people are using. But I made the case for it (what was it again?) and put our little orange badge of XML pride up on every news page on topix.net.

Topix.net is, as far as I know, now the largest publisher of non-blog RSS feeds on the net. We have a feed for every ZIP code in the US, a feed for every public company, a feed for every sports team, a feed for every movie star, band and musician...and more.

I told Dave Winer about it ... but he flamed me. He said I had used the wrong version of RSS, and that he would henceforth never utter the name 'Topix'. Sigh.
(so much for perl -MCPAN -e 'install XML::RSS')
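For the curious, an RSS 2.0 feed of headlines boils down to very little XML. The Python sketch below is a rough stand-in for what a module like XML::RSS emits; it is illustrative only, not topix.net's feed code (which, per the aside above, came out of Perl's XML::RSS), and the `build_feed` name and sample ZIP-code page are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document.

    `items` is a list of (title, link) pairs; each becomes an <item>.
    Returns the serialized XML as a str (without an XML declaration).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")
```

ElementTree escapes `&`, `<` and `>` in the text it serializes, which is the part hand-rolled feeds most often get wrong.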

A couple of lives ago, I was engaged in my own protocol war. It seemed important at the time, but looking back I marvel at the effort we put into arguing over the alignment of bits in a class of protocols that, nearly a decade later, almost no one on the Internet has adopted. At least we always argued in a different city.

If there is user demand for us to add stuff to our feed that XML::RSS isn't supplying, we'll do it. Rest assured, however, that it will be assigned the proper engineering priority alongside the rest of the stuff we want to build.

In the meantime, if RSS mavens reading this could check out our feeds and let me know if they are mostly working, I'd be grateful. I tested them in a couple of RSS readers that I downloaded, and they seemed to function, but I'm no RSS guru (obviously). :-)
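A first-pass smoke test along those lines might look like the Python sketch below. It only checks the elements RSS 2.0 marks as required (a channel with title, link and description; each item with at least a title or a description) and is no substitute for a real validator; `check_feed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def check_feed(xml_text):
    """Return a list of problems found in an RSS 2.0 document (empty if none)."""
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    if root.tag != "rss":
        problems.append(f"root element is <{root.tag}>, expected <rss>")
    channel = root.find("channel")
    if channel is None:
        return problems + ["missing <channel>"]
    # find() only looks at direct children, so an item's <link> doesn't count here.
    for tag in ("title", "link", "description"):
        if channel.find(tag) is None:
            problems.append(f"channel is missing <{tag}>")
    for n, item in enumerate(channel.findall("item"), start=1):
        # An item must contain at least one of title or description.
        if item.find("title") is None and item.find("description") is None:
            problems.append(f"item {n} has neither <title> nor <description>")
    return problems
```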

January 20, 2004

Random thoughts while sitting in the dark

by at 9:36 PM

  1. the power is off in my building, it's a beautiful night, the entire city is fully lit except for the one block that I live on... how can that be??

  2. is looking at web traffic stats as hypnotic to everyone as it is to me? for some reason I can sit and pore over them for longer than I would care to admit. I love seeing what pages are popular, what is being searched, etc. just a reminder, folks: the power is out in my building - not a lot of entertainment options here. maybe that's why it's so fascinating...

  3. what would I be doing right now if my laptop didn't have a battery? no tv, no lights to read by, not tired enough to fall asleep, girlfriend is out with friends, friends and family all tired of yapping on the phone with me....I literally would be bored stiff. thank god my laptop has a fully charged battery.

  4. should I offer a free ad to anyone who can correctly guess the winner and the total combined score of the Super Bowl? sure, why not? one 125x120 ad on the page of the winner's choice for one month goes to whoever guesses the winning team and the score correctly.

    A couple of catches: the ad has to conform with our standard ad guidelines. it's only available on a page where we show ads (e.g. we don't have ads on the home page) and where ad space is available. only 3 winners possible - so if 5 people guess it right, the three whose entries reach my email first get the ads. if no one guesses it right, the closest three win (again, entries I receive first win in case of ties). send your picks to [email protected]. only one guess per person. I'll email the winners and point folks to the pages with the ads on this blog.

  5. why did it take PG&E an hour to get to my place? they're outside my window now, so hopes are high that I will soon return to the 21st century.

  6. has anyone gone to our sports pages recently? they really look a lot better since we cleaned up some of the AI. stories are good and very accurate. <s>

  7. I've really been surprised by the positive reaction we've received from others in the online news community. everyone wants to make sure we are crawling them... in retrospect this shouldn't be surprising, since they want traffic; I just didn't expect so many requests.

  8. is it plainly obvious that this is my first ever blog post? Other than my posts on the political boards of course, announcing our 2004 presidential election page. <s>

    - mike

Categorizing Blogs?

by skrenta at 8:17 AM

Topix.net currently includes a handful of blogs that we've editorially selected as being newsy enough to fit with the other material we have. This includes stuff like Dan Gillmor's eJournal, Lawrence Lessig, TechDirt, Executive Summary, and so on.

Originally we thought we'd have the full blogiverse in the categorization engine. We actually had it running internally for a while. But not much was coming out of the blogs, and the stuff that did didn't look very good compared to the other stories.

Our categorizer likes references to very specific named entities -- at the local level, streets, jails, hospitals, parks, rivers, and so on. For national news we're scanning for politician names; for world news, references to political leaders and geographical features. But the AI wasn't finding much to grab onto in the blogs we crawled.
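To make that contrast concrete, here is a toy Python sketch of entity-driven categorization. It is not the Topix.net categorizer: the `GAZETTEER` entries, the category names and the scoring rule (count whole-phrase matches per category) are hypothetical simplifications. A story full of specific places lights up a category; a chatty reaction post scores nothing.

```python
import re
from collections import Counter

# Hypothetical gazetteer: entity phrase -> category it points to.
GAZETTEER = {
    "el camino real": "local/palo-alto",
    "stanford hospital": "local/palo-alto",
    "guadalupe river": "local/san-jose",
    "capitol hill": "national/politics",
}

def categorize(text, min_hits=1):
    """Count gazetteer matches per category; return categories with >= min_hits, best first."""
    lowered = text.lower()
    hits = Counter()
    for phrase, category in GAZETTEER.items():
        # Word boundaries stop a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits[category] += len(re.findall(pattern, lowered))
    return [cat for cat, n in hits.most_common() if n >= min_hits]
```

A real system would need disambiguation (which "Springfield"?) and far larger entity lists, but even this toy shows why informal posts give a matcher like this little to grab onto.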

I guess this is because a lot of blog material is informal discussion that is often more follow-up to news posted elsewhere than direct reporting. Blog posts are the feedback to the story, not the story itself. And the chatty tone often omits the who-what-where-when-why that makes a news story or press release a self-contained entity.

Perhaps blogs need a system like Dave Winer's categorization scheme, although, despite my ODP background, I'm skeptical of ad-hoc user-generated taxonomies. Or maybe the rumors that a new system like Kinja will make sense of it all will pan out.

Feedster is cool

by skrenta at 6:52 AM

I didn't quite get Feedster before launching topix.net and this blog. Why would I want to search blogs? But since our launch it's become an indispensable tool for keeping track of public feedback about our system. I've begun using it to research what the online community is saying about other topics as well.

Searching the blogspace has replaced the role Usenet search used to fill. When most of the chatty online audience was on Usenet, a search tool such as DejaNews was needed to find out what people were saying about a subject. But online users who post have mostly left Usenet and moved onto blogs.

Regular search engines aren't very well suited to scanning for blog stuff, with their monthly crawls, spotty daily index coverage, and spam web results mixed in with the blogs. Instead, you need a crawl that's as up-to-date as possible (daily at least, if not hourly or better), and one that pays attention to scrubbing the crud out (akin to running the essential cleanfeed on a Usenet feed).
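The cleanfeed analogy suggests one simple scrubbing step: the same body text showing up over and over is usually spam. The Python sketch below is loosely modeled on the excessive-multiposting idea used by Usenet filters like cleanfeed, which in reality apply many more heuristics; the `DuplicateFilter` class, the normalization rule and the default threshold are assumptions chosen for illustration.

```python
import hashlib
import re
from collections import defaultdict

class DuplicateFilter:
    """Reject posts whose normalized body has already been seen `max_copies` times."""

    def __init__(self, max_copies=3):
        self.max_copies = max_copies
        self.seen = defaultdict(int)  # body fingerprint -> how many times it has arrived

    @staticmethod
    def fingerprint(body):
        # Lowercase and collapse whitespace so trivial edits don't evade the filter.
        normalized = re.sub(r"\s+", " ", body.lower()).strip()
        return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

    def accept(self, body):
        """Return True if the post should be indexed, False if it looks like a flood."""
        key = self.fingerprint(body)
        self.seen[key] += 1
        return self.seen[key] <= self.max_copies
```

Keeping only a fingerprint per body keeps memory small; a production filter would also expire old entries so the table doesn't grow forever.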

Plus I like the name Feedster. :-)

Should we add blog search to Topix.net? We include some blog results in our crawl, but nowhere near the number the dedicated blog search engines do. I'm undecided where inclusion of blogs in the mix of material we search and categorize should land on our to-do list. Getting all of the online police blotters we can reach onto our local pages is a higher priority for me at the moment...

January 18, 2004

Fast crawls by search engines scary but fun

by skrenta at 7:04 PM

Over the past week we've had many search engines and spiders visit topix.net. Generally spiders will rate-limit themselves to visiting a particular domain no more than once every 30 seconds. However, for a large site like Yahoo, Geocities or dmoz, this means that it could take half a year to finish indexing the whole site.
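The back-of-the-envelope math: at one fetch every 30 seconds a polite crawler gets 86,400 / 30 = 2,880 pages a day, so half a year (about 182 days) covers only around 525,000 pages. The small Python helpers below just wrap that arithmetic; `crawl_days` and `pages_per_day` are hypothetical names, and the page counts are illustrative, not measured sizes of any of those sites.

```python
SECONDS_PER_DAY = 86_400

def pages_per_day(delay_seconds=30.0):
    """Pages a crawler can fetch from one domain per day at one request per delay."""
    return SECONDS_PER_DAY / delay_seconds

def crawl_days(pages, delay_seconds=30.0):
    """Days needed to fetch `pages` URLs at one request per `delay_seconds`."""
    return pages * delay_seconds / SECONDS_PER_DAY
```

For example, `crawl_days(525_600)` gives 182.5 days, while the same 525,600 pages at one hit per second, `crawl_days(525_600, 1)`, take a little over six days.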

But search engines want to have the freshest data, and webmasters want to be indexed as quickly as possible, so a few advanced crawlers will detect if they are visiting a very large site, and speed up dramatically if they sense that the site can handle the traffic.
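How each engine decides a site can take more traffic isn't public, so the following is only a guess at the general shape: a per-domain delay that shrinks while the server answers quickly and backs off sharply when responses slow down or fail. The Python sketch uses a gradual multiplicative speed-up and a doubling back-off; the `AdaptiveDelay` class, its thresholds and its bounds are all hypothetical.

```python
class AdaptiveDelay:
    """Per-domain politeness delay that adapts to observed server response times."""

    def __init__(self, start=30.0, floor=0.2, ceiling=60.0, fast_response=0.5):
        self.delay = start                  # seconds to wait before the next fetch
        self.floor = floor                  # never hit the site faster than this
        self.ceiling = ceiling              # never wait longer than this
        self.fast_response = fast_response  # responses under this many seconds count as "fast"

    def record(self, response_seconds, ok=True):
        """Update the delay after a fetch and return the new value."""
        if ok and response_seconds < self.fast_response:
            # Server looks healthy: speed up gradually.
            self.delay = max(self.floor, self.delay * 0.8)
        else:
            # Slow or failed response: back off hard.
            self.delay = min(self.ceiling, self.delay * 2)
        return self.delay
```

The asymmetry is deliberate: a struggling server gets relief within a few requests, while the ramp-up stays cautious. With a 0.2-second floor this crawler tops out at 5 pages/second.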

We observed this first hand the second day after our launch. Googlebot was the first to show up, and quickly accelerated to about 1 hit/second. Teoma arrived and spent half a day fetching 30,000 or so pages. But then AltaVista's spider Scooter arrived and really fetched up a storm. They were fetching well over 5 pages/second at the peak. I thought for a minute it was a DoS attack until I saw that it was just AltaVista indexing us. :-)
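Telling a crawler from an attack mostly comes down to grouping the access log by user agent and looking at request rates. Here is a small Python sketch that does that for Apache's "combined" log format; the regular expression covers only that standard format, and `peak_rates` is a hypothetical helper, not Topix.net's own tooling.

```python
import re
from collections import Counter, defaultdict

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def peak_rates(lines):
    """Return {user_agent: highest number of requests seen in any single second}."""
    per_second = defaultdict(Counter)  # agent -> Counter keyed by timestamp string
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines in other formats
        # Apache timestamps have one-second resolution, so the raw string is a fine bucket.
        per_second[m.group("agent")][m.group("time")] += 1
    return {agent: max(counts.values()) for agent, counts in per_second.items()}
```

A spider shows up as one agent string with a high but steady rate; a flood usually doesn't bother with a recognizable agent. Agent strings can be forged, so checking the source addresses is the follow-up step.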

Fortunately we've built a wicked-cool page serving infrastructure, so our servers didn't even break a sweat. Load on one box peaked at 1.14 with the CPU 75% idle. Not bad for a pair of Supermicro 1U Linux boxes. We haven't even added the planned third front-end server to the cluster yet. At this rate we may not need to for a while and can hold it back as a hot spare in the rack.

Topix.net launch

by skrenta at 12:17 AM

We launched Topix.net into public beta this week. I've been amazed at how quickly bloggers found out about the site and picked it up, thanks to linking, trackback, and popularity-tracking services such as Popdex, Blogdex, Technorati and Daypop. We had more traffic in our first three days than newhoo.com had in its first three weeks. Special thanks to Mike Masnick of TechDirt for being the first to break the news of our launch, which got the ball rolling.

Of course there are more people on the net now than in 1998 when we launched NewHoo. But it was harder back then to get the word out about a new site. There was no single place to post an announcement -- you had to cover Usenet, mailing lists, and various websites, and try to get on the old Netscape "What's Cool" page (remember that?).

But the "blogosphere" now serves as a global peer-reviewed What's New/What's Cool system. Collectively, the blogosphere sees just about everything that happens, and sufficiently interesting, controversial, or popular material gets voted up to a wider view.

Apart from the effect blogging may have on journalism, I think it will have an equally major impact on PR.

January 17, 2004

Hello, World.

by skrenta at 11:50 PM

HELLO ASC 'Hello, World.'
DFB $0