November 6, 2005
Topix Tags Blogs
by at 1:56 PM
Today we added 15,000 top weblogs to the Topix.net crawling/tagging engine. Blog posts are being categorized into our 30,000 local feeds as well as our 300,000 subject feeds. Our search results now include blog results, and posts should show up on our site and search index within 1-3 minutes of being crawled.
News vs. Blogs
There's been a lot of talk about whether bloggers are journalists. At Topix, we can ask a slightly different question -- Are blog posts news?
Others are doing a great job of providing relevant keyword search against blogs. But our mission is to discover the news within the sea of blog posts, and report it by location and subject. What we're releasing today is our first step in connecting our readers to 15,000 more voices talking about the topics they care about.
While Memeorandum and Digg are approaching the same problem, we needed a solution that would scale to our 300,000 newsfeeds.
Coverage: MSM vs. Blogs
We were curious how the breakdown of posts by topic from blogs would differ from mainstream media, and were blown away by the contrasts:
Adding these 15,000 voices to the conversation is a big win.
A note about the number of real blogs out there... There have been reports that there are 20 million weblogs, and one report even put the number at over 100 million. This is one of those cases where statistics can be very misleading. While the total number of unique feeds that have ever existed, or blogging accounts that have ever been signed up, can certainly be counted, what is far more relevant to us is the composition of the daily posting stream. What we're seeing is that 85-90% of the daily posts hitting ping services such as weblogs.com are spam (take a look for yourself). Of the well-ranked non-spam blogs that we've discovered, about half haven't been updated in the past 60 days. Our filters sift through what's left, which even after discarding 95%, is still a great deal of good material.
Inside the Box
How did we judge which blogs to add? We started by crawling about 1M blogs, and then began automatically filtering and ranking these using our NewsRank algorithms -- which consider a variety of factors, such as blog posting frequency, writing style, type of reference, popularity, and so forth. We ended up adding the top 15,000 sources that passed these tests.
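For readers who want a feel for what "filter, rank, keep the top N" looks like in code, here is a minimal sketch. It is not NewsRank: the field names, the spam score, the weights, and the 60-day staleness cutoff are all assumptions made up for illustration, and only two of the factors mentioned above (posting frequency and popularity) are modeled.

```python
import heapq
from dataclasses import dataclass


@dataclass
class BlogStats:
    # All fields are hypothetical signals, not Topix's actual inputs.
    url: str
    posts_per_week: float
    days_since_last_post: int
    inbound_links: int
    spam_score: float  # 0.0 (clean) to 1.0 (almost certainly spam)


def passes_filters(blog, max_idle_days=60, max_spam=0.5):
    """Drop stale or spammy blogs before ranking (thresholds are assumed)."""
    return blog.days_since_last_post <= max_idle_days and blog.spam_score < max_spam


def score(blog):
    """Combine two normalized signals with made-up weights."""
    frequency = min(blog.posts_per_week, 14) / 14      # cap at two posts a day
    popularity = min(blog.inbound_links, 1000) / 1000  # cap at 1000 links
    return 0.4 * frequency + 0.6 * popularity


def top_sources(blogs, n):
    """Return the n highest-scoring blogs that survive the filters."""
    candidates = [b for b in blogs if passes_filters(b)]
    return heapq.nlargest(n, candidates, key=score)
```

Calling `top_sources(crawled_blogs, 15000)` on a list of `BlogStats` would mirror the cutoff described here; a real system would need far more signals and a much better spam detector than a single threshold.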
The graphs above reference postings from these top 15,000 blog sources, and our 12,000 mainstream media sources. Taking them together, we think this is the first time anyone has ever summarized the subject matter for that conversation everyone keeps talking about.
Stopping at the top 15,000 was an arbitrary cutoff for this first release. Frankly, this started as an internal experiment; we had no idea how well our engine would work on a large volume of blog material, but the quality of the posts we saw was so great that we decided to just launch the blogs today over the objections of our marketing staff. :-) We will continually add more sources, and our goal is to push toward automated coverage of 1M sources.
Some Topix channels where bloggers really add to the experience:
- Our page covering the new tagged news space
- Search Engine News channel
- Knitting News
- Patent law news
- Photography
- Apple Computer
- iPod news
- Macromedia
- Linux
- Fast Food
- Chocolate
- Digital Cameras
- Poker
- Genealogy
- Electronic (music)
- Reggae
- NY Knicks
- Cryptography
- Anthropology
- Sony PSP
Blogs and news are now on equal footing on Topix.net. We're visually highlighting blog posts on our pages for the moment so you can tell the new material from the mainstream media posts, but consider the current display a beta; in all likelihood this won't be the final UI. We'd love to know what you think...
If we're not crawling and indexing your blog, we'd love to know about it. Please use this form to submit your feed for inclusion in our index.