February 2, 2004
Memory resident
Posted at 10:48 PM
A while ago I was chatting with my old boss Wade about a nifty algorithm I found for incremental search engines. It piggybacked queued writes onto the reads that front-end requests were issuing anyway, to minimize excess disk head seeks. I thought it was pretty cool.
Wade smacked me on the head (gently) and asked why I was even thinking about disk anymore. Disk is dead; just put the whole thing in RAM and forget about it, he said.
Orkut is wicked fast; Friendster isn't. How do you reliably make a scalable web service wicked fast? Easy: the whole thing has to be in memory, and user requests must never wait for disk.
A disk head seek is about 9ms, and the human perceptual threshold for what seems "instant" is around 50ms. Only about five seeks (5 x 9ms = 45ms) fit inside that window. So even with just one head seek per user request, a server can have at most about 5 requests waiting on its disk at the same moment before users start to notice latency. If you have a typical filesystem with a little database on top, you may be up to 3+ seeks per hit already. Forget caching; caching only helps the second user, and doesn't work on systems with a "long tail" of zillions of seldom-accessed queries, like search.
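Here is the napkin math as a small Python sketch. It takes the 9ms and 50ms figures from above and reads the "5" as the number of seeks that fit inside the 50ms window. The function names are made up for illustration, and the model (requests queue behind one disk head and are served one seek at a time) is a simplifying assumption, not a measurement.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
SEEK_MS = 9.0      # approximate cost of one disk head seek
INSTANT_MS = 50.0  # rough threshold for what feels "instant"


def seeks_within_budget(seek_ms=SEEK_MS, budget_ms=INSTANT_MS):
    """How many back-to-back seeks fit inside the latency budget."""
    return int(budget_ms // seek_ms)


def worst_case_wait_ms(queued_requests, seeks_per_request=1, seek_ms=SEEK_MS):
    """Wait seen by the last request if all of them queue behind one disk head."""
    return queued_requests * seeks_per_request * seek_ms


def max_queued_requests(seeks_per_request, seek_ms=SEEK_MS, budget_ms=INSTANT_MS):
    """Largest queue depth that still keeps the last request under budget."""
    return int(budget_ms // (seeks_per_request * seek_ms))


if __name__ == "__main__":
    print("seeks that fit in the budget:", seeks_within_budget())
    print("wait for the 6th queued request (ms):", worst_case_wait_ms(6))
    print("queue depth allowed at 3 seeks per hit:", max_queued_requests(3))
```

With 3 seeks per hit, a single request already eats more than half the budget, which is why the filesystem-plus-database case looks so bad.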
It doesn't help that a lot of the scheduling algorithms found in standard OS and database software were developed when memory was scarce, and so are stingy about their use of it.
The hugely scalable AIM service stores everything in memory across a distributed cluster, with the relational database stuck off to the side, relegated to making backups of what's live in memory. Another example is Google itself; the full index is stored in memory. Servers mmap their state when they boot; no disk is involved in user requests after everything has been paged in.
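To make the "mmap at boot, then serve from memory" idea concrete, here is a minimal Python sketch. It is not how AIM or Google actually do it; the file layout (a flat array of 32-bit integers) and the helper names are hypothetical. It also assumes the machine has enough free RAM that the OS keeps the pages resident once they have been touched.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout: a flat array of little-endian unsigned 32-bit ints.
RECORD = struct.Struct("<I")


def write_state(path, values):
    """Offline step: dump the server state to disk."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def open_state(path):
    """Boot step: map the file read-only and touch every page to fault it in."""
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    for offset in range(0, len(mm), mmap.PAGESIZE):
        mm[offset]  # reading one byte per page pulls that page into memory
    return f, mm


def lookup(mm, index):
    """Request path: a plain memory read, no explicit disk I/O."""
    return RECORD.unpack_from(mm, index * RECORD.size)[0]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "state.bin")
    write_state(path, range(1000))
    f, mm = open_state(path)
    try:
        return lookup(mm, 42)
    finally:
        mm.close()
        f.close()


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the request path never calls read() or seek(); whether a lookup touches disk is decided entirely by whether the page is still resident.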
The biggest RAM database of all...
An overlooked feature that made Google really cool in the beginning was their snippets. A snippet is the excerpt of text that shows a few sample sentences from each web page matching your search. Google's snippets show just the part of the web page that has your search terms in it; earlier search engines always showed the same couple of sentences from the start of the web page, no matter what you had searched for.
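The difference between the two styles of snippet is easy to see in code. This is a toy Python sketch, not Google's algorithm: the function name, the fixed character window, and the "first matching term wins" rule are all assumptions made for illustration. The thing to notice is that it needs the full text of the page at query time.

```python
import re


def make_snippet(page_text, query, width=60):
    """Return an excerpt of page_text centered on the earliest query-term hit.

    Falls back to the first `width` characters of the page (the old-style
    snippet) when none of the query terms appear.
    """
    hits = []
    for term in query.split():
        match = re.search(re.escape(term), page_text, re.IGNORECASE)
        if match:
            hits.append(match.start())

    if not hits:
        # Old-style behavior: same opening text no matter what was searched.
        return page_text[:width].strip()

    first = min(hits)
    start = max(0, first - width // 2)
    end = min(len(page_text), start + width)
    snippet = page_text[start:end].strip()

    # Mark where the excerpt was cut out of the surrounding page.
    if start > 0:
        snippet = "..." + snippet
    if end < len(page_text):
        snippet = snippet + "..."
    return snippet


if __name__ == "__main__":
    page = ("Welcome to my home page. " * 3
            + "The quick brown fox jumps over the lazy dog. "
            + "Thanks for visiting.")
    print(make_snippet(page, "fox"))
    print(make_snippet(page, "zebra"))
```

Because `make_snippet` takes `page_text` as input, the server answering the query has to have the whole page on hand, not just an index of which words it contains.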
Consider the insane cost to implement this simple feature. Google has to keep a copy of every web page on the Internet on their servers in order to show you the piece of the web page where your search terms hit. Everything is served from RAM, only booted from disk. And they have multiple separate search clusters at their co-locations. This means that Google is currently storing multiple copies of the entire web in RAM. My napkin is hard to read with all these zeroes on it, but that's a lot of memory. Talk about barrier to entry.