October 4, 2004

Topix.net Architecture

by skrenta at 11:59 PM

Industry canon seems to favor RDBMS storage for everything: logs, centralized user registration, even whole freakin' web crawls. Maybe this is a legacy of Oracle's phenomenal marketing machine, which rolled over the valley in the '90s and eventually (through skillful FUD aimed at VCs, I was told) got even little startups paying enormous sums for database licenses. Now that great databases like MySQL are free, coders enjoy the luxury of SQL without the heavy price tag.

But I'm not much of a database guy. The lesson I took away from watching the horror of Netscape's UREG database being down for two weeks after a RAID enclosure failure was that even fancy databases, expensive hardware and knowledgeable staff are no substitute for a fail-safe KISS architecture. Folks at a small startup that Excite acquired were expecting to finally get some help debugging their sick, monster DB; instead they were horrified to find Excite's internal systems in even worse shape than their own. A shopping engine I once knew had a nightmarish flow of chained databases, with a slow-boat-to-China 24-hour dataflow through the system end-to-end. The mess was so big, complex and expensive that it was impossible to replicate in QA, so testing happened on the production system (with predictable results).

Studying Unix internals at USL left me wanting to optimize apps down to the syscall level (how many bytes could safely be appended atomically to a file per write in SYSV, hmmm). I like flat files, with operations set up so that access is fast and fail-safe.
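To make the append question concrete, here is a minimal Python sketch (not Topix code) of a flat-file log where each record goes out in a single write() on a file opened with O_APPEND, so the kernel positions at end-of-file and writes in one step. The function names `append_record` and `read_records` and the newline-terminated record format are hypothetical. How large a single append can be before concurrent writers risk interleaving is filesystem-dependent (exactly the SYSV question above), so the reader only guards against the simplest failure: a torn final record left by a crash, which it skips because it lacks its trailing newline.

```python
import os


def append_record(path: str, record: bytes) -> None:
    """Append one newline-terminated record using a single write() on an O_APPEND fd."""
    if b"\n" in record:
        raise ValueError("records must not contain newlines")
    data = record + b"\n"
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One write per record: with O_APPEND the kernel moves to EOF and writes
        # in one step, so concurrent appenders never overwrite each other's data.
        written = os.write(fd, data)
        if written != len(data):
            raise OSError(f"short write ({written} of {len(data)} bytes); record may be torn")
    finally:
        os.close(fd)


def read_records(path: str):
    """Yield complete records; skip a trailing partial record left by a crash."""
    with open(path, "rb") as f:
        for line in f:
            if line.endswith(b"\n"):
                yield line[:-1]
```

There is no lock anywhere: writers only ever add bytes at the end, so existing records are never rewritten in place.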

Live servers must never wait on disk to serve a page and should try to avoid talking to sockets; the only safe storage operations are write-with-append and rename(). Never use NFS for anything, mmap is your friend (thank you, Google, for helping get the kernel bugs out), and design your system so that you can cycle power with zero corruption. Locking is a last resort: locks wreck performance, and they are wasted effort if your app has last-writer-wins semantics anyway or if you can use append and rename() instead.
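The rename() half of that rule is the classic write-then-rename pattern. A minimal Python sketch, assuming a POSIX system where the temporary file and the target live on the same local filesystem (`publish_atomically` is a hypothetical name): readers see either the complete old file or the complete new one, never a half-written mix, and a reader that already has the old file open or mmap'd keeps seeing the old contents until it reopens.

```python
import os


def publish_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays within one filesystem.
    tmp = os.path.join(directory, f".{os.path.basename(path)}.tmp.{os.getpid()}")
    try:
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            view = memoryview(data)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)  # get the new contents onto disk before they become visible
        finally:
            os.close(fd)
        os.replace(tmp, path)  # rename(2): atomically swaps the directory entry
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Persist the rename itself (POSIX only; directories can't be opened like this on Windows).
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Fsyncing the data before the rename and the directory after it is what lets the swap survive a power cycle; without those calls, some filesystems can commit the rename before the data.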

In other words, Hotmail's architecture rather than eBay's. You can bet that Matt Wells is using some creative data structures in Gigablast.

Besides reliability, this approach has a side benefit: systems built this way tend to be vastly more scalable.

Woz talked at Gnomedex about how being cheap made his designs better. Cheap leads to fewer parts, which means higher reliability. We use Serial ATA and IDE RAID. We used to have 3Ware cards driving the RAID, but the card itself turned out to be a point of failure, so now we just use plain Linux software RAID. Very cheap, very high performance.

Even better is to get rid of the need for RAID altogether. If you have to replicate CPUs for high availability anyway, toss out the RAID on each machine and build a live mirroring system instead. You'll shed the redundant spindles and make the whole system cheaper and more reliable.

Everything on Topix is served from big mmap'd files made up of compressed data chunks. This supports thousands of hits per second per machine, is infinitely scalable, and lets us update all 150,000 site pages every few minutes simply by pushing new wad files to the front ends.
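The post doesn't describe the wad format itself, so the following is only a sketch of the general idea in Python, with a made-up layout: a small header (magic `WAD1`, entry count, then key/offset/length index entries) followed by zlib-compressed page bodies. The front end maps the file once with mmap, parses the index into a dict, and serves each hit by slicing the mapping and decompressing one chunk, with no per-request read() calls or database round trips. `build_wad`, `Wad`, and the layout are hypothetical, not Topix's actual code.

```python
import mmap
import struct
import zlib

MAGIC = b"WAD1"  # hypothetical format tag


def build_wad(path: str, pages: dict[str, bytes]) -> None:
    """Write a header/index followed by one zlib-compressed chunk per page."""
    index, blobs, offset = [], [], 0
    for key, body in sorted(pages.items()):
        chunk = zlib.compress(body)
        index.append((key.encode("utf-8"), offset, len(chunk)))
        blobs.append(chunk)
        offset += len(chunk)  # offsets are relative to the start of the data section
    header = [MAGIC, struct.pack("<I", len(index))]
    for kb, off, length in index:
        # Entry: uint16 key length, key bytes, uint64 offset, uint32 compressed length.
        header.append(struct.pack("<H", len(kb)) + kb + struct.pack("<QI", off, length))
    with open(path, "wb") as f:
        f.write(b"".join(header))
        f.write(b"".join(blobs))


class Wad:
    """Read-only view of a wad file through a single mmap."""

    def __init__(self, path: str):
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            self._map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        m = self._map
        if m[:4] != MAGIC:
            raise ValueError("not a wad file")
        (count,) = struct.unpack_from("<I", m, 4)
        pos, self._index = 8, {}
        for _ in range(count):
            (klen,) = struct.unpack_from("<H", m, pos)
            pos += 2
            key = m[pos:pos + klen].decode("utf-8")
            pos += klen
            off, length = struct.unpack_from("<QI", m, pos)
            pos += 12
            self._index[key] = (off, length)
        self._data_start = pos

    def get(self, key: str) -> bytes:
        """Slice the compressed chunk out of the mapping and decompress it."""
        off, length = self._index[key]
        start = self._data_start + off
        return zlib.decompress(self._map[start:start + length])

    def close(self) -> None:
        self._map.close()
```

Because the file is never modified once written, a new wad can be built off to the side and swapped in with rename(); a front end that still has the old one mapped keeps serving it until it opens the new file.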

Our search backend isn't quite as cool, involving some legacy code, but it's getting there.

The largest factor in architecture, however, isn't how many machines you need; it's how productive the coders can be when extending the system. Fortunately, dead-simple architecture tends to be highly productive to code on. With fewer moving parts in the system overall, there is less mystery, a faster learning curve for new folks, fewer places for bugs and unexpected states to hide, and fewer lines of code to maintain for a given component.

I've come across other flat-file and KISS adherents, but they're rare. I was once told that the VCs made Yahoo's Filo and Yang buy the Oracle licenses, but they left the software on the shelf, preferring instead to deploy simpler systems built from scratch on BSD. Those guys were smart. :-)

Also check out this amusing article from Smart Money in 2000:

"Google actually built its own database from scratch, and it's a wholly different type of software, called a 'flat file' database, according to Craig Silverstein, Google's director of technology."
Those guys are smart too. :-)