September 9, 2004

Perfect Search

by skrenta at 12:07 AM

John Battelle asked me to speculate about Perfect Search.
Imagine the ability to ask any question and get not just an accurate answer, but your perfect answer -- an answer that suits the context and intent of your question, an answer that is informed by who you are and why you might be asking. The engine providing this answer is capable of incorporating all the world's knowledge to the task at hand...
Immediately a bunch of sci-fi imagery flooded my mind. I was going to contrast the moral lessons between Danny Dunn and the Homework Machine and Colossus: The Forbin Project. I pictured Beverly, alone on the bridge, talking to the Enterprise Star Trek computer, trying to reason out her predicament with her friendly savant assistant.

But hold on a sec. Everyone on the planet is walking around with a Star Trek communicator, but it doesn't seem like a big deal anymore. Is the Oracle of all human knowledge going to be any different?

We've got the entire world's libraries wired into a little device on everyone's desk. All anyone needs is the right library reference string to access any document. But that's not good enough! It's sometimes hard to track down documents in the world library. And then after you find them, you have to read & study them, even reason based on them. Laziness spurs productivity, so we ask: Is this automatable work?

Part of the tech bargain seems to be that intelligent helper-humans get too expensive, but we get a big boost in personal productivity to make up for it. You can't delegate to your personal assistant (most can't afford one) but you have a collection of whiz-bang tools instead.

But the confidant, the sounding board, the Doctor Watson to Sherlock Holmes... we need a tool for that, since all of us modern Sherlocks can't afford Watsons anymore, and are sitting at home alone in front of our library terminals. A tool that we can think out loud to, and that will echo back the right spur to the key connection.

Nah, that's all junk. Premature. No Hal yet.

The next rev is an interpretive semantic overlay on the web, with the ability to manipulate results during the search/processing event.

Direct mail folks computerized way back in the '70s, to data-mine postal address databases for insights into who would buy Alaskan cruises, or steaks in the mail, or insurance. Silicon Valley marketing types might look down their noses at these old-school east coast database marketers, but those guys know SQL. They enjoy browsing relational databases looking for insights.

Say you run every sentence on the web through a part-of-speech tagger, isolate noun phrases, classify them (person? place? public company? drug?) and have a collection of intention-classifying patterns.
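To make that pipeline concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: a regex for runs of capitalized words stands in for a real part-of-speech tagger and noun-phrase chunker, the `GAZETTEER` lookup stands in for a trained classifier, and the two `INTENT_PATTERNS` are made-up examples of intention-classifying patterns.

```python
import re

# Hypothetical lookup table; a real system would use a trained classifier.
GAZETTEER = {
    "Acme Corp": "public_company",
    "Palo Alto": "place",
    "Jane Doe": "person",
}

# Hypothetical intention-classifying patterns, keyed by label.
INTENT_PATTERNS = {
    "hiring": re.compile(r"\b(is hiring|seeks|job opening)\b", re.I),
    "birth": re.compile(r"\b(was born|welcomed a baby)\b", re.I),
}

# Crude stand-in for noun-phrase chunking: runs of capitalized words.
NOUN_PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def tag_sentence(sentence):
    """Return candidate entities with a type, plus any matched intents."""
    phrases = NOUN_PHRASE.findall(sentence)
    entities = [(p, GAZETTEER.get(p, "unknown")) for p in phrases]
    intents = [name for name, pat in INTENT_PATTERNS.items()
               if pat.search(sentence)]
    return {"entities": entities, "intents": intents}


result = tag_sentence("Acme Corp is hiring engineers in Palo Alto.")
print(result)
```

The output of `tag_sentence` is the kind of row you would store per sentence: a list of typed phrases and a list of intent labels. The regex will also pick up any capitalized sentence-initial word as an "unknown" phrase, which is exactly the sort of noise a real tagger exists to remove.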

Blabble is doing part of that, and Topix.net is doing part of that. Whizbang Labs was going to do it on a big scale. (This is all old-hat to AI academics, but commercial applications are always more modest and incremental than what can be imagined.)

Now throw some kind of SQL-like database language on top, so you can issue queries to find out the most popular baby names this year, from indexing birth announcements on the web and in blogs, or to be alerted about new developments at a competitor -- reading every job board everywhere, as well as monitoring every public communication from any employee, or individual closely associated with an employee, at the target firm. Blogs, org charts, resume databases, business card fishbowl scans, social networking/FOAF...it's all there, waiting to be parsed & mined.
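As a rough sketch of what the baby-names query might look like, here is a version using Python's built-in `sqlite3` module. The `mentions` table, its columns, and the sample rows are all invented for illustration; the only assumption that matters is that each tagged phrase from the web ends up as a row with an entity type, an intent, and a year.

```python
import sqlite3

# In-memory database standing in for a semantically tagged web index.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mentions (
        url TEXT, phrase TEXT, entity_type TEXT, intent TEXT, year INTEGER
    )
""")

# Made-up sample rows; example.com URLs are placeholders.
rows = [
    ("http://example.com/a", "Emma", "person", "birth", 2004),
    ("http://example.com/b", "Emma", "person", "birth", 2004),
    ("http://example.com/c", "Jacob", "person", "birth", 2004),
    ("http://example.com/d", "Emma", "person", "birth", 2003),
    ("http://example.com/e", "Acme Corp", "public_company", "hiring", 2004),
]
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?, ?, ?)", rows)


def popular_baby_names(conn, year, limit=10):
    """Count birth-announcement mentions per name for one year."""
    cur = conn.execute(
        """
        SELECT phrase, COUNT(*) AS n
        FROM mentions
        WHERE entity_type = 'person' AND intent = 'birth' AND year = ?
        GROUP BY phrase
        ORDER BY n DESC, phrase ASC
        LIMIT ?
        """,
        (year, limit),
    )
    return cur.fetchall()


top = popular_baby_names(conn, 2004)
print(top)
```

The competitor alert is the same shape of query with a different `WHERE` clause (say, `intent = 'hiring'` and a particular company phrase), run on a schedule instead of on demand.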

Before we can have our intelligent-assistant DWIM mind-reading Turing-complete search engine, we'll have something ghastly like SQL on top of a semantically-tagged representation of the web. Twenty minutes of manual searching and reading will be turned into a few seconds of work. Hal can be built on top of those APIs, but skilled humans will use them first. It will be way cool to search with.