Check out Obama In Real Time. It is a web page set up by the new web search venture Collecta to demonstrate its ability to gather pieces of data associated with a given search query as they appear. There, quotes, discussions, and comments mentioning Obama are displayed as soon as they hit the web, whether they are posted on news sites, blogs, Twitter, or other sources. Although it works just the way I expected upon hearing the term “real-time search,” I am nonetheless impressed and pleased to watch the page as the Collecta script brings one result after another onto the screen. There are other “real-time search” web sites such as OneRiot and TweetMeme, but neither offers as dynamic a view as Collecta’s sample page.
Update: I overlooked another real-time search engine named Scoopler, which not only provides true real-time search results as Collecta does, but also offers a search service (though still in beta) for any query. Collecta, by contrast, currently has sample pages on only two popular topics (“Obama” and “swine flu”).
Real-time search may not be the most significant goal right now, but even Google, the big daddy of search, has its eye on it as one of its unsolved problems. Although relevance and accuracy are usually more highly regarded goals in search tools, recency is also an important objective, especially considering the nature of information on the web, which is constantly accelerating and expanding in scope. Even now automated scripts, or “search bots,” scour the web every few hours, if not minutes, collecting as much information as they can and storing it in search indexes. The more recently an item was last crawled, the smaller the difference between the copy recorded in the index and the actual target. One of the ultimate ambitions of a search engine, therefore, is to minimize the time lag between what is (or is not) in the cache and the actual objects on the Internet. The solution would be to fetch the results for the user in real time.
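To make the lag between index and web a bit more concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model that is mine, not how any particular search engine works: a bot recrawls a page at fixed intervals (at times 0, T, 2T, and so on), and "staleness" is the age of the indexed copy at the moment a user queries. The function names are hypothetical.

```python
def staleness(query_time: float, crawl_interval: float) -> float:
    """Age of the indexed copy at query_time.

    Assumes (hypothetically) that crawls happen at 0, T, 2T, ...
    where T is crawl_interval, in the same time unit as query_time.
    """
    if crawl_interval <= 0:
        raise ValueError("crawl_interval must be positive")
    # Time of the most recent crawl at or before the query.
    last_crawl = (query_time // crawl_interval) * crawl_interval
    return query_time - last_crawl


def average_staleness(crawl_interval: float, samples: int = 1000) -> float:
    """Mean staleness for queries spread evenly over one crawl cycle."""
    times = [(i + 0.5) * crawl_interval / samples for i in range(samples)]
    return sum(staleness(t, crawl_interval) for t in times) / samples


if __name__ == "__main__":
    # Compare a crawl every 60 minutes with a crawl every minute.
    for interval in (60.0, 1.0):
        print(interval, average_staleness(interval))
```

Under this model the average lag is about half the crawl interval, so shrinking the interval toward zero is what "real time" amounts to. The question is what that shrinking costs.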
On the other hand, keeping search results up to date does not always equate to better search results. As TechCrunch points out, the overwhelming majority of results returned by real-time search queries come from Twitter posts; those 140-character pieces on the microblog are unlikely to include meaningful information on the intended subject. Return to Collecta’s real-time search results on Obama and you might see what could be problematic. Second after second, you may see a lot of tweets churning out, which either post links to news articles mentioning Obama or display random complaints about the man. The results are ordered chronologically, without any regard to relevance, as expected. Would a search tool like this, other than introducing us to the latest snapshots of what is posted online, contribute to further knowledge of what is on the web? So far, I am rather uncertain.
Also, even to a computer layman, collecting search results in real time looks much more resource-consuming than retrieving information periodically. A query on a conventional search engine demands only a one-time return from what has already been indexed. The search result is static, and unless the user decides to reload or retry the search, the job is over for the search engine. A real-time search, however, would require a constantly running process on the search engine, continuously exploring and loading information from the web until the user quits or shuts down the search entirely. Multiply this burden by millions, and one could hypothesize that it may not be as scalable as the current search model. Correct me if I am wrong, but if a real-time search yields no more significant information than a lagged search performed every hour or two, yet costs many times the resources, I do not see a reason to favor the former, let alone embrace it as the next big thing.
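A back-of-the-envelope sketch of that scaling worry, in Python. The model is an assumption of mine for illustration only: a conventional search costs one index lookup per user, while a real-time search is approximated as the engine re-running the lookup every few seconds for as long as the user keeps the page open. Real services may push updates rather than poll, so treat the numbers as a rough upper-bound intuition, not a measurement. All names and figures are hypothetical.

```python
import math


def conventional_lookups(users: int) -> int:
    """One static result set per user: a single lookup each."""
    return users


def realtime_lookups(users: int, session_seconds: float, poll_seconds: float) -> int:
    """Lookups if every open session is refreshed every poll_seconds.

    Assumes (hypothetically) a polling model; push-based designs
    would have a different cost profile.
    """
    if poll_seconds <= 0:
        raise ValueError("poll_seconds must be positive")
    refreshes_per_user = math.ceil(session_seconds / poll_seconds)
    return users * refreshes_per_user


if __name__ == "__main__":
    users = 1_000_000
    # Illustrative figures: a five-minute session refreshed every five seconds.
    print(conventional_lookups(users))
    print(realtime_lookups(users, session_seconds=300, poll_seconds=5))
```

With these made-up figures the real-time model performs sixty times as many lookups as the conventional one, and the ratio grows with every extra minute a user leaves the page open.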