Search Engines and Portals

The Web is described as a "dense but disorganized jungle of information" in which individuals rely on search engines and directory services to locate the information they seek. This chapter looks into the evolution of the "portal" industry.

The Web as an Electronic Publishing System

The Web originated as a publishing system (for scientific papers and academic research), but it was not the first. A database is basically a publishing system, and the first databases designed specifically for text storage and retrieval had been developed three decades earlier.

The Web, however, takes a fundamentally different approach: there is no single data repository in which documents are stored and indexed. Moreover, there were no structural standards: even though there was a common language for formatting content (HTML), there were no conventions for structuring content, such as exist in traditional document management systems. Those with a traditional (IT) perspective on document management found the "maddening randomness" of the Internet exasperating.

Even in the early years, the volume of content being shoveled onto the Internet was considerable, and there was (and remains) a vast and growing amount of legacy content that is not under any central ownership. In the contest between those who wish to access the content and those who actually own and maintain it (and would bear the cost of the conversion effort), the latter hold greater power - while the former are numerous and uncoordinated.

Also, the Internet went commercial at a young age, which led to a power struggle between those who wished to find information and those who wished to force their information on users who weren't seeking it. The bulk of the funding for early development of search technology came from the commercial sector, whose interests were decidedly different from academic ones, and the two struggled for "control."

While the resources (specifically, money) were stacked on the commercial side, the audience of users favored the academic side - the ability to find information, with minimal distractions from others who wanted to interfere for their own interests.

Directories vs. Spiders

In the early days of the internet, there were multiple efforts to manually catalog content - the most notable of which was Yahoo, which expanded from an "amateur directory" compiled by a couple of engineering students. As it was one of the first, and the largest of its generation, it got a lot of free publicity and rapidly grew an audience (which translated to advertising dollars).

Another approach to cataloging the Web, which became popular as the scope of the Web made manual cataloging expensive (and insufficient), was the use of computer programs to comb the Web and compile descriptions of Web pages by artificial logic. This approach had the benefit of scale (it could quickly categorize many more sites and individual Web pages than a staff of human surfers) but the weakness of inaccuracy (based on a simple algorithm, such programs often fail to recognize the actual meaning of a page).
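As a rough illustration of why a "simple algorithm" can miss the meaning of a page, here is a minimal sketch of how a spider might describe a page by its most frequent words. The function name describe_page, the stopword list, and the sample page are hypothetical and are not taken from the book or from any real search engine.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def describe_page(html, top_n=3):
    """Return the top_n most frequent non-stopword words found in the page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    page = (
        "<html><body><h1>Jaguar</h1>"
        "<p>The jaguar is a fast car. Jaguar dealers sell the jaguar.</p>"
        "</body></html>"
    )
    print(describe_page(page))
```

Word counts alone label this page "jaguar" without recording that it concerns a car rather than an animal, which is the kind of inaccuracy a human cataloger would avoid.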

Fundamentally, the user is faced with a choice: to use a manually-compiled directory (to get better accuracy but fewer results) or a spider-compiled directory (more results, less accuracy).

Web Search

Regardless of the method by which a directory is compiled, the user interface is similar: a search of the database for matching Web pages. The problem is in the matching logic. In the early days of the Web, there were too few results, such that you could not find a source that contained the exact information you needed. Later, there were far too many, such that you still could not find a source that contained the exact information you needed - not because it didn't exist, but because it was buried in a clutter of "results."
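The tension in matching logic can be sketched with a toy keyword index. This is only an illustration under stated assumptions: the page addresses and text are made up, and build_index and search are hypothetical helpers, not the method of any engine named in this chapter. Requiring every query word to match tends to return too little, while accepting any word returns too much.

```python
def build_index(pages):
    """Map each word to the set of page addresses whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query, match_all=True):
    """Return pages matching all query words (strict) or any of them (loose)."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    if not term_sets:
        return set()
    combine = set.intersection if match_all else set.union
    return combine(*term_sets)


if __name__ == "__main__":
    # Made-up sample pages for illustration only.
    pages = {
        "a.example": "cheap flights to paris",
        "b.example": "paris hotels",
        "c.example": "cheap hotels in rome",
    }
    index = build_index(pages)
    print(search(index, "cheap paris hotels", match_all=True))
    print(search(index, "cheap paris hotels", match_all=False))
```

In this sample, strict matching finds nothing and loose matching returns every page, so neither setting hands the user the one page they actually want.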

There is a discussion of some of the various methods attempted to produce a search engine that actually gives a user what they want, rather than one that merely narrows down millions of choices to thousands, leaving the user to manually search among the search results returned. To date, the problem still has not been solved.

Web Advertising

Hated though it may be, advertising provided the funds to build out the Web directories. Were it not for advertising, they would have remained amateur efforts, small and woefully inadequate to the task of cataloging an ever-expanding universe of information.

From the advertisers' side, the model is straightforward: once you build a site, you must attract an audience to it. If you wait for people to find it on their own, or rely upon customers to identify you as a vendor, there will be significant loss in the amount of capital spent to maintain the site until it draws sufficient audience to cover its costs. As in the real world, you must advertise to attract customers.

From the publisher's side, there was no convenient way for Web publishing operations to monetize content. Not only did the Web lack the technology to charge users a fee for accessing Web pages, but users simply wouldn't stand for it. There was a plethora of sites offering free information, and users saw no reason to pay for it, even from a more reliable source (a lesson learned the hard way by Encyclopedia Britannica).

With few exceptions (porn), users would not pay for accessing content. And so, drawing a large audience would cost a publisher more money (in hosting costs), but would grant them no income. The only solution to support a site, or turn a profit, was advertising. This followed the model of television, which provided content that was "free" for users to access, with revenue coming from advertisers.

In so many words, the author speaks of the death of online advertising for the small publisher. Since the medium is measurable, advertisers have demanded ROI as a direct result of their advertising, so their willingness to pay decreased - from pay-per-view advertising (users must see an ad) to pay-per-click (users must click an ad) to pay-per-sale (users must make a purchase).
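The effect of that progression on a publisher can be shown with simple arithmetic. The sketch below counts how many billable events the same audience produces under each model. The function name billable_events and the click and sale rates are hypothetical placeholders chosen for illustration, not figures from the book.

```python
def billable_events(impressions, click_rate, sale_rate):
    """Count the events a publisher can bill for under each pricing model.

    click_rate is the assumed fraction of viewers who click an ad;
    sale_rate is the assumed fraction of those clickers who then buy.
    """
    clicks = impressions * click_rate
    sales = clicks * sale_rate
    return {
        "pay_per_view": impressions,
        "pay_per_click": clicks,
        "pay_per_sale": sales,
    }


if __name__ == "__main__":
    # Hypothetical rates: 1% of viewers click, 2% of clickers buy.
    events = billable_events(impressions=10_000, click_rate=0.01, sale_rate=0.02)
    for model, count in events.items():
        print(f"{model}: {count:g} billable events")
```

Each step down the list bills for a rarer event, so unless the price per event rises to compensate, a site with a modest audience earns less under each successive model.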

Eventually, the only sites with sufficient audience to make appreciable revenue from advertisements were large-scale sites, such as search engines. Ironically, the audience saw these as the least desirable place for advertising, which often impeded them from finding the information they desired. This was underscored when the search engine Google made its debut, completely devoid of advertising, and immediately siphoned off the majority of users from other commercial search engines.

Web Portals

During the late 1990s, "stickiness" became a buzzword. Search engines normally received two hits - one on the search page, one on the results page - before users left. If users could be retained longer, the site could show them more ads, increasing the likelihood that one would appeal to the user, and the site would earn greater revenue per user.

This was in direct contrast to the purpose of a search engine, which drew an audience of users who wished to find information on other sites, and as quickly as possible. So in retrospect, this concept "was clearly flawed." By attempting to get more clicks per user, the sites drew fewer users overall and got fewer visits per user (as users turned to sites that enabled them to find information more quickly, with fewer distractions), and lost in the balance.

Another attempt, circa 1998, was the "portal" concept, in which search engines attempted to create a single interface that offered users a customized site, containing various kinds of information (news, weather, stock quotes) all in one place, rather than users having to leave for other sites to get it.

One problem is that every site wanted to become a "portal" for users. An ISP wanted its users to access the Web through its portal. Each search engine developed a portal. Each Web site (informational and shopping) attempted to become a portal. The addition of "portal" capabilities increased the complexity of each site, while hiding the information the site was designed to deliver.

Very few portals enjoyed much success - and in these instances, they were largely specialist portals - in effect Web sites that aggregate information on a specific topic for an audience that has a high degree of interest (such that they are willing to create a login account and customize the portal to their desires).

Specialist portals on specific research topics or locations have enjoyed a degree of success, as measured by the number of regular users - but ironically, not in the amount of advertising revenue (which was the justification for the expense of creating a portal). There is some indication that, with blogs and RSS feeds enabling users to easily aggregate information to their own desires, even the few remaining portals may find it difficult to remain in business.

The Crash

The dot-com crash occurred between March 2000 and October 2002, when the NASDAQ lost nearly 80% of its value, largely because many Internet company stocks plummeted and died. A majority of the victims were search engines and portal sites, which were largely doomed well in advance of the crash by their inability to devise or realize a viable profit model.

Even those that survived had lost their appeal, and the amount of venture capital one could obtain for an unproven or experimental process significantly decreased.

Google and the Resurgence of Search

Starved of resources, the existing search engines became less useful, even as the Web continued to expand. Meanwhile, there was no capital to be had to address the market's need. This paved the way for Google.

Developed as a student project at Stanford University, Google was designed as a test of categorizing and searching algorithms, and soon became wildly successful when made available off-campus. University officials, dismayed at the amount of resources being consumed by the site, insisted it be taken off-campus, and the students who "owned" it were able to obtain venture capital to do so. Ironically, the students had attempted to license their invention to AltaVista, Excite, InfoSeek, and Yahoo before deciding to commercialize it themselves.

In doing so, Google's founders were dedicated to a "purity" of purpose. They felt that the prominence of advertising was obstructive and the conglomeration of extraneous "portal" information intrusive, and they struggled to keep Google's home page as simple as possible: a logo, a text box, and a couple of buttons. It also helped that the core functions - the categorization of content on Web pages and the matching logic of the search functions - were superior.

To monetize their search engine, they began selling advertising, but were dedicated to doing so in the least intrusive possible way. Advertisements were text-only (no graphics, no pop-ups) and labeled as such, and their placement was chosen to weigh the desire of advertisers to show their ads against that of search users wishing to find other things.

The Web Navigation Business Today

Presently (2007), Yahoo, Microsoft, Ask.com, and other firms are chasing after Google in an attempt to regain market share from them, with very little success. Meanwhile, Google is selling more advertising than any other Internet firm, and has overtaken many traditional media powerhouses in its revenues.

Google has also implemented a program to extend its advertising to independent publishers: the "AdSense" program enables other Web site owners to include advertising from Google on their sites, in exchange for a split of the revenue. There is no indication yet of whether this has delivered on its promise, or whether professional Web publishing on the advertising model has made any degree of resurgence as a result.

Google has, to some degree, leveraged some of the DNA of old "portal" sites by developing "gadgets" that display sundry types of information on its home page. But at the same time, it has turned this idea on its head by inviting other Web publishers to create "mash-ups" on their own site of the data that Google aggregates - such that users can republish Google data on their own sites, or prepare it for various devices (mobile, notably) without the need to visit a Google-operated "portal."

Google's largest competitor in this space (Microsoft) has yet to define a successful profit model. In many regards, Microsoft has attempted to follow Google's acquisitions and offer similar services, and has stumbled along the way to earn revenue from doing so. In general, its attempts to generate revenue have resulted in more users flocking to Google for the same services.


Search is an essential feature of any information system, but one entirely neglected in the Web's original design in favor of simplicity, flexibility, and decentralization.

The market's rush to fill that void, with a plethora of different companies attempting a wide variety of solutions and then being whittled down to a small handful of firms that got it "right" (or nearly so), is hardly unique to this industry. Such is part of the evolution of virtually every industry.

The author also mentions the "Semantic Web" and the way information is being broken into smaller "snippets" that can be aggregated in various ways, which raises the question of whether a Web "site" such as it exists today will remain the basic unit of information on the Web - or whether the medium will evolve into a vast array of data services that users may aggregate on their own. In the latter case, there will be a new round of innovation and turbulence as the industry makes this adjustment.