THE SEARCH ENGINE REPORT
August 4, 1998 – Number 21

By Danny Sullivan
Editor, Search Engine Watch
https://www.searchenginewatch.com/

===================
About The Report
===================

The Search Engine Report is the email companion to Search Engine Watch, https://www.searchenginewatch.com/. It keeps you informed of changes to the site and general search engine news.

The report has 34,000 subscribers. You may pass this newsletter on to others, as long as it is sent in its entirety.

If you enjoy this newsletter, consider showing your support by becoming a subscriber of the Search Engine Watch web site. It doesn’t cost much and provides you with some extra benefits. Details can be found at: https://www.searchenginewatch.com/about/subscribe.html

Please note that long URLs may break into two lines in some mail readers. Cut and paste, should this occur.

===================
In This Issue
===================

+ General Notes
+ Counting Clicks and Looking At Links
+ Promoters Call For Certification
+ AltaVista Releases Search Software
+ Northern Light Adds Search Functions, Freshens Index
+ AltaVista To Buy altavista.com Domain
+ Inktomi: One Database, But Different Results
+ Netscape Smart Browsing Available, Debated
+ AltaVista Canada Expands
+ Help For Site Specific Search Needs
+ Search Engine Notes
+ Search Engine Articles
+ Subscribing/Unsubscribing Info

===================
Sponsor Message
===================

When a prospect does a search for a keyword related to your products or services, do you appear in the top 10 or does your competition? Submitting alone does nothing to ensure good visibility. WebPosition is the first software product to monitor and analyze your search positions in the top search engines. WebPosition has been compared to similar “services” on the web and has been overwhelmingly voted the best and most accurate tool for search position management.

With WebPosition you’ll know your exact positions for an unlimited number of keywords. You’ll know if you drop in rank. You’ll know when a search engine FINALLY indexes you. You’ll know when you’ve been dropped from an engine. WebPosition is rated 5 out of 5 stars by ZD Net, and includes a 110-page guide to improving your search positions.

Try WebPosition yourself for FREE at:
http://www.webposition.com/cgi-local/index.pl?DS1=e-ds4

===================
General Notes
===================

Hello Everyone-

Despite the best efforts of Windows 98, which routinely crashes my system several times a day, I’ve finished the latest report and updated numerous pages within Search Engine Watch.

In particular, the entire Regional Search Engines area has been restructured. Instead of one page, there are now individual pages devoted to Africa, Asia, Europe, Australia and New Zealand, and the Americas. Regional offerings from the major search engines have been integrated into these pages, while a chart provides an at-a-glance look at where the major search engines have a presence outside of the United States.

I’ve also updated all the search engine ratings pages with the latest numbers from Media Metrix and RelevantKnowledge. The trend charts show clearly that AltaVista benefited from its more prominent placement on the Netscape Net Search page, while Yahoo weathered its departure with hardly a loss in traffic. NetRatings has also provided statistics for the first time, including an interesting one showing the number of page views per visitor at each of the major services.

Webmasters will find the Search Engines Features chart has been completely revised and expanded. There are several new categories and full information for Northern Light. Likewise, the Search Engines and Frames page has been revised and now also includes a JavaScript solution for restoring the context of a frame, for when someone enters it via a search engine.

I also posted results of the Company Name test earlier this month, which shows how well search engines respond to a search for a company name or web site. Excite was the big winner this time for search engines, while Yahoo came out on top against other directories.

The MetaCrawler Top Search Terms page was also updated earlier this month with June terms, and July ones will be up shortly.

To save room, as this is a long report, you can find links to all the pages via the site’s What’s New page. That link is below.

I’ve also set up a new forum area for Search Engine Watch. There are a variety of places where people discuss search engine secrets (and those who support the site with a paid subscription have access to a comprehensive list of these). So instead, I thought the forum might be a good place for people to post comments about articles in the Search Engine Report, or about search engine issues that aren’t aired elsewhere.

I’ve established a few folders already, specifically for issues touched on in the certification story below, as well as asking for general comments on what features you’d like to see in a search engine. However, feel free to establish your own topics.

To use the forum, just follow the link below. There’s a short, easy registration process to be able to post, and I think you’ll find the forum interface very friendly. Come by and have a say!

Finally, you may notice a few small design changes within the site, mainly the use of reverse color bars to break up sections of pages and color within table borders. Hopefully, these will look OK under various browsers, as my testing has gone OK.

Search Engine Watch Forum

http://trial.internet.com:8000/forums/Ultimate.cgi?action=intro

When you reach the page, scroll down until you see the “Search Engine Watch” section.

===================
Search Engine News
===================

Counting Clicks and Looking at Links

At their core, the major search engines use what I call the location/frequency method of determining relevancy. For example, search for “bill clinton,” and most will return pages primarily ranked by where and how often those words appear in each document.

To be more specific, a page titled “Bill Clinton’s Life” is likely to be considered more relevant than others where the title tag doesn’t mention the US president’s name. That’s an example of how the location of a term can be important. Similarly, a page that repeatedly mentions Bill Clinton probably will get more of a boost than a page with only one reference.

I’m grossly oversimplifying the process, of course. Location and frequency are not the only factors used. Each search engine blends a variety of techniques into its algorithm. But location and frequency have tended to be the dominant factors.
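
To make that concrete, here’s a toy scorer in Python. It is purely my own illustration, not any engine’s actual formula, and the weights are invented for demonstration.

    def score(query, title, body):
        total = 0.0
        body_words = body.lower().split()
        for word in query.lower().split():
            if word in title.lower():        # location: a title match counts heavily
                total += 10.0
            total += body_words.count(word)  # frequency: each body mention adds a little
        return total

    pages = [("Bill Clinton's Life", "Bill Clinton was born in Hope. Bill Clinton became president."),
             ("US Presidents", "Bill Clinton is mentioned once here.")]
    pages.sort(key=lambda p: score("bill clinton", p[0], p[1]), reverse=True)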

Some new techniques may be about to change that. The idea of leveraging links as a means to improving results is making a comeback. And later this month, one search engine is going to enhance its service with Direct Hit, which taps into user feedback to improve relevancy.

Direct Hit works in the background, quietly watching what users search for, then recording which pages they visit from the “normal” search results. Over time, it develops enough data to know which pages are popular and which aren’t.

To use this information, a user selects the Direct Hit option, which will likely appear above the regular search results. This will bring up Direct Hit’s own list of what’s relevant, where pages are ranked by user popularity.

The system is ideally suited to general queries of one or two words, which are common on the major services. A suggestion to try Direct Hit will probably only appear in response to short queries like these, similar to how the RealNames option appears only for queries of three words or less at AltaVista. The Direct Hit option will also only appear if enough data on a term has been gathered.

Some problems immediately come to mind. Spamming is foremost. Can site owners simply click their pages to the top? The chief defense against this is the sheer amount of data that Direct Hit samples, which makes it hard to skew things. Those attempting to do so are likely to be spotted. There are also some other tricks Direct Hit has to help control spamming.

Another problem is the fact that many users don’t search deep. “Only about 7% of users really go beyond the first three pages of results,” says Gary Culliss, Chairman and Founder of Direct Hit. How can the system bring the good stuff to the top if users never dig down to find it in the first place?

“All it takes is one person to find something buried deep in the results list to start its movement upward where it can be viewed by other searchers and boosted further in its ranking,” said Culliss.

In fact, top listed sites that are not visited can move down in the Direct Hit ratings, while sites buried in the results enjoy a significant boost if someone drills down and selects them.

“You can view it in the negative sense of whatever people pass over gets moved down, or in the positive sense of whatever they click on moves up,” Culliss said.
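
Here’s a rough Python sketch of that up-and-down movement. Direct Hit’s actual system and its anti-spam defenses are proprietary, so treat this purely as an illustration of the concept; the score values are invented.

    click_score = {}  # maps (query, url) to an accumulated popularity score

    def record_search(query, shown_urls, clicked_url):
        for url in shown_urls:
            key = (query, url)
            if url == clicked_url:
                click_score[key] = click_score.get(key, 0.0) + 1.0  # chosen: moves up
            else:
                click_score[key] = click_score.get(key, 0.0) - 0.1  # passed over: drifts down

    def rerank(query, urls):
        return sorted(urls, key=lambda u: click_score.get((query, u), 0.0), reverse=True)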

I ran some quick comparisons of Direct Hit against results from the search engine it will soon appear on. All results were at least a little better, and with some queries, they were dramatically improved. A search for “microsoft,” for example, put the company home page, the Internet Explorer page and a software download page at the top.

I’ve no doubt Direct Hit will be popular as a supplement, and not necessarily just on one search engine. The company is talking with other players and positioning its technology as a non-exclusive addition that any of them can use.

While Direct Hit leverages humans directly, Clever leverages them indirectly, via the links they create.

You may have heard of Clever through some scattered press coverage recently given to its core technology, HITS. HITS stands for Hypertext-Induced Topic Search, and it was developed by Cornell University researcher Jon Kleinberg, while he was a visiting scientist at IBM’s Almaden Research Center.

IBM has expanded and enhanced HITS into Clever, a system that ranks pages primarily by measuring links between them.

The process starts by collecting a set of pages relevant for a particular term. For example, Clever might send a query to AltaVista for “bill clinton” and then retrieve the top 200 pages listed. Next, Clever gathers all the pages that the initial 200 link to, plus any pages on the web that link to them.

The result is a set of a few hundred to a few thousand pages, which Clever ranks by counting links. Pages in the set with the most links pointing at them get the best scores, but only initially.

“Links have noise, and it’s not always clear cut which pages are best,” said Kleinberg. “We wondered, was there a way to get some sort of consensus out of the links?”

The solution is to recalculate the scores, this time letting links from important pages carry more weight. To paraphrase Animal Farm, all links are created equal, but some are more equal than others.

Picture it in real life. A link from a page within Yahoo should mean more than a link from someone’s personal page, since the criteria for being listed are much stricter. Likewise, links from other “important” sites should carry more weight.

The challenge is helping the algorithm understand what pages are “important.” That’s where the initial ranking comes in. Pages with the most links are established as most important, and during the recalculation, their links transmit more weight.

This produces a completely new set of scores and even allows for situations where a page with only one link to it could do better than another with two links, if that single link is from a very important page.

Repeating this recalculation a number of times further refines scores. Nor does Clever stop there. A series of other tweaks are also made to help improve relevancy.

A key component is to consider text within and near the link. If the actual search term appears, then that link transmits more weight to the page it points at. Clever also discounts the weight of links between pages at the same web site.
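
The core HITS computation has been published, so a bare-bones version can be sketched in Python. Clever’s production refinements, such as the link-text weighting just described, are IBM’s own and are not shown here.

    def hits(links, rounds=20):
        """links maps each page to the list of pages it links to."""
        pages = set(links) | {t for targets in links.values() for t in targets}
        hub = dict.fromkeys(pages, 1.0)   # how good a page is as a pointer
        auth = dict.fromkeys(pages, 1.0)  # how good a page is as a destination
        for _ in range(rounds):
            auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
            hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
            for scores in (auth, hub):    # normalize so the numbers stay bounded
                total = sum(scores.values()) or 1.0
                for p in scores:
                    scores[p] /= total
        return auth

Each round is exactly the recalculation described above: links from pages that themselves score well transmit more weight.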

The end result of this is a list of top ranked pages. However, IBM doesn’t see Clever being used to provide real-time search results in response to queries. Instead, they feel the value will be to create constantly refreshed lists of relevant pages for categories.

Specifically, imagine the situation at Yahoo, where there are thousands of different subjects. Clever researchers believe their technology could be used to populate these categories with minimal human assistance. Give Clever a few terms relevant to the subject, and it will leverage links to fill the category with the best pages on the web.

“You don’t have to have an army of ontologists to stay current,” said Prabhakar Raghavan, a researcher on the Clever project.

So how good are the results? The system isn’t available for testing outside of IBM’s firewall, but in a recent study IBM conducted, Clever’s results were as good or better than Yahoo’s results 81% of the time.

That’s IBM’s research, of course, but I think it’s pretty trustworthy. One need only look at Google to see how effective links can be in improving relevancy.

The last time some students at Stanford University got involved with categorizing the web, it turned into a little site you may have heard of called Yahoo. That alone makes me surprised someone hasn’t yet swooped in to carry off Google developers Larry Page, Sergey Brin and Craig Silverstein into portal heaven. Even more surprising is that the engine they’ve put together is really good and even has a catchy name.

Google is an experimental search engine that, like Clever, uses weighted link popularity as a primary part of its ranking mechanism. Each page has a rank, based on the number of other pages linking to it and the importance of those pages. Importance, as with Clever, is derived from an overall link count.

Google also makes extensive use of the text within hyperlinks. This text is associated with the pages the links point at, making it possible for Google to find matching pages even when those pages cannot themselves be indexed.

An important difference from Clever is that Google actually crawls the web itself, rather than analyzing a core set of pages from another search engine. Thus, its results should be more comprehensive. Over 25 million pages have been indexed, and the goal is to gear up toward 100 million or more.

Google also provides some ranking boosts based on page characteristics. Terms that appear in bold text, in header text, or in a large font size are all taken into account. None of these are dominant factors, but they do figure into the overall equation.
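
The link portion of Google’s scheme, known as PageRank, is spelled out in the research papers linked below. Here is a condensed Python sketch of just that portion; the on-page boosts and link text handling are omitted, and the damping constant is simply the value the papers suggest.

    def pagerank(links, damping=0.85, rounds=30):
        """links maps each page to the list of pages it links to."""
        pages = set(links) | {t for targets in links.values() for t in targets}
        rank = dict.fromkeys(pages, 1.0 / len(pages))
        for _ in range(rounds):
            new = dict.fromkeys(pages, (1.0 - damping) / len(pages))
            for page, targets in links.items():
                if not targets:                  # dangling pages ignored for simplicity
                    continue
                share = damping * rank[page] / len(targets)
                for t in targets:                # a page divides its rank among its links
                    new[t] += share
            rank = new
        return rank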

So how about the results? I think many people will be pleased, especially for the ever-popular single and two-word queries. A search for “bill clinton” brought the White House site up at number one. A search for “disney” top-ranked disney.com, and sections within it like Disney World, the Disney Channel, and Walt Disney Pictures. Yet interesting alternative sites, such as Werner’s Unofficial Disney Park Links, also made it on the list.

Will Google be going commercial? Page isn’t opposed to it, but said there’s no particular hurry.

“We’re Ph.D. students, we can do whatever we want,” he said. And what they want is to find the right partners to let them focus on improving relevancy. “I’d like to build a service where the priority is on giving users great results,” Page said.

If you pay a visit, don’t be frightened by the interface. One thing Google needs is a good facelift. Relevancy scores and other extraneous information can obscure the actual listings — but I did say this was an experimental service, right?

The value of links hasn’t been lost on the major search engines, of course. Infoseek chairman Steve Kirsch recently said that link data was a core component of his service’s new retrieval algorithm. And Excite has long made use of link data as part of its ranking mechanism.

“Most in the industry would recognize that the links pointing into a site give you a fair idea of the visibility of the site and its prominence,” said Excite search product manager Kris Carpenter.

It’s likely that there will continue to be a growing emphasis on non-traditional data such as links and user feedback to make sense of the web, as opposed to just the words on the page. It’s essential in a web universe where the text on some pages cannot be trusted, and where other pages cannot be indexed at all.

Direct Hit

http://www.directhit.com/

Clever

http://www.almaden.ibm.com/cs/k53/clever.html

Google

http://google.stanford.edu/

Google WebBase Research Pages

http://google.stanford.edu/google_papers.html

Those interested in how to build a search engine will enjoy this goldmine of information. Be warned — it’s all highly technical.

===================

Promoters Call For Certification

Principals from four major promotion and design firms have sent an open letter to the major search engines calling for the establishment of a certification program for optimization professionals.

The letter is the first coordinated move ever from the web promotion community regarding search engine positioning issues. Those signing were from Beyond Interactive, MercurySeven, US Web and Web Ignite/AAA Internet Promotions.

The letter arose because of Infoseek’s ban on pages that redirect to other pages. Firms like Web Ignite depend on these types of pages for their optimization programs.

Under these programs, a client specifies the terms they would like to do well for. The optimization company creates targeted pages for the terms and submits them to the major search engines. Visitors finding the pages via a search engine click through and are automatically redirected to the client’s site. They never see, or only see briefly, the target page, which is hosted on the optimization company’s site.

The ban on redirection impacts these companies greatly, because they are paid by the click, earning $0.25 or more per visitor. Without redirection, they cannot count the visitors they attract in order to be paid.
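
To see why the ban bites, consider what the redirect actually does. In this little Python sketch, which plays the role of the optimization company’s server (the client URL and log file name are made up), every visitor is logged before being passed along.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    CLIENT_SITE = "http://www.example-client.com/"   # hypothetical client site

    class ClickCounter(BaseHTTPRequestHandler):
        def do_GET(self):
            with open("clicks.log", "a") as log:     # one log line per billable visitor
                log.write(self.path + "\n")
            self.send_response(302)                  # then bounce the visitor onward
            self.send_header("Location", CLIENT_SITE)
            self.end_headers()

    HTTPServer(("", 8000), ClickCounter).serve_forever()

Remove the redirect, and visitors go straight from the search engine to the client; the log stays empty, and there is nothing to bill against.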

“That’s the only way we can record our visitors and charge as an optimization company,” said Paul Bruemmer, of Web Ignite, and one of those signing the letter.

Bruemmer believes Infoseek’s ban emerged because the adult web site industry has heavily abused redirect pages by loading them with spam. In contrast, high profile optimization companies are hesitant to spam, for fear of scaring off important clients.

Infoseek product manager Jennifer Mullin hadn’t yet read the letter, which just went out, but she said redirect pages were banned because of problems with them from both adult and other sites. Infoseek has no plans to change its policy, at the moment, she said.

Of course, some people may consider the entire concept of targeted web pages to be spamming, redirection or not. Search engines were originally designed to find the best pages from those “naturally” occurring on the web. The idea of manufacturing pages can seem too overt.

However, all of the major search engines allow targeted pages at the moment, even Infoseek, as long as the page contains no redirection and presents users with the same text that the Infoseek spider sees. In general, target pages must also not mislead visitors about the content at a site. Nor can they contain spam, such as word repetition in the body text or meta tags.

In fact, if these manufactured pages were not allowed, it would be impossible for some sites to ever be found. Good examples are sites with graphic intensive pages, those that use frames, or sites that are completely database driven. These sites can be invisible to the major search engines, in their current state, unless targeted pages represent them.

So there’s a good reason for targeted pages, yet their very existence makes many of the common spam rules that have developed hypocritical. After all, what’s the point of trying to keep a page from “inflating” its score when you readily acknowledge it has no “real” score to begin with?

These types of rules made sense when the web was primarily text based. We had a relatively level playing field then, and the idea of penalizing a page that stuffed itself with extra terms made sense.

The situation is much more complex, now. Rules have evolved and gotten more complicated. For example, one of the most common questions I get is how much repetition is too much in a meta tag. Should a bookseller say “kids books, computer books, science books,” for example, or are they repeating “books” too much? And if they don’t use all those terms, do they reduce the chance of being found?

There is no correct answer, because the rules are unpublished and vary by search engine. The result is an incredibly confusing mess for site owners that want to know the basic things they should do to be found. They are forced to rely on guesswork, experimentation, rumor and the few bits of dependable information that search services choose to release.

Given this, it’s easy to say that what’s needed is a set of standard rules that everyone can follow. If we all abide by them, then we’ll be back to the level playing field. Just tell us the rules, and we’ll play fair.

The problem is two-fold. First, the more rules a search engine provides, the more ammunition is provided to those who want to bend the rules or break them altogether. The potential traffic payoff makes this a constant attraction, especially when it is easy to set up shop elsewhere on the web and start again if you find your domain gets banned.

Second, and more important, the idea of giving everyone rules to follow doesn’t result in more relevant pages. It results in top ranked pages from people who are smarter at following the rules. Often, that corresponds with relevant pages. But the distinction is important.

Imagine a certification program with 15 companies all participating. Each of these companies has a client that wants to do well for “auto repair.” Only 10 of them will make the top listings, in most places. That means the remaining five will have upset clients who are not happy with a second page listing. So they’ll go back and rework their pages, albeit within the “rules,” to secure a better placement. As a result, some of the other companies will lose positioning. Thus, they’ll go back and rework their own pages, putting the cycle into a constant loop.

The result is not that the most “relevant” pages are being listed. Instead, the people on top will be those who are cleverest about putting words into a particular order on the crafted target pages they’ll inevitably submit.

In contemplating solutions, we must recognize that all this fury over positioning is concentrated primarily on popular single and two word terms. Remove these from the equation somehow, and suddenly, many spam rules might be unnecessary. That’s because it is not worth the time to spam for longer terms that don’t bring in huge amounts of traffic.

Given this, it may make sense for the major search engines to consider experimenting with the GoTo model and accept paid links, especially for competitive terms.

This may sound like sacrilege, especially to professional researchers, but the reality is that paid links already exist indirectly. Sites listed for top terms are often there because they have paid an optimization company to put them there, or because they have invested significant time (and thus money) to achieve a position on their own.

It’s even arguable that allowing paid links might increase relevancy. For one, the search engines would have more direct control over who’s accepted. At GoTo, only sites truly relevant for terms are allowed to bid on them.

Moreover, search engines now spend significant resources in combating spam. Allowing paid links might greatly defuse the spamming situation, allowing them to concentrate their resources on improving search technology.

Paid links certainly make more sense than trying to certify optimizers to follow rules that are already outdated. Even Bruemmer, who’s leading the certification charge, likes the simpler idea of moving to paid links rather than certification.

“Certification is within a certain paradigm that everyone is under right now. If we could skip that paradigm and go right into a GoTo-type environment, then right on,” he said. “What happened to Open Text is ancient history. An Internet month is like a year. At least 18 years have gone by,” he added, speaking of the fallout from Open Text’s experiment in 1996 with paid placement.

Alternatively, perhaps some new developments such as link analysis or user measurements, as described in the earlier article, will make spam rules needless. Another idea is that new forms of trusted metadata may emerge, provided by a third party. Any of these might provide a real level playing field, and one that cannot be influenced.

What do you think about the situation? Could certification or a code of standards make a difference in how things currently operate? What rules would you like? What rules do you think are outdated? Is the idea of paid links repugnant, or do you not care, as long as quality sites are listed? Visit the Search Engine Watch forum below and leave your thoughts. I and others look forward to seeing your comments.

Open Letter to Search Engines

http://www.clientdirect.com/Certification.html

Search Engine Watch Forum

http://trial.internet.com:8000/forums/Ultimate.cgi?action=intro

What Is A Bridge Page?

https://www.searchenginewatch.com/webmasters/bridge.html

Provides more details about different types of targeted pages, and how they are used.

GoTo Sells Positions

The Search Engine Report, March 3, 1998
https://www.searchenginewatch.com/sereport/9803-goto.html

The idea of paid links brings up the specter that small web sites will lose out on visitors, or that searchers will miss information they are looking for. Neither case is necessarily true, and this article about GoTo describes why in more detail.

===================

AltaVista Releases Search Software

AltaVista has released a software tool called AltaVista Discovery that combines web wide and desktop search into one package.

The program runs as a small toolbar that attaches to the browser. Among the features that may appeal to web searchers are Hit Highlighting, Similar Pages, Referring Pages and More Pages.

With Hit Highlighting, Discovery will display a web page found through a search with the query words highlighted, which makes it easier to spot relevant portions.

With Similar Pages, Discovery examines the content of the page being viewed and tries to locate similar ones via AltaVista.

Referring Pages displays a list of all pages linking to a particular page, while More Pages displays a list of all pages from the same site. Both functions can be carried out using AltaVista’s power commands, but Discovery makes them an easy, push-button option ideal for novices.

Discovery also has the ability to search for information on a user’s computer. It can scan for matches in a wide variety of file types on your computer, or within popular email programs.

The program is free and available for download at the link below.

AltaVista Discovery

http://discovery.altavista.digital.com/

===================

Northern Light Adds Search Functions, Freshens Index

Northern Light has added new search functionality, and the service is moving forward in updating its index of the web, which has grown dated in the past months.

Using the new Power Search tab, users can narrow searches in a variety of ways. They can search for terms in the entire document, or just within the title or URL.

Northern Light’s page classification types can also be used to narrow searches. These include Page Type, Language, Sources, and Subject.

Page Type is determined by an algorithm that classifies pages into categories such as “for sale” or “event listings.” Language is determined by a dictionary-based system. Editors classify sites by Subject, into areas such as “arts” and “travel.” Finally, Source is determined primarily by domain-filtering, to place pages into categories such as “personal pages” or “commercial web sites.”

Northern Light can also sort listings by date, the only major search engine to offer this. Date reflects when a page was created or modified, though not all documents will have one, as some web servers fail to report a date. Northern Light also rejects dates that are clearly inaccurate, which was an unexpected problem.

“We were shocked by how many web documents were dated in the future,” said Northern Light’s Director of Engineering, Marc Krellenstein.

Sorting results by date can be of mixed value. After all, older documents are not necessarily less relevant. Also, some documents given minor changes may suddenly appear to be newer. Nevertheless, it’s a nice option to have, and one that many users have requested.
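
The mechanics are simple enough to sketch in Python. The treatment of missing and future dates below is my guess at sensible behavior, not Northern Light’s published logic.

    from datetime import date

    def sort_newest_first(pages, today=None):
        """pages is a list of (url, reported_date_or_None) pairs."""
        today = today or date.today()
        def key(page):
            url, reported = page
            usable = reported is not None and reported <= today  # reject missing or future dates
            return (0, -reported.toordinal()) if usable else (1, 0)
        return sorted(pages, key=key)  # dated pages newest first, undated and bogus ones last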

Sorting by date is much more useful when searching within Northern Light’s special collection, which has documents from 5,500 publications. These are articles not available on the web, but they can be viewed for a fee (searching for them is free). Many of these are periodicals, and date sorting can help bring the newest articles on a subject to the top.

Northern Light now also supports various field commands, similar to those at AltaVista, Infoseek and HotBot. Use title: before search terms to look for them in the title, such as title:bill clinton. The text: command works the same way but finds terms only in the body copy, while url: can be used to restrict searches to a particular site, such as url:netscape.com. The commands can also be combined, such as url:nasa.gov title:pathfinder to find all pages on NASA servers with Pathfinder in the page title.

Also, a special section within Northern Light has been set up in conjunction with Billboard for those interested in music. Searches can easily be restricted to music publications or music web sites, along with some other narrowing options.

Finally, the Northern Light web index had grown quite dated over the past weeks, a situation the service says it is now moving rapidly to correct.

Krellenstein said things fell behind as the service has been busy building up its special collection documents. But the web crawler has been playing catch-up, and it will continue to be kept busy.

“We have no backlog of data. We scaled up the crawler four-fold a few days ago, and in August we will intensify the crawl,” Krellenstein said.

The immediate goal is to freshen the data in the existing index, though new finds will be added. By September, Northern Light hopes to have expanded its size well past 100 million web pages. It currently stands at 80 million.

Northern Light

http://www.nlsearch.com/

Billboard Music Search

http://www.nlsearch.com/billboard/

===================

AltaVista To Buy altavista.com Domain

When AltaVista debuted in December 1995, apparently no one thought it was a problem that another company had the domain name altavista.com. That blunder looks set to cost Compaq $3.3 million; the company has reportedly worked out a deal to purchase the name.

It certainly would have been cheaper and smarter if someone had thought back then to change the search engine’s name to match an available domain name. But that didn’t happen. AltaVista went up at altavista.digital.com, a subdomain of its then parent company, Digital. Compaq purchased Digital earlier this year.

As you might expect, people mistakenly went to altavista.com, a web site owned by AltaVista Technologies. That company, receiving all this unexpected traffic, made good use of it by making the site look similar to AltaVista and selling ads.

This resulted in AltaVista suing AltaVista Technologies, and a judge ruled in March 1997 that AltaVista Technologies had to place a disclaimer on its site. The company kept the domain name, however.

Although expensive, this is probably the best investment AltaVista could make to assist in its portal site efforts. It clears up the ambiguity and thus makes the brand much more valuable.

Wild week for AltaVista suit victor

News.com, July 29, 1998
http://news.com/News/Item/0,4,24739,00.html

Domain Deal Up in Air

Wired, July 28, 1998
http://www.wired.com/news/news/business/story/14058.html

===================

Inktomi: One Database, But Different Results

A number of people have noticed that Yahoo’s Inktomi-powered web results are different from those that appear when the same search is performed at Inktomi-powered HotBot. Likewise, some minor differences also appear at other services where Inktomi provides results.

HotBot was the first Inktomi-powered service, and so that’s the benchmark most people are using to compare the Inktomi results on other services. It pulls its results from the Inktomi database in California.

In contrast, Yahoo is pulling its results from a new database that Inktomi has been building in Virginia, according to Inktomi marketing director Kevin Brown.

These results appear after matches from Yahoo’s own listings, or if a search is performed and the “Web Pages” link at the top of the results page is then selected.

The Virginia database is roughly half the size of the California database, which is why Yahoo’s web searches often yield a smaller number of matches than at HotBot. It was decided to use the Virginia database for Yahoo temporarily, in order to satisfy the huge traffic demands of the popular service without impacting Inktomi’s other partners, Brown said.

The database will be expanded over the coming months until it is a mirror of the California index. Then Inktomi will spread queries from all its partners across both databases, in order to balance the load, Brown said.

Even when this happens, there are still likely to be differences between results at the various Inktomi-powered services. This is because Inktomi gives its partners the ability to make various tweaks and changes to the way results are ranked, a crucial part of marketing the same database to competing companies, Brown said.

A good example is at GoTo.com. Its Inktomi results, which come below paid listings, look identical to HotBot’s with one important exception: only one page per web site makes it into the top results at GoTo.

This is similar to the clustering feature at Infoseek, where matching pages in the top results are grouped together and only the best page is presented. It allows more web sites to have a shot at the top 10 — or the top 40, in GoTo’s case.
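
The filtering step itself is easy to picture in code. Here is a rough sketch of the idea, not GoTo’s actual implementation:

    from urllib.parse import urlparse

    def one_page_per_site(ranked_urls):
        seen, kept = set(), []
        for url in ranked_urls:          # results arrive best-first
            host = urlparse(url).hostname
            if host not in seen:         # keep only the top page from each site
                seen.add(host)
                kept.append(url)
        return kept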

At Snap, you may notice that multiword queries bring back more results than at HotBot. This is because Snap performs a broader “match any word” search, while HotBot performs a “match all words” search. Changing HotBot’s settings to “match any” brings the count up to Snap’s level, though top results usually remain the same.
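
The difference is plain boolean logic, sketched below. Since any page matching all the words also matches any one of them, the “match any” count can only be equal or larger.

    def page_matches(page_words, query_words, require_all=True):
        found = set(query_words) & set(page_words)
        if require_all:
            return found == set(query_words)  # HotBot-style: every word must appear
        return len(found) > 0                 # Snap-style: one word is enough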

As with Yahoo, results from Inktomi appear after Snap’s own listings. However, you can also query the Inktomi database directly from Snap’s advanced search page.

Overall, it’s best to remember that each service that Inktomi powers is distinct, even though they use the same core database. Brown says that over time, the services may grow even more distinct, as different features are enabled. Inktomi also plans to segment its database, which will produce differences.

For example, a portion of the database might contain pages only from high-quality web sites. One partner may decide to use just this segment, which would be cheaper than querying the entire database. Another partner may decide they want to be comprehensive and so tap into the entire database each time.

As a reminder to webmasters, the best way to ensure you are listed within Inktomi’s database is to submit to HotBot. Just keep in mind that this is not a guarantee you’ll appear in Yahoo or elsewhere, or that your rank at HotBot will be the same as elsewhere.

HotBot

http://www.hotbot.com/

Yahoo

http://www.yahoo.com/

Snap Advanced Search

http://www.snap.com/search/power/form/0,179,home-0,00.html

GoTo.com

http://www.goto.com/

===================

Netscape Smart Browsing Available, Debated

Netscape Communicator 4.5, which has the much-discussed “smart browsing” features to aid searchers, is now available for download as a beta release.

Shortly after the release, one site owner raised a ruckus when he discovered the change meant those entering “scripting” were no longer taken to his “scripting.com” web site. Instead, they went to a section within the Netscape site.

See the articles below for more details on the dispute. I especially enjoyed the Wired article, where the Electronic Frontier Foundation implies that Netscape is somehow redirecting people without permission.

It’s a rather absurd statement, akin to blaming the phone company when you dial a wrong number. Someone entering “scripting” into a browser is not entering a valid web address, so who’s to say what they wanted to reach?

Yes, the latest browsers have tried to guess at the correct address for those doing this, which means first trying scripting.com, then trying other top-level domains. But it could be argued that this isn’t necessarily fair. Why should scripting.edu, scripting.net or even scripting.co.uk be denied a fair shot at the user? What happens if new top-level domains such as .web come into being? Should they be secondary to .com?
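
For reference, the old guessing behavior amounts to something like this Python sketch. The exact list of endings each browser tries, and the order it tries them in, is the browser makers’ own choice; the list here is an assumption.

    import socket

    def guess_address(word, endings=(".com", ".net", ".org", ".edu")):
        for ending in endings:              # assumed ordering; .com wins ties
            host = "www." + word + ending
            try:
                socket.gethostbyname(host)  # does the guessed name resolve?
                return "http://" + host + "/"
            except OSError:
                continue
        return None                         # nothing resolved; give up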

Netscape can make a very good argument that its changes are even more user friendly than the old system of guessing at domains, because it is likely to better guide users to specific web sites. “Whitehouse” is a perfect example. Without Smart Browsing, enter “whitehouse,” and Netscape guesses at whitehouse.com. I’m sure the porn site that appears is not what most people expect.

True, there are possible legal issues involved, such as what happens when trademarks are entered, and serious ethical ones about how to handle generic terms. But it’s unfair to say, as the EFF did in the Wired article, that Netscape has taken over a user’s preference through Smart Browsing. Users never configured Netscape to guess at web addresses by selecting this as a preference.

Originally, giving Netscape a partial address simply failed to locate any site. Then Netscape tried to be more user friendly by guessing at the correct address. Now it is trying again to be more user friendly by guessing at the correct site in a new way.

By the way, if you don’t like Smart Browsing, it’s easy to turn it off. Go to the Edit menu, choose Preferences, open the Navigator section and select Smart Browsing. On the panel that appears, down at the bottom, uncheck the box that says “Enable Internet Keywords.”

Netscape Communicator 4.5 Download

http://www.netscape.com/download/prev.html

The Next Net Name Battle

Wired, July 20, 1998
http://www.wired.com/news/news/technology/story/13820.html

Smart browser or dumb idea?

News.com, July 20, 1998
http://news.com/News/Item/0,4,24400,00.html

Smart Browsers Ease Searching

The Search Engine Report, July 1, 1998
https://www.searchenginewatch.com/sereport/9807-browsers.html

Explains how Smart Browsing works in Netscape, and how similar features operate with Microsoft.

Internet Keywords Patent Spat

Wired, July 22, 1998
http://www.wired.com/news/news/technology/story/13892.html

Centraal, which runs the RealNames keyword system, has been sued for patent infringement by competitor Netword. And bad news for Centraal’s system could have an impact on Netscape’s.

===================

AltaVista Canada Expands

AltaVista Canada has expanded its Canadian index to more than 14 million web pages, it was announced July 13.

The service launched in January and operates two indexes: a worldwide one, which is a mirror version of the AltaVista index, and a Canadian index, which is created through a custom crawl of Canadian web sites.

The service has a system that automatically finds Canadian web sites, even if they are hosted under non-Canadian domains, such as .com or .org. AltaVista Canada manager Sandro Berardocco declined to specify how exactly this is done, but he said it is effectively finding Canadian sites that would otherwise be missed if only sites in the .ca Canadian domain were crawled.

In contrast, the recently launched Canada.com service, which is powered by Inktomi, relies on domain filtering in order to produce Canada-specific results. When enabled, only sites from within the .ca domain are listed. In a worldwide search, any Canadian sites are noted with a Canadian flag icon.
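
Domain filtering of this sort is straightforward, as the Python sketch below shows; Canada.com and Inktomi’s actual rules may of course differ.

    from urllib.parse import urlparse

    def canadian_filter(result_urls, canada_only=False):
        results = []
        for url in result_urls:
            host = urlparse(url).hostname or ""
            is_ca = host.endswith(".ca")          # the crude test: .ca domains only
            if canada_only and not is_ca:
                continue                          # Canadian-only mode drops the rest
            results.append((url, "CA" if is_ca else ""))  # flag stands in for the icon
        return results

The obvious limitation is that Canadian sites hosted under .com or .org slip through, which is exactly the gap AltaVista Canada’s custom crawl is meant to close.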

There is no way to submit a site to AltaVista Canada. If your site isn’t listed, the service suggests notifying it via email. A long-term solution is to register a .ca domain, which is free from the Canadian government, Berardocco said.

AltaVista Canada is produced locally by Telus Advertising Services, a subsidiary of the Edmonton-based Telus Corporation, Canada’s third largest telecommunications company. The subsidiary also publishes phone directories and provides other informational services, targeted primarily at Alberta businesses.

Canadians and those visiting Canada may also find Alcanseek of interest. It is a directory of Alaska and Canada-related web sites created in May 1997 and recently relaunched.

AltaVista Canada

http://www.altavistacanada.com/

Canada.com

http://www.canada.com/

Alcanseek

http://www.alcanseek.com/

===================

Help For Site Specific Search Needs

I often get questions about what search engine software to use within a web site or on an Intranet. I maintain a page within Search Engine Watch about this, but the topic deserves much more in depth coverage than I can provide.

That’s why I was so pleased to come across the Search Tools web site, which launched recently. There you’ll find comprehensive listings of software products, by platform or by name. There is also introductory material about search tools, plus product news and links to articles, resources and reviews.

If you’re considering site specific search, Search Tools should definitely be on your bookmark list.

Search Tools

http://www.searchtools.com/

===================
Search Engine Notes
===================

Excite Edges Out Others In USA Today Poll

Excite came out on top in a recent IntelliQuest survey conducted for USA Today. Survey participants were asked to rate Excite, Yahoo and Infoseek in terms of entertainment, content, appeal and ease of use. Overall scores were as follows:

Excite: 89 percent
Yahoo: 87 percent
Infoseek: 84 percent

The results were featured in the July 1 issue of USA Today. The survey polled 300 people drawn from the IntelliQuest technology panel of 30,000.

===================

SearchUK Adds Explore Feature

SearchUK, a search engine for the United Kingdom, has added a feature that lets users see all the pages from within a particular web site. Called “Explore,” it appears as a small icon next to each listing in the search results. Clicking on the icon causes all the URLs from the particular site to be displayed.

SearchUK

http://www.searchuk.com/

===================

WebPosition Gold Available

WebPosition has announced a new version of its product, WebPosition Gold, which adds optimized page creation, submission and tracking features. The product will be available in full release on August 30, but a beta version is available for purchase and use now.

WebPosition Gold

http://www.webpositiongold.com/

===================

Got HotBot?

HotBot has retained San Francisco-based advertising agency Goodby, Silverstein & Partners to develop a multimillion-dollar brand-building campaign for its service. Among other things, the ad agency is known for its “Got Milk?” campaign for the California Milk Advisory Board. (For those who’ve never seen them, the Got Milk television commercials are hilarious; I actually stop skimming past the ads on videotapes mailed to me from home in order to watch them.)

===================

Britannica Launches Internet Guide, Again

The Encyclopedia Britannica relaunched its Internet guide as eBLAST on July 15. The site classifies, rates, and reviews more than 125,000 Web sites. The service was initially launched last year as the Britannica Internet Guide.

eBLAST

http://www.eblast.com/

===================

Excite Buys Home Page Provider

One of the few gaps in Excite’s portal offerings is free home pages. That’s likely to be corrected with its purchase last month of Throw. The company provides tools to build home pages, plus gives users the ability to establish personal chat areas and other online service-like functions.

===================

Lycos Online Available

Lycos has begun offering its own branded Internet access service, Lycos Online, in conjunction with AT&T. The service launched on July 21. This follows on the launch of the AT&T-powered Excite Online last month. Infoseek Online, also powered by AT&T, is expected later this year. Meanwhile, Yahoo Online launched in March, in partnership with MCI.

Lycos Online

http://www.lycos.com/att/member.html

===================

HotBot Add URLs Take Longer

Normally, any page submitted to HotBot via its Add URL page is supposed to appear within 48 hours. Recently, however, the promised listing time has stretched to two weeks. HotBot says this is temporary, due to a glitch with Inktomi’s systems, and that the 48-hour turnaround time should return shortly.

===================

LookSmart and Mining Co. Strike Agreement

LookSmart and the Mining Co. have struck a placement deal within each other’s sites. At the Mining Co., the web search option now defaults to a branded version of LookSmart. Within LookSmart, relevant Mining Co. guides now appear at the top of listings, branded as “LookSmart Spotlight Sites.”

LookSmart

http://www.looksmart.com/

Mining Co.

http://www.miningco.com/

===================
Search Engine Articles
===================

Web Portals: Home On The Web

PC Magazine, Sept. 1998
http://www.zdnet.com/pcmag/features/webportals/

A review of sites primarily for their portal features, such as free email and home pages, chat, ability to personalize, etc. Excite gets Editor’s Choice, and honorable mentions go to Yahoo and Microsoft Start. Of interest to searchers are the specific “Directory Search” rankings, which evaluate the hand-picked selections at each service. Here, Excite and Yahoo get top marks, with Infoseek, Lycos and AltaVista taking second place. And since AltaVista’s directory is powered by LookSmart, that gives HotBot’s LookSmart-powered directory and LookSmart itself second-place scores, though they weren’t included in the review.

===================

Adult Sites Are a Snap

Wired, Aug. 4, 1998
http://www.wired.com/news/news/culture/story/14144.html

Snap changes its mind and begins listing adult web sites and taking adult banner advertising.

===================

Home Pages and Portals

FamilyPC, July 21, 1998
http://www.zdnet.com/familypc/content/9806/extras/survey3.html

Why do Netscape and Microsoft even have a shot in the portal wars? Because their browsers come with built-in start pages, which many people never change. FamilyPC found 49 percent of its survey respondents still use the browser’s default page. That’s a huge advantage, but it makes even more impressive the fact that people are actively choosing players like Yahoo and Excite despite this. Imagine when those players start giving out their own branded browsers.

===================

Portals as Net’s TV stations

News.com, July 21, 1998
http://news.com/News/Item/0,4,24445,00.html

The Gartner Group unveils research saying that portals are here to stay, and that sites had better consider partnering with them. This was obvious a year ago, so why has it taken them until now to figure it out? After all, Excite launched its “channels” strategy in March 1997, and heavy-duty commerce partnerships have continued on from last year.

===================

Yahoo’s Brand of Cool

Upside, July 20, 1998
http://www.upside.com/texis/mvm/story?id=35ae33d00

Long and interesting article about the development of Yahoo as an Internet mega-brand, especially good for at least acknowledging (briefly) that this has something to do with the quality of its listings. There’s also a sidebar about Excite’s efforts to dethrone Yahoo.

===================

NBC Bolsters Snap

Wired, July 20, 1998
http://www.wired.com/news/news/business/story/13843.html

Details on NBC’s plans to begin promoting Snap on television. Past television ads by other search services have helped increase traffic, at least temporarily.

===================

Is it the browser or is it the portal?

News.com, July 20, 1998
http://news.com/News/Item/0,4,24371,00.html

The content in Netscape’s new site mostly comes from Excite, but Netscape hopes to distinguish itself by establishing deals with other media companies. See also:

Netscape Seeks Top Content Exec

Industry Standard, July 2, 1998
http://www.thestandard.net/articles/news_display/0,1270,982,00.html

===================

Infoseek, Starwave may be match from Fantasyland

San Jose Mercury News, July 19, 1998
http://www.sjmercury.com/columnists/nolan/docs/cn072098.htm

Infoseek is acquiring Starwave as part of the Disney deal, but does Starwave think it will be running things? Details on why some observers believe so.

===================

Going Portal in Europe

NY Times, July 14, 1998
http://www.nytimes.com/library/tech/98/07/cyber/eurobytes/14euro.html

Discusses how old media is behind many of the top portal contenders in Europe, as opposed to new media companies that dominate them in the US. The Jupiter Communications-provided chart will please those who’ve been looking for stats about top search and navigation sites in Europe.

===================

Inktomi searches for Net profits in Europe

BBC, July 10, 1998
http://news.bbc.co.uk/hi/english/business/the_company_file/newsid_128000/128974.stm

Interview with Inktomi chief scientist Eric Brewer, focusing mostly on Inktomi’s search technology.

===================

Can AltaVista stand alone?

ZD Net, July 9, 1998
http://www.zdnet.com/zdnn/stories/zdnn_smgraph_display/0,3441,2118706,00.html

Should Compaq spin off AltaVista to cash in on portalmania? Some analyst quotes.

===================

One on One with Yahoo

Wired, July 9, 1998
http://www.wired.com/news/news/business/story/13584.html

Short interview with COO Jeff Mallett, on Yahoo’s current high valuation and ability to compete in the face of new and renewed competition.

===================

Yahoo pioneers say work’s the thing

USA Today, July 2, 1998
http://www.usatoday.com/life/cyber/tech/ctd040.htm

They’re worth millions now, but the Yahoos keep working along as normal.

===================

Whatever happened to MSN?

News.com, July 1, 1998
http://www.news.com/News/Item/0,4,23791,00.html

With all the talk about Microsoft Start, what’s the deal with MSN? Looks like it’s to become just an access service over the long haul.

===================
End Notes
===================

This newsletter is only sent to those who have requested it. To unsubscribe, use the form at:
https://www.searchenginewatch.com/sereport/unsubscribe.html

To subscribe, see excerpts from past issues or view this entire issue as a web page, visit
the Search Engine Report home page at:
https://www.searchenginewatch.com/sereport/

To change your address, unsubscribe the old one and subscribe the new one, via the links above.

If you need human assistance, send a message to:
[email protected]

Do not reply to this email to send feedback to Danny Sullivan. Instead, send messages to:
[email protected].

This newsletter is Copyright (c) Mecklermedia, 1998
