Old 07-24-2004   #1
Marcia
 
Logical vs. Physical Domains in Site Architecture and Ranking

In discussing site architecture and internal linking structures, the question often comes up of whether it's best to have all files in the root directory of a site or to construct the site with a hierarchical structure using /subdirectories/.

Questions also arise about whether the physical location affects spidering sequences by search engines, and probably more importantly, whether one structure or the other can ultimately affect search engine rankings, particularly in view of the relative importance of having keywords or keyword phrases in the file path or in individual filenames.

I've never seen this particular paper discussed, and while it isn't an SEO-related paper in the strict sense of the word - it's about constructing site maps - I've read it over many times and wondered whether the concepts presented might be worth a further, deeper look.

The major concept is that of differentiating between Physical Domains and Logical Domains.

Quote:
In this paper, we present a technique for automatically constructing multi-granular and topic-focused site maps using trained rules based on Web page URLs, contents, and link structure. In these site maps, the Web site topology is preserved, and document importance, indicated by semantic relevancy and citation, is used for prioritizing the presentation of pages and directories.
Constructing Multi-Granular and Topic-Focused Web Site Maps

And here is the first of four points summarizing the major findings of the study:

Quote:
Identifying "logical domains" within a Web site: A logical domain is a group of pages that has a specific semantic relation and a syntactic structure that relates them.
A few of the questions raised could be:

1. How does this affect indented listings with Google, if at all?
2. What about sites that are actually part of a larger site, like Geocities or ISP sites or even Yahoo Stores? Can search engines tell that they're different sites?
3. In view of the recent interest in the Hilltop algorithm and the possible effects of affiliation on scoring, how do hosted sites such as subdirectory-type Yahoo Stores or ISP sites fare, given that the actual rightmost unique token is the same for all of them? (See the sketch just after this list.)
4. For backlinks, do search engines take into consideration whether a page is the root directory index page or the index page of a subdirectory?
5. What effect would the use of logical domains, if the concept were applied, have for engines that use clustering, as AllTheWeb was doing prior to the Yahoo acquisition?
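
Just to make question 3 concrete - this is my own rough illustration, not anything from the paper, and the "first path segment marks a hosted site" rule plus the example URLs are pure assumption - here's how the same physical domain can contain many separate logical domains:

Code:
from urllib.parse import urlparse

def physical_domain(url):
    # Rightmost unique token plus TLD, e.g. 'store.yahoo.com' -> 'yahoo.com'.
    # Naive two-label rule; it breaks on ccTLDs like .co.uk, but it's
    # enough for illustration.
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def logical_domain(url):
    # Assumption: on a hosting domain, the first path segment marks the
    # boundary of a separate logical site (a hosted store or member area).
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    prefix = parts[0] if parts[0] else ""
    return f"{parsed.hostname}/{prefix}" if prefix else parsed.hostname

urls = [
    "http://store.yahoo.com/widgetshop/blue-widgets.html",  # hypothetical store
    "http://store.yahoo.com/gadgetbarn/index.html",         # different owner, same host
    "http://pages.prodigy.net/~artlover/graphics.html",     # hypothetical member pages
]

for u in urls:
    print(physical_domain(u), "|", logical_domain(u))

# The two store URLs collapse to the same physical domain (yahoo.com) even
# though they're completely separate logical sites; the member pages show
# the same pattern on another host.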

For those among us who are inclined toward top-down, keyword-oriented site navigation, especially those who have favored the use of subdirectories, it might be worth taking a second look occasionally, particularly when there appear to have been major algo changes.

Quote:
We have developed a set of rules for identifying logical domain entry pages based on the available Web page metadata, such as title, URL string, anchor text, link structures, and popularity indicated by citations.
How important is it whether the entry pages for given keyword searches are in the root or in a subdirectory? Could a tightly themed subdirectory within a site be considered a "logical domain" on its own, given relevancy by the internal linking structure or by inbound deep links, and could that help it achieve higher rankings than a root-level page would get?
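
Purely as a thought experiment - the paper's actual trained rules aren't published anywhere I've seen, so every signal and weight below is invented by me - the kind of entry-page scoring they describe might look roughly like this:

Code:
from urllib.parse import urlparse

def entry_page_score(page):
    # Toy heuristic: does this page look like a logical-domain entry page?
    # 'page' carries the metadata the paper mentions (title, URL string,
    # inbound anchor text, citation count). The weights are made up.
    score = 0.0
    path = urlparse(page["url"]).path

    # Shallow paths and index-style filenames suggest an entry point.
    depth = path.strip("/").count("/")
    score += max(0, 3 - depth)
    if path.endswith(("/", "index.html", "default.htm")):
        score += 2.0

    # Inbound anchor text that echoes the page title suggests an entry point.
    title_words = set(page["title"].lower().split())
    anchor_words = set(" ".join(page["inbound_anchors"]).lower().split())
    score += len(title_words & anchor_words)

    # Citation (inbound link) count as a capped popularity signal.
    score += min(page["citations"], 10) * 0.5
    return score

# Hypothetical page record, just to show the shape of the input.
example = {
    "url": "http://www.example.com/graphics/index.html",
    "title": "Free Web Graphics",
    "inbound_anchors": ["free web graphics", "graphics section"],
    "citations": 14,
}
print(entry_page_score(example))  # higher = more entry-page-like

If a subdirectory's index page scores like an entry page under rules of that kind, it's at least plausible that it gets treated as the front door of its own little "domain" - which is really what the question above is asking.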

Has anyone seen any evidence that weight is given to different physical or logical locations within a site's architecture, or that inbound links carry different weight depending on where within the structure of the linking site they're located?

Old 07-27-2004   #2
Marcia
 
As an afterthought: seomike posted this in a recent discussion here

http://forums.searchenginewatch.com/...read.php?t=653


Quote:
See if you can get some links into it. That will usually prompt a crawl of the root pages.
So is that indicating that crawl frequency is determined by directory depth or placement? Or is it possibly by link structure - or how about the number of inbound links to pages or sections, or, in Google's case, PageRank? If it is a factor, how does it affect crawls of dynamic sites that don't have a hierarchical structure at all?
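
Nobody outside the engines knows how that scheduling actually works, so treat this strictly as a sketch of the question, with weights I made up - something mixing depth, inbound links, and PageRank might look like:

Code:
from urllib.parse import urlparse

def crawl_priority(url, inbound_links, pagerank=0.0):
    # Made-up priority: shallower pages and better-linked pages get
    # recrawled sooner. Not how any real engine is documented to work.
    path = urlparse(url).path.strip("/")
    depth = path.count("/") + 1 if path else 0
    depth_factor = 1.0 / (1 + depth)            # root pages score highest
    link_factor = min(inbound_links, 100) / 100.0
    return 0.4 * depth_factor + 0.4 * link_factor + 0.2 * pagerank

print(crawl_priority("http://www.example.com/", inbound_links=250, pagerank=0.6))
print(crawl_priority("http://www.example.com/articles/2004/july/widgets.html", inbound_links=3))

# For dynamic sites with no directory hierarchy, depth stops discriminating
# between pages, so a scheme like this would lean on the link-based factors.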

Back in the days when Lycos still crawled, I vaguely remember their "help" section for webmasters mentioning that only so many pages at a time were crawled from the different sections of a site. It would be so many from the root, then from directories, and so on. I think I remember it being 5 at a clip.
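
If that memory is anywhere near right (and it may well not be), the behavior would amount to a per-pass quota for each section - roughly like the following, where the quota of 5 and the "section = top-level directory" rule are both guesses on my part:

Code:
from itertools import islice
from urllib.parse import urlparse

def crawl_batch(frontier, per_section=5):
    # Group pending URLs by their top-level directory ("section") and take
    # at most per_section from each group on a pass. The quota and the
    # grouping rule are assumptions, not documented Lycos behavior.
    sections = {}
    for url in frontier:
        parts = urlparse(url).path.strip("/").split("/")
        section = parts[0] if len(parts) > 1 else "(root)"
        sections.setdefault(section, []).append(url)
    batch = []
    for urls in sections.values():
        batch.extend(islice(urls, per_section))
    return batch

frontier = (
    [f"http://www.example.com/articles/page{i}.html" for i in range(12)]
    + [f"http://www.example.com/page{i}.html" for i in range(8)]
)
print(len(crawl_batch(frontier)))  # 10: five from /articles/, five from the root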

On another note, back in November Google was completely fouling up at identifying different Yahoo stores that were selling the same merchandise. I wrote in several days in a row, nagging search-quality and showing examples. Different stores with different owners were being shown as indented results - technically in subdirectories of the Yahoo site, on the same physical domain with the same rightmost unique token, but in actuality totally different sites residing on different logical domains.

There were other issues involved, including the fact that some of the stores also had listings under their domain names - and pages were being picked up through links in from Yahoo Shopping. But it was an outstandingly visible example of what can happen with logical domains residing within the same physical domain given certain conditions.

Here we can see a live example at Yahoo. If we search for free web graphics and then click to see

More pages from this site

They're all Prodigy pages and no, they're not all the same site at all. They just happen to be logical domains, unconnected and with no common links, sitting on the same physical domain. Again, the same rightmost unique token.
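
The collapsing itself is easy to picture: if the "More pages from this site" grouping keys on the physical host alone, unrelated logical sites fold into one. Here's a rough sketch of the difference, using the same first-path-segment assumption as the earlier sketch and made-up member URLs:

Code:
from collections import defaultdict
from urllib.parse import urlparse

def group_results(urls, key_fn):
    groups = defaultdict(list)
    for url in urls:
        groups[key_fn(url)].append(url)
    return groups

def by_physical_host(url):
    return urlparse(url).hostname

def by_logical_domain(url):
    # Assumption: the first path segment marks a separate hosted site.
    parts = urlparse(url).path.strip("/").split("/")
    prefix = parts[0] if len(parts) > 1 else ""
    return f"{urlparse(url).hostname}/{prefix}"

results = [
    "http://pages.prodigy.net/~artlover/free-graphics.html",  # hypothetical
    "http://pages.prodigy.net/~artlover/backgrounds.html",
    "http://pages.prodigy.net/~webtools/buttons.html",        # different member
]

print(len(group_results(results, by_physical_host)))   # 1 group: "same site"
print(len(group_results(results, by_logical_domain)))  # 2 groups: two logical sites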

Issues like this can make a difference when thinking about cross-linking, and they might give some food for thought to the tin-foil-hat crowd regarding whois data for domains being scrutinized. There are some very interesting latent possibilities for enterprising individuals wanting to cover their tracks with aggressively "creative" cross-linking.

It isn't really an academic or theoretical issue at all; there are implications that can be significant for optimizers, and if we look for it, we can see evidence of tangible effects, however subtle they may be.

Old 01-04-2006   #3
Marcia
 
<bump>

For those folks who think they've been tossed into the "sandbox" after doing a site redesign/restructuring: is it really the "sandbox," which is reputed to be primarily age-dependent among other things, if it's an older site? Or could it be related to the change in the logical domain structure of the site, and possibly to a difference in the way canonical issues, site crawls, and computations are handled?

There's really nothing new under the sun, but some things get called by different names and get applied differently as they're developed further over time.

It might be idle fancy, but it's some food for thought that could be worth a few minutes' time for those who restructured sites that then got dropped like a bomb.
Old 02-04-2006   #4
claus
I've got a lot of opinions and experience on Information Architecture (that's the thing you're really discussing here) - but I'll just comment on redesigns for now.

Most people redesigning sites actually change more than the looks of the site. They don't just change the stylesheet and images; they change the site structure, content, page structure, and URLs/file paths too.

With that in mind (and remembering that some link juice now flows through "pipes" that are broken), it's no wonder that their rankings change. Expecting otherwise is like turning the steering wheel of a car and expecting it to keep moving in the same direction as before.

Changes like that happened even back when sandboxes were just something children played in (in my vocabulary, that's still all they are).
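
One obvious way to keep those pipes connected, by the way, is a plain old map of 301 redirects from the old URLs to the new ones. The paths below are made up and I'm assuming an Apache setup - it's just a sketch of the idea:

Code:
# Generate mod_alias redirect rules for a restructured site, so existing
# inbound links keep passing visitors (and link value) to the new URLs.
# The old flat-file -> subdirectory mapping here is purely hypothetical.
url_map = {
    "/bluewidgets.html": "/widgets/blue/",
    "/redwidgets.html": "/widgets/red/",
    "/aboutus.html": "/about/",
}

with open("redirects.conf", "w") as f:
    for old, new in url_map.items():
        f.write(f"Redirect permanent {old} {new}\n")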
