Marcia
11-29-2004, 03:21 AM
BlueFind was mentioned in another thread regarding dynamic sites:
Google cutting the fat from dynamic content? (http://forums.searchenginewatch.com/showthread.php?p=24631)
That got me started looking a bit further. With the directory subdirectory and "add" pages excluded from the search, many pages are URL-only, but what I'm looking at now is this listing in the index:
Google Search for site:www.bluefind.com (http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=site%3Awww.bluefind.com+-%2Fdir%2F+-%2Fadd%2F+-search.php%3F&btnG=Search) with a few exclusions
Looking at the cache, there is no cache for the homepage as an absolute URI, but there is for www.bluefind.com/index.php:
BlueFind Web Directory
The Democracy Of The Web. Unable to connect to database.
www.bluefind.com/index.php - 2k - Supplemental Result - Cached - Similar pages
There have been reports around about problems occurring when the homepage is linked to in different ways, with the recommendation being that all links take the same form to avoid confusing the bots.
Rather than a hijacking issue, as was suggested, what it looks like in this case is that there was indeed a problem with linking to the homepage with an index.php URL, compounded by a database error. That's a technical issue on the part of the site itself, not necessarily a Google issue at its root, but one that really should be looked into further. We can see that the page is OK now, but apparently it wasn't when it was crawled...
http://www.bluefind.com/index.php
...the point being that maybe there had better be some kind of redirection to avoid these issues, for the time being anyway.
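To make the redirection idea a bit more concrete, here's a rough sketch in Python of the decision a site could make before serving a page: if the request is for an index file, answer with a 301 to the bare directory URL so there's only one form of the homepage for bots to find. The function name and the list of index filenames are my own assumptions for illustration, not anything BlueFind actually runs, and a real site would more likely do this in its server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed (hypothetical) set of filenames that all serve a directory's default page.
INDEX_NAMES = {"index.php", "index.html", "index.htm"}

def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    # Split the path into everything before the last "/" and the final segment.
    head, _, last = parts.path.rpartition("/")
    # Leave other pages alone, and leave index URLs with a query string alone,
    # since those may be genuinely different dynamic pages.
    if last.lower() not in INDEX_NAMES or parts.query:
        return None
    # Drop the index filename and keep the trailing slash.
    return urlunsplit((parts.scheme, parts.netloc, head + "/", "", parts.fragment))
```

With something like this in place, a request for http://www.bluefind.com/index.php would be pointed at http://www.bluefind.com/ while a directory page such as /dir/55.php would be served as usual.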
So much for the homepage issue, which still needs looking into, but how about other pages going URL-only on this and other sites? Is it only duplication issues - or could something about page content and linkage be triggering it?
As an illustration of what's troubling me, which relates back to what pageoneresults posted in the other thread about empty categories - looking at this page in the directory, which is the main page for Government and Politics:
Government and Politics Page (http://www.bluefind.com/dir/55.php)
There are no links out on the topic itself, so in essence it's an "empty category" as pageoneresults said - but there are other links on the page to sites on different topics altogether. It's indexed and cached properly, but what's the big picture overall with such sites?
I haven't got my head wrapped around what I'm seeing, but could there somehow be some kind of topical or semantic relevancy issue involved with pages going URL-only on some sites that's otherwise unexplained? I'm admittedly very fuzzy-headed about it, but what I'm half-way thinking is: could there be some kind of sitewide semantic analysis and/or block-level analysis at the page level going on with some of the URL-only pages we're seeing on some sites?