Google duplicate content, internal linking and topical content issues


Marcia
11-29-2004, 03:21 AM
BlueFind was mentioned in another thread regarding dynamic sites:

Google cutting the fat from dynamic content? (http://forums.searchenginewatch.com/showthread.php?p=24631)

That got me started looking a bit further. With the directory subdirectory and the "add" pages excluded from the query, many pages are URL-only, but what I'm looking at now is this listing in the index:

Google Search for site:www.bluefind.com (http://www.google.com/search?hl=en&lr=&safe=off&c2coff=1&q=site%3Awww.bluefind.com+-%2Fdir%2F+-%2Fadd%2F+-search.php%3F&btnG=Search) with a few exclusions

Looking at the cache, there is no cached copy of the homepage at its root URL, but there is one for www.bluefind.com/index.php:

BlueFind Web Directory
The Democracy Of The Web. Unable to connect to database.
www.bluefind.com/index.php - 2k - Supplemental Result - Cached - Similar pages

There have been reports of problems occurring when a site links to its homepage in more than one way, and the usual recommendation is that all links use the same form to avoid confusing the bots.

In this case it doesn't look like a hijacking issue, as was suggested. It looks like there was indeed a problem with linking to the homepage via an index.php URL, compounded by a database error. That is a technical issue on the site's side, not necessarily a Google issue at its root, but one that really should be looked into further. We can see that the page is OK now, but apparently it wasn't when it was crawled...

http://www.bluefind.com/index.php

...the point being that there should probably be some kind of redirect to avoid these issues, for the time being, anyway.
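As a rough illustration of that kind of redirect, here is a minimal sketch that maps every variant of a URL (bare domain, www, trailing index.php) to one preferred form and reports a 301 when the request doesn't match it. The preferred host, the list of index file names, and the function names are all assumptions for illustration, not anything BlueFind actually uses; in practice the same rule would more often live in the web server's rewrite configuration than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred form: the www host, with the homepage served at "/".
PREFERRED_HOST = "www.bluefind.com"
INDEX_FILES = ("index.php", "index.html")


def canonical_url(url: str) -> str:
    """Return the single preferred form of a URL on this (hypothetical) site."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and drops any port
    # Fold the bare domain into the www host.
    if host == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Fold ".../index.php" into ".../" so the homepage has exactly one URL.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Fragments never reach the server, so they are dropped.
    return urlunsplit((parts.scheme or "http", host, path, parts.query, ""))


def redirect_target(url: str):
    """Return (301, location) if the URL is not in canonical form, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)
```

A request for http://bluefind.com/index.php would then get a single permanent redirect to http://www.bluefind.com/, so links in any of the old forms end up pointing the bots at one URL.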

So much for the homepage issue, which still needs looking into, but how about other pages going URL-only on this and other sites? Is it only duplication, or could something about page content and linkage be triggering it?

As an illustration of what's troubling me, which relates back to what pageoneresults posted in the other thread about empty categories, look at this page in the directory, the main page for Government and Politics:

Government and Politics Page (http://www.bluefind.com/dir/55.php)

There are no links out on the topic itself, so in essence it's an "empty category", as pageoneresults said, but there are other links on the page to sites on different topics altogether. It's indexed and cached properly, but what's the big picture overall with such sites?

I haven't got my head wrapped around what I'm seeing, but could some kind of topical or semantic relevancy issue explain pages going URL-only on some sites when nothing else does? I'm admittedly fuzzy-headed about it, but what I'm half-way thinking is: could some kind of sitewide semantic analysis and/or page-level block analysis be behind some of the URL-only pages we're seeing?

Mel
12-21-2004, 07:23 AM
Thanks for pointing me to this post, Marcia. IMO the biggest news in directory reindexing, or whatever you want to call it, is the Yahoo search in Google, which at the moment returns only 45 out of 283,000,000 pages (and half of those are Japanese pages) before you have to turn the filter off with &filter=0, at which point everything goes back to normal.

In addition, I am seeing many directories with their pages slipping into the dreaded supplemental index. Bluefind seems to have all its pages that I have looked at so far listed as supplemental, and in addition seems to have only 22 pages showing before you have to click the link or add &filter=0 to the end of the search URL to bring the results back to more or less what they were before. Bluefind also seems to have its home page indexed in several forms: with and without www, and as index.php itself.
Still, if you search for "Bluefind web directory" I do not see it in the first 100 results, which is surprising since that is the page title of the home page and their preferred anchor text for linking.

minstrel
01-03-2005, 12:25 PM
The pages are not all supplemental now, anyway: http://www.google.com/search?q=%22%2Bwww.bluefind.%2Bcom%22&hl=en&lr=&newwindow=1&c2coff=1&safe=off&start=10&sa=N (see http://www.bluefind.com/dir/102.php as an example of one that isn't).

As for the "Bluefind web directory" search, today they are at #13 and #14, and again not in the supplemental index:

BlueFind Web Directory
Main | How To Add Url | Suggest Category. BlueFind Web Directory. Arts & Humanities;
Business & Industry; Computers; Government and Politics; Health & Fitness; ...
www.bluefind.com/index.php - 4k - Cached - Similar pages

Mel
01-03-2005, 07:47 PM
Don't know where you are seeing this, minstrel. http://www.google.com/search?q=site:bluefind.com&num=100&hl=en&lr=&newwindow=1&c2coff=1&start=100&sa=N&filter=0
shows at least the first 500 pages in the supplemental index, and

http://www.google.com/search?sourceid=mozclient&ie=utf-8&oe=utf-8&q=bluefind+web+directory shows them ranking at #25 for their page title.

Perhaps it's the way you are searching?

minstrel
01-03-2005, 10:34 PM
Or maybe it is the way you are searching, Mel.

I gave you the link, and the quoted text for #13 is a cut and paste from the google.com SERP. And I didn't say that they don't have a lot of pages in the supplemental index. What I said was that not ALL of the links in the directory are in the supplemental index, and then I gave you one example from about page 3 of the SERPs.

Try taking the "filter=0" part off your first query and removing the "mozclient" part from your second.

Mel
01-03-2005, 10:52 PM
Sorry, minstrel, you are right: out of 19,600 Bluefind pages in Google, you managed to find one that was not in the supplemental index.
Good detective work.

Marcia
01-03-2005, 11:40 PM
Getting back to the topic in question:

"Bluefind seems to have all its pages that I have looked at so far listed as supplemental,"
All of the ones I have seen are also supplemental, and if a small percentage aren't, that doesn't mean there's no problem.

There has been an update, and even more pages are now supplemental; judging from the cache, in many cases this looks like a redundancy/duplicate content problem. Categories without any listings have nothing unique about them, so I don't think we can argue about that. Category pages whose only links are what look like sponsored sitewide links are also supplemental. Those links aren't "unique" either, and for the most part they aren't topical for the pages they're on.
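To make the "nothing unique about them" point concrete, here is a sketch of one common, generic way to measure near-duplicate text: break each page into overlapping word sequences (w-shingling) and compare the sets with Jaccard similarity. This is a textbook technique offered only as an illustration; it is not a claim about how Google actually decides what goes supplemental, and the function names, the shingle size of 3, and the 0.9 threshold are arbitrary assumptions.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict, threshold: float = 0.9):
    """Yield pairs of page URLs whose shingle sets overlap at least `threshold`."""
    sigs = {url: shingles(text) for url, text in pages.items()}
    urls = sorted(sigs)
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            if jaccard(sigs[u], sigs[v]) >= threshold:
                yield u, v
```

On a directory built from one template, an empty category page would consist almost entirely of the shared navigation and sitewide links, so its shingle set would tend to overlap heavily with every other empty category, which is the kind of redundancy described above.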

And now, to boot, some of the sections have gone PR0 altogether.

Mel
01-04-2005, 12:19 AM
....

Since you are also aware of that, why the "filter=0" part of your reply?


The point of using filter=0 is that it lets you see all the pages Google has indexed, since it doesn't filter any out.
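For anyone running these checks repeatedly, adding the parameter to a search URL can be scripted. This is a minimal sketch using only the standard library's URL parsing; the helper name `unfiltered` is hypothetical, and it only rewrites the query string, assuming the search engine still honours filter=0 the way it did in these posts.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def unfiltered(search_url: str) -> str:
    """Return the search URL with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(search_url)
    # Keep every other parameter (including blank ones such as "lr=") in order.
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Because any existing filter value is dropped before filter=0 is appended, running the helper on a URL that already has the parameter doesn't add a second copy.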

Chris_D
01-04-2005, 03:03 AM
For anyone who is confused about "&filter=0", GoogleGuy explained the purpose of "&filter=0" and 'result crowding' very eloquently in this post:

http://forums.searchenginewatch.com/showthread.php?p=29446#post29446