
Google cutting fat from dynamic content?


I, Brian
11-24-2004, 12:36 PM
I've been running a few site: checks recently, and sites that would otherwise show a good few tens of thousands of dynamically generated pages indexed are now showing only a few thousand indexed pages.

I'm talking about normal forums and directories here, btw.

I'm tempted to think that Google might be trying to trim some of the fat from its index - ie, if a page doesn't have enough copy, it gets ditched - or else there are new improvements to the duplicate content filter, removing pages not considered to have unique enough content.

Would anyone else suggest that I'm seeing a very localised effect - if at all - or is Google trying to trim a lot of what might be considered "fat" from its index?

seobook
11-24-2004, 12:48 PM
I know of a rather new and somewhat empty (because it is so new) directory with close to a million pages in Google's index.

I, Brian
11-24-2004, 02:09 PM
Yes, but I'm thinking that the fat is trimmed once Google's had a good chew on it. So a site will have a large number of pages indexed, then the number drops as Google tweaks the required variables to get at the leaner meat.

For example, the Bluefind (http://www.bluefind.com) and AlltheBizz (http://www.allthebizz.com) directories appear to have lost a lot of pages - now just a few thousand indexed, rather than tens of thousands. This has happened to one of my hobby forums, too.

I don't see it as a penalty, though - simply an indicator of Google either being more decisive at removing URLs for shared content, or else keeping out pages where there's not enough body copy following a unique page title and header.

I just thought I'd raise it as a general SEO discussion issue in the general SEO community - it seems to be a relatively recent development (if it exists) - though I suspect that those with deeper commercial roots have been aware of this for months at least and have adjusted their strategies already. :)

randfish
11-24-2004, 05:02 PM
Brian,

It would make good sense for them to filter based on this - many large e-commerce sites have URLs and pages based on the visitor's path through the site, so a page could exist in 100 or 1000 different variations and URLs based on how the visitor reached it - it would be wise for IR systems to recognize this and filter...
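
To illustrate the point above, here is a minimal sketch of how a crawler-side filter might notice that several URLs serve the same page. It is not a description of what Google actually does; the `fingerprint` and `group_by_content` helpers, the `shop.example` URLs and the `path` parameter are all hypothetical, and a real system would likely use fuzzier matching than an exact hash.

```python
import hashlib
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash the page body after lowercasing and collapsing whitespace."""
    normalized = " ".join(html.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_by_content(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs that serve the same body; largest group first."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return sorted((sorted(urls) for urls in groups.values()), key=len, reverse=True)


# Hypothetical crawl: the same product reached via two visit paths.
pages = {
    "http://shop.example/item?id=7&path=home": "<h1>Blue Widget</h1>",
    "http://shop.example/item?id=7&path=search": "<h1>Blue  Widget</h1>",
    "http://shop.example/item?id=9&path=home": "<h1>Red Widget</h1>",
}
duplicate_groups = group_by_content(pages)
```

Any group with more than one URL is a candidate for keeping a single URL and dropping the rest, which would show up as a falling page count in a site: check.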

Good catch and thanks for the tip!

Dave Hawley
11-28-2004, 06:17 AM
I'm not seeing this on the forum I run or on some of my competitors'. In fact, just the opposite.

As Google has nearly doubled its database of late, I tend to think it will constantly be trying to add as many pages as it can crawl to its database.

I'm not too sure we can rely on Google's site: command any more than we can the link: command, if that is how the check is being done.

Nick W
11-28-2004, 06:39 AM
There has been speculation over the bluefind issue over at WebmasterWorld and Threadwatch.

DaveN thought it might even be down to hijacking. Personally I don't have a clue, but it's interesting, isn't it?

Mikkel deMib Svendsen
11-28-2004, 06:50 AM
One thing I am pretty sure of is that Google is not cutting fat off dynamic websites. In general, it looks far more to me like they are boosting the numbers, not cutting the index. All the sites I am working on, mostly dynamic ones, have seen an increase in indexed pages lately.

Over time many websites, especially dynamic ones, have suffered from sudden de-indexing of some, many or all pages. However, in 99.99% of the cases I've seen it's never been the engine's fault but some kind of error on the site. I haven't looked closer into the two examples (sometimes it takes quite some time to identify the real problem) but from what I generally see in Google I very much doubt the error is on their side - unless it's a penalty.

Nick W
11-28-2004, 06:56 AM
Good point Mikkel

I see (like we all do, I guess) cries of "Google wiped my site!" on a daily basis, and it's often about dynamic pages - some of them are followed up a few days later with "Wow! I'm back!" -- I'm not sure whether it's always the webmaster's fault, but Google playing peekaboo with dynamic pages is not new news :)

Mikkel deMib Svendsen
11-28-2004, 07:52 AM
I'm not sure whether it's always the webmaster's fault, but Google playing peekaboo with dynamic pages is not new news

I have yet to see one single example where Google was to blame (but then again, I haven't seen all). I am sorry to say, but it is usually the webmaster or some "creative" engineer working on the site that does it :)

What I most often find the problem to be is that many webmasters choose the first possible solution they find to get their dynamic website indexed and ranked. Sometimes the chosen solution, although it may work, is just not the best for the site.

Some webmasters may jump right into URL-rewrite when in fact the big problems to deal with are session IDs, track IDs, browser agent detection, GEO-location, identical content, infinite calendars or other spider traps. The URL-rewrite may look like it solves the indexing problem, but suddenly one of the other problems kicks in, with the result that pages are de-indexed. Now, most webmasters will think that Google is messing up, doesn't support URL-rewriting or something like that, when in fact the problem is not that at all.

That's why quick fixes for dynamic websites often only work for a short time. I recommend that everyone take the time it takes to really understand every aspect of how your dynamic, or technically advanced, website impacts search engine spidering, indexing and ranking. And then, make sure you fix all the important problems - not just the first you run into.
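
As a rough illustration of the session ID and track ID problem described above, the sketch below collapses URLs that differ only in such parameters to one canonical form. The parameter names in `VOLATILE_PARAMS` are assumptions chosen for the example (every site uses its own), and `canonical_url` is a hypothetical helper, not part of any particular forum or directory package.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names for session and tracking parameters; adjust per site.
VOLATILE_PARAMS = {"sid", "sessionid", "phpsessid", "s", "trackid", "ref"}


def canonical_url(url: str) -> str:
    """Drop session/tracking parameters and sort the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VOLATILE_PARAMS
    ]
    kept.sort()  # stable parameter order, so equivalent URLs compare equal
    # Lowercase the host and drop any fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Two spider visits to the same thread, each with a fresh session hash.
first = canonical_url("http://Forum.example/showthread.php?t=2979&s=abc123")
second = canonical_url("http://forum.example/showthread.php?s=zzz999&t=2979")
```

Both calls yield the same string, so a site that links only to the canonical form gives a spider one URL per page instead of a new one on every visit. It addresses only one item on the list; browser detection, calendars and the other traps need their own checks.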

Nick W
11-28-2004, 08:13 AM
That sounds more than reasonable to me Mikkel, thanks.

You have any thoughts on bluefind and sevenseek dirs?

Nick

Mikkel deMib Svendsen
11-28-2004, 08:49 AM
You have any thoughts on bluefind and sevenseek dirs?

No, I have not had a chance - too many new client sites to look at, at the moment :) And, even if I did look at them I may not be able to find the real reason unless I went through my entire checklist for such sites and that takes days or weeks. I could probably, like others, come up with some possible reasons for the de-indexing but I may not be right.

I, Brian
11-28-2004, 09:23 AM
Certainly there are different factors to consider - and certainly different dynamic sites are built in different ways. Simply curious in case there's a real issue here. Not least, because the core directory software that Bluefind and AlltheBizz use is also the one I run on Platinax.

The WMW thread now gives a 404 - has that simply been moved, or deleted??

Nick W
11-28-2004, 09:30 AM
>>wmw thread

Sheesh, looks like it's been removed. Honestly, it's almost more trouble than it's worth linking to threads over there sometimes....

sorry about that..

pageoneresults
11-28-2004, 04:50 PM
You have any thoughts on bluefind and sevenseek dirs?
I've not done any digging, but, I can venture to guess that the pages lost are those that have little to no content (no directory listings). A quick spot check on a few empty cats and those empty pages are no longer in Google.

Whatever Google is doing, it is most likely the right thing. I sure wouldn't want thousands of empty category pages in my index. ;)

P.S. Many of the empty cats that are indexed by Google have the dreaded Supplemental Result tag which is usually a sign of deindexing.

Dave Hawley
11-28-2004, 10:07 PM
Agree, forums in particular will often "Prune" past Threads for a variety of reasons. In general, googlebot seems to have an insatiable appetite for dynamic pages and doesn't seem too concerned about any "fat".

Marcia
11-29-2004, 01:47 AM
They're probably just not done correctly, but from what I've seen a lot of dynamic sites seem to have problems with duplicated pages. And it has looked like Google has been behaving more selectively where duplications are concerned.

pageoneresults
I've not done any digging, but, I can venture to guess that the pages lost are those that have little to no content (no directory listings). A quick spot check on a few empty cats and those empty pages are no longer in Google.

Whatever Google is doing, it is most likely the right thing. I sure wouldn't want thousands of empty category pages in my index.
Directory pages with no listings come closer to duplications when you look at the percentage of total page content that's replicated, don't they? Especially when you strip away what's part of the global site template - and see what's left then.
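
A small sketch of that arithmetic, under the assumption that a page can be compared line by line against the other pages on the same site. The `replicated_share` helper and the three sample directory pages are made up for illustration, and real duplicate detection is certainly more sophisticated than exact line matching.

```python
def lines_of(page: str) -> set[str]:
    """Split a page into its distinct non-empty, stripped lines."""
    return {line.strip() for line in page.splitlines() if line.strip()}


def replicated_share(page: str, others: list[str]) -> float:
    """Fraction of this page's lines that also appear on another page."""
    own = lines_of(page)
    if not own:
        return 0.0
    seen_elsewhere = set().union(*(lines_of(other) for other in others))
    return len(own & seen_elsewhere) / len(own)


# Hypothetical global template shared by every category page.
header = ["Example Directory", "Home | Add URL"]
footer = ["Copyright Example Directory"]

empty_a = "\n".join(header + ["Category: Widgets", "No listings yet"] + footer)
empty_b = "\n".join(header + ["Category: Gadgets", "No listings yet"] + footer)
full_c = "\n".join(
    header + ["Category: Tools", "Hammer Co - hand tools", "Saw Ltd - saws"] + footer
)

empty_share = replicated_share(empty_a, [empty_b, full_c])  # 4 of 5 lines
full_share = replicated_share(full_c, [empty_a, empty_b])   # 3 of 6 lines
```

In this toy case the empty category page is 80% replicated while the page with two listings is 50% replicated, and the gap widens as listings are added.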

Added:
DaveN has posted about the hijacking issue over here in the Google forum:

Come on, Google, Fix it!! (http://forums.searchenginewatch.com/showthread.php?t=2979)

Bluefind isn't all missing; there are lots of pages indexed, though some are listed with URL only - but I didn't see the homepage in the index. We know there's been an issue with missing homepages with a lot of sites.

ThouShaltSeo
12-21-2004, 10:36 AM
I find the opposite. Many of my sites have 3-4 times the real number of pages indexed. If I have 1000 pages, Google shows about 4000 indexed... many from ages ago.

As far as the dupe filter goes, it may be an improvement on paper only. It may work better as a math thingie, but doing that without first fixing the 302s, meta-refresh redirects, and minor search engines caching your pages and then getting indexed on G is a big mistake.

Now one page shows on G and the others ("dupes") show only with filter=0, but both sites or pages are penalized. In most cases the index page is the penalized page, and while you might have everything indexed it's useless because it doesn't show anywhere near the top. I hope Yahoo got the dupe bug, this way at least we know that G will look and solve the problem.

